Offending code has landed on your main production branch and caused a problem. You use git, and your best bet is to keep rolling back commits until the system reaches a steady state. You've come late into this picture, and you're unsure how far back you need to go.
First, you ought to be using something that removes the need to run through an entire CI pipeline just to produce a deploy. I've talked a bit about this in the past in my screencasts on setting up CI, specifically the distinction between a deployment and a release. If you have something like this in place, rolling back a fair number of releases is probably trivial to attempt.
However, if you don't have this in place, or you really do need to roll through an entire CI pipeline, you can still use something like git bisect to find the first offending commit.
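git bisect runs a binary search across a span of commits. The general workflow for running a git bisect is the following:

1. git bisect start BAD_COMMIT GOOD_COMMIT (the known-bad commit comes first, followed by a known-good one).
2. Test the checked-out commit, determine whether it is good or bad, and tell git with git bisect good or git bisect bad. You can also skip commits with git bisect skip. Git then checks out the next commit to test; repeat this step until it reports the first bad commit.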
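To make step (2) concrete, here is a minimal sketch of a manual bisection against a throwaway repository. Everything in it is hypothetical: the repo has ten commits, a `state` file stands in for whatever your real test inspects, and commit 6 is where it flips from `ok` to `broken`. The loop plays the role of you at the keyboard, running the test and answering `git bisect good` or `git bisect bad` until git announces the first bad commit.

```shell
#!/bin/sh
# Toy demo (hypothetical repo): commit 6 of 10 introduces a "regression",
# and we bisect it by hand with `git bisect good` / `git bisect bad`.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "$i" > counter   # guarantees every commit changes something
  if [ "$i" -ge 6 ]; then echo broken > state; else echo ok > state; fi
  git add counter state
  git commit -q -m "commit $i"
done

# Known-bad commit first, then the known-good one (HEAD~9 is "commit 1").
git bisect start HEAD HEAD~9 > /dev/null

steps=0
while [ "$steps" -lt 20 ]; do   # guard against looping forever
  steps=$((steps + 1))
  # Step (2): our own test decides whether the checked-out commit is good or bad.
  if [ "$(cat state)" = ok ]; then verdict=good; else verdict=bad; fi
  out=$(git bisect "$verdict" 2>&1)
  case "$out" in
    *"is the first bad commit"*) break ;;
  esac
done

# Once bisection finishes, refs/bisect/bad points at the first bad commit.
first_bad_subject=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "first bad commit: $first_bad_subject (after $steps tests)"
```

Nine candidate commits collapse to a handful of tests, which is the whole appeal of the binary search.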
The trick to finding the first offending commit isn't to rerun the steps your CI pipeline would run; you should already have all those builds available for review, and they will tell you whether or not a build truly succeeded. (If you can't trust your CI, you have other issues on your hands.) Instead, crafting your own test and running it each time in step (2) will help you decide whether a commit is good or bad in light of what you are trying to find.
You can alleviate the tedium of step (2) by using git bisect run and supplying a script or program that performs the test for you. If the script misbehaves, or you ever want to abandon your search midway, you can always run git bisect reset and start over again. There are some tricks to crafting the exit codes of the script you write for git bisect run that really make this process a lot faster. To give a sense of the range of uses for git bisect as a general search tool, here is an example test script:
#!/bin/sh
# exiting with 125 tells `git bisect run` to skip this commit.
# exiting with 0 means the commit is `good'.
# exiting with 1 (or any code from 1 to 127 other than 125) means it is `bad'.
cargo build || exit 125 # skip commits that fail to build.
target/debug/program > /tmp/program.out
# any difference from the known-good snapshot marks the commit as bad.
diff /tmp/program.out /tmp/program.snapshot > /dev/null || exit 1
exit 0
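The exit-code conventions in that script can be exercised end to end without a Rust project. The sketch below is a hypothetical stand-in: a toy repository where a `build_broken` marker file plays the part of a failed `cargo build` at commit 3, and a `state` file plays the part of the snapshot diff, going bad at commit 6. The predicate script (named `bisect-test.sh` here, a made-up name) is written to a temporary directory outside the repository and marked executable before being handed to `git bisect run`.

```shell
#!/bin/sh
# Toy demo (hypothetical repo): commit 3 "fails to build" and should be
# skipped with 125; commit 6 introduces the regression we are hunting.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "$i" > counter
  if [ "$i" -eq 3 ]; then echo yes > build_broken; else rm -f build_broken; fi
  if [ "$i" -ge 6 ]; then echo broken > state; else echo ok > state; fi
  git add -A
  git commit -q -m "commit $i"
done

# Keep the predicate outside the repository so checkouts never touch it.
script_dir=$(mktemp -d)
cat > "$script_dir/bisect-test.sh" <<'EOF'
#!/bin/sh
[ -e build_broken ] && exit 125    # stand-in for a failed build: skip
[ "$(cat state)" = ok ] || exit 1  # stand-in for a snapshot mismatch: bad
exit 0                             # good
EOF
chmod +x "$script_dir/bisect-test.sh"

git bisect start HEAD HEAD~9 > /dev/null
git bisect run "$script_dir/bisect-test.sh" > /dev/null 2>&1
first_bad_subject=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "first bad commit: $first_bad_subject"
```

Against a real project the shape is the same: git bisect start with the bad and good commits, then git bisect run with the path to your own executable script, wherever you chose to put it.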
You'll need to place the test script somewhere outside of the current git repository, as it can mess up checkouts between commits, and, as always, make sure it is executable. Another pitfall that can hurt is how you structure your git history: if you use merge-style commits, as is the default on GitHub, you probably won't care whether the commits in between the merge commits fail. You can do one of two things: output the list of all merge commits whose subject lines match a particular pattern, e.g., the way GitHub words them, or, if your history is clean enough, use git show --no-patch --format="%P" <commit hash> to determine whether a commit has more than one parent; a merge commit will show more than one hash in the output. You can find a quick version I hacked together, which filters commits by GitHub-styled subject lines and which you can tweak, at this
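The parent-count check can be wrapped in a small shell function. This is a sketch: `is_merge_commit` is a hypothetical helper name, and the toy repository exists only to give it a merge to look at. Inside a `git bisect run` script, exiting with 125 for non-merge commits makes git skip them, although git may then report a range of candidate commits rather than a single one. If your Git is version 2.29 or newer, `git bisect start --first-parent` is a built-in alternative that follows only the first parent of each merge commit.

```shell
#!/bin/sh
# is_merge_commit COMMIT (hypothetical helper): succeeds if COMMIT has more
# than one parent. `--format=%P` prints the parent hashes separated by spaces.
is_merge_commit() {
  parents=$(git show --no-patch --format=%P "$1") || return 2
  set -- $parents   # unquoted on purpose: one positional arg per hash
  [ "$#" -gt 1 ]
}

# Toy repo with one GitHub-style merge so the helper has something to inspect.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false
git commit -q --allow-empty -m "base"
git checkout -q -b feature
git commit -q --allow-empty -m "feature work"
git checkout -q -   # back to the original branch
git merge -q --no-ff --no-edit -m "Merge pull request #1 from demo/feature" feature

is_merge_commit HEAD && echo "HEAD is a merge commit"
is_merge_commit HEAD^2 || echo "HEAD^2 is not a merge commit"

# In a `git bisect run` script, the same check could skip non-merges:
#   is_merge_commit HEAD || exit 125
```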
In the earlier test script I test against a snapshot of some program output, but really the predicate could be anything. For something like textual search there are better alternatives than git bisect, such as the "pickaxe" options of git log (-S and -G). But if you want to find the first commit where something happened, and that something isn't part of the data git stores, such as program behavior, git bisect will find it far faster than checking commits one by one. I've also used this in the past to whip up quick, minimal tests that I inject after each checkout and then run some test suite against.
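For comparison, here is what the pickaxe route looks like for a purely textual question. This sketch builds a throwaway repository (the file name, contents, and commit messages are made up) and asks `git log -S` which commit introduced a string; `-G<regex>` is the regex-based sibling if you need pattern matching instead.

```shell
#!/bin/sh
# Toy repo: find the commit that introduced a string with git log's pickaxe.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

echo 'fn main() {}' > main.rs
git add main.rs && git commit -q -m "initial"
echo '// TODO: remove this hack' >> main.rs
git commit -q -am "add hack"
echo '// unrelated' >> main.rs
git commit -q -am "unrelated change"

# -S<string> keeps only commits that change how many times <string> occurs;
# --reverse lists them oldest first, so line 1 is where it first appeared.
introduced_by=$(git log -S'remove this hack' --reverse --format=%s | head -n 1)
echo "string introduced by: $introduced_by"
```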
git bisect run accepts any command, too, meaning you don't have to use a shell script like the one in my earlier example. The real point is not to think of the predicate script or program as something that has to be about failures; you can just as easily use it to discover the first instance of any particular behavior a program exhibits, as long as that behavior is reproducible.
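To illustrate the point about behavior rather than failure: git bisect also lets you rename its terms with `--term-old` and `--term-new`, so the results read naturally. The sketch below is hypothetical throughout: a toy repository where commit 5 of 8 first adds `json` to a `features` file, a predicate script `has-json.sh` kept outside the repository, and the term names `absent` and `present`. With `git bisect run`, exit code 0 still maps to the old term and 1 to the new one.

```shell
#!/bin/sh
# Toy repo: commit 5 of 8 first adds "json" to a features file. We look for
# the first commit where that behaviour is PRESENT; nothing here "fails".
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false
for i in 1 2 3 4 5 6 7 8; do
  echo "$i" > counter
  if [ "$i" -ge 5 ]; then echo "csv json" > features; else echo "csv" > features; fi
  git add counter features
  git commit -q -m "commit $i"
done

# Predicate outside the repo: exit 0 = old term ("absent"),
# exit 1 = new term ("present").
pred=$(mktemp -d)/has-json.sh
cat > "$pred" <<'EOF'
#!/bin/sh
if grep -q json features; then exit 1; else exit 0; fi
EOF
chmod +x "$pred"

# With custom terms, the first revision is the "new" one, the rest are "old".
git bisect start --term-old=absent --term-new=present HEAD HEAD~7 > /dev/null
out=$(git bisect run "$pred" 2>&1)
# git reports "<sha> is the first <new-term> commit".
sha=$(printf '%s\n' "$out" | awk '/is the first present commit/ { print $1; exit }')
first_present=$(git log -1 --format=%s "$sha")
git bisect reset > /dev/null 2>&1
echo "behaviour first present in: $first_present"
```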
Granted, a system may be so complex in its operation that there is no way for you to verify the offending commit locally. Mitigating, or "stopping the bleeding", is something that needs to happen quickly. With that said, git bisect might be a better tool for analysis later, when the pressure is low and you can better craft a test or predicate to find where the fault first occurred. But if you haven't spent a lot of time on release engineering, or you are in a place where it could use some improvement, running git bisect in this manner might save you precious time. And even if you do have good release engineering in place, it might save you from pulling out a lot of hair finding the place where the code effectively broke down.