You specifically mentioned using rebase, which makes the problem more difficult in the presence of squash "merges". There is an easier way to try it, which may suffice, or may not.
Let's take a look at a workflow that sometimes uses squash "merge" [1] instead of actual merges or rebasing, and maybe sometimes uses rebasing.
We start with some chain of commits in some repository, probably a centralized one on some big server we'll call origin:
...--A--B <-- branch
If it's centralized on the server, we clone it and wind up calling this origin/branch in our own repository:
...--A--B <-- origin/branch
Now we git checkout branch and start working, and we make some new commit(s) of our own:
...--A--B <-- origin/branch
         \
          C--D <-- branch
Maybe ours aren't enough, or we get feedback when we push commit chain C-D and make a pull request out of it, so we add another commit or two:
...--A--B <-- origin/branch
         \
          C--D--E <-- branch
While all this is going on, the repository on origin is potentially also acquiring new commits, so that by the time we're really ready and have pushed C-D-E as a pull request, we might even have this on origin:
...--A--B--F <-- branch
But in any case, what happens now is that whoever controls origin (perhaps directly, as through a clicky GUI on GitHub, or perhaps indirectly through their own repository) eventually takes our C-D-E chain and puts it into the repository on origin. They do so by making a new, single commit, which we'll call CDE to show that it does what C-D-E does, and putting that into the sequence on origin, so that they now have:
...--A--B--CDE--F--G <-- branch
or:
...--A--B--F--CDE--G <-- branch
or similar.
We now git fetch this to bring our repository up to date, giving us:
...--A--B--F--CDE--G <-- origin/branch
         \
          C--D--E <-- branch
or similar.
On the other hand, maybe the keeper of origin keeps our individual commits but rebases them themselves, so that the upstream now has:
...--A--B--F--C'--D'--E'--G <-- branch
and we wind up with:
...--A--B--F--C'--D'--E'--G <-- origin/branch
         \
          C--D--E <-- branch
[1] I like to put quotes around "merge" for git merge --squash since this uses the merge machinery, but does not make a merge commit. Using git revert, git cherry-pick, and even git apply, we often wind up using the merge machinery, but people don't call those "merges"! There seems to be something about the fact that the top level command is spelled git merge --squash that leads people to call this "merging". Perhaps if the top level command were git gimme people would call this "gimming"? :-)
Since "squash" is a perfectly good verb of its own, though, I think it would be nice to just call this "squashing", and refer to these as commits that have been "squashed".
The ultimate goal
The goal here is to delete our branch branch if and only if there's some commit sequence CDE or C'-D'-E' or some such, in our upstream origin/branch, that means that our original C-D-E chain is no longer needed.
The problem is that we don't know what the person or people controlling the upstream have done, because they never told us. (How rude! :-) )
There are any number of things we can try.
Method 1: rebase
We could try just running git rebase, rebasing our branch onto our origin/branch. If (this is a big "if") they, whoever they are, actually copied our C-D-E chain to a C'-D'-E' chain, our Git will probably [2] find that the upstream origin/branch has our three commits, and will therefore drop them from our rebase. If it does, we will get this:
...--A--B--F--C'--D'--E'--G <-- branch, origin/branch
and we will know it is safe to delete our label branch. But if they squashed instead of copying, our Git won't drop our C-D-E. It will instead try to apply them (with git apply or with git cherry-pick) one at a time. If we are lucky, it will discover that each one reduces to nothing at all, and after three manual "skip" steps we will get this:
...--A--B--F--CDE--G <-- branch, origin/branch
If we are not lucky, we will get merge conflicts and have to live through Merge Hell until we realize that, oh hey, commit CDE equals the summary of our three commits and we should just drop them.
[2] Our Git will figure this out on its own if and only if the patch IDs (see the git patch-id documentation) match.
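One way to see the patch-ID matching in action before committing to a rebase is git cherry, which marks each of our commits with "-" if a patch-equivalent commit exists upstream and "+" if not. What follows is only a sketch in a throwaway repository: the local branch named upstream, the commit helper, and the file names are hypothetical stand-ins, and the remote-tracking ref is written by hand instead of being fetched from a real origin.

```shell
#!/bin/sh
# Throwaway demo of the "rebased upstream" case: B--F--C'--D'--E'--G.
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email "you@example.com"
git config user.name "You"
git symbolic-ref HEAD refs/heads/upstream   # hypothetical stand-in for the branch on origin

# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$3"; }

commit base.txt base B
git checkout -q -b branch                   # our work: C, D, E
commit c.txt c C
commit d.txt d D
commit e.txt e E

git checkout -q upstream                    # upstream: F, copies of C D E, then G
commit f.txt f F
git cherry-pick HEAD..branch >/dev/null
commit g.txt g G
# Simulate the result of "git fetch" by writing the remote-tracking ref directly.
git update-ref refs/remotes/origin/branch upstream

git checkout -q branch
# "-" = a commit with the same patch ID is already upstream, "+" = it is not.
cherry_out=$(git cherry origin/branch branch)
echo "$cherry_out"

# Every commit is patch-equivalent, so rebase drops all three
# and branch ends up at the same commit as origin/branch.
git rebase -q origin/branch >/dev/null 2>&1
```

If upstream had squashed instead, the same git cherry call would be expected to print "+" lines, since the patch ID of CDE matches none of C, D, or E individually.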
Method 2: merge
We could try merging. This relies on merge bases and endpoints. The merge base of our branch and our origin/branch is commit B. Our Git will diff B vs E, and B vs G. Then our Git will try to combine the two sets of changes.
The resulting merged files will either match G, in which case everything in our C-D-E is already included, or will not, in which case ... well, there are two possibilities.
Maybe G deliberately undid something we did: perhaps G is a revert of D, for instance, or a partial revert. Let's say we added the line Woohoo! in the middle of file README.txt. Someone took it back out because it was inappropriate, so now README.txt in commit G matches README.txt in commit B. D'oh!
Well, when Git compares B vs G, it won't see the added line. When Git compares B vs E, it will see the added line. So Git will put it in, thinking we want it. Woohoo! But maybe we did not want it after all. D'oh!
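The "Woohoo! D'oh!" case can be reproduced in a scratch repository. This is a simplified sketch, not the exact history above: our side is reduced to a single commit D, commit F is left out, the local branch named upstream is a hypothetical stand-in for the server, and the remote-tracking ref is written by hand.

```shell
#!/bin/sh
# Throwaway demo: upstream squashed our work, then removed our added line again.
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email "you@example.com"
git config user.name "You"
git symbolic-ref HEAD refs/heads/upstream   # hypothetical stand-in for the branch on origin

printf 'intro\noutro\n' > README.txt
git add README.txt
git commit -q -m "B"

git checkout -q -b branch                   # our work: add the line
printf 'intro\nWoohoo!\noutro\n' > README.txt
git commit -q -a -m "D: add Woohoo!"

git checkout -q upstream                    # upstream: squash our work in as CDE ...
git merge --squash branch >/dev/null
git commit -q -m "CDE"
printf 'intro\noutro\n' > README.txt        # ... then G takes the line back out
git commit -q -a -m "G: remove Woohoo!"
# Simulate the result of "git fetch" by writing the remote-tracking ref directly.
git update-ref refs/remotes/origin/branch upstream

git checkout -q branch
# The merge base is B. B vs G shows no README.txt change, B vs our tip adds the line,
# so the merge result keeps the line.
git merge -q --no-edit origin/branch >/dev/null
changed=$(git diff --name-only origin/branch HEAD)
echo "merge result differs from G in: $changed"
grep -n 'Woohoo!' README.txt
```

The merge succeeds without conflicts, which is exactly why this case is easy to miss: only the comparison against the upstream tip reveals that the line came back.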
All in all, though, it looks like merging is a better strategy than rebasing, because it handles both the squashed case (C-D-E in branch becomes CDE in origin/branch) and the rebased case (C-D-E in branch becomes C'-D'-E' in origin/branch). But it leaves us with an annoying merge commit. If ??? is the mystery sequence that may include CDE or C'-D'-E', we go from this:
...--A--B---???---G <-- origin/branch
         \
          C--D--E <-- branch
to this:
...--A--B---???---G <-- origin/branch
         \         \
          C--D--E---M <-- branch
which leaves us a merge commit M. We can now compare M vs G to see if there's an extra line in README.txt or whatever (woohoo!), and based on that, decide whether to delete branch. If the two match exactly, it's safe enough to delete branch, as long as we don't care about the precise details of the C-D-E sequence. If not, we must think about the difference.
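The merge-then-compare check might look like the following sketch, here for the case where upstream squashed our three commits into CDE. As before, the local branch named upstream, the commit helper, and the file names are hypothetical, and the remote-tracking ref is written by hand to avoid needing a real server.

```shell
#!/bin/sh
# Throwaway demo of Method 2 against a squashed upstream: B--F--CDE--G.
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email "you@example.com"
git config user.name "You"
git symbolic-ref HEAD refs/heads/upstream   # hypothetical stand-in for the branch on origin

# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$3"; }

commit base.txt base B
git checkout -q -b branch                   # our work: C, D, E
commit c.txt c C
commit d.txt d D
commit e.txt e E

git checkout -q upstream                    # upstream: F, squashed CDE, then G
commit f.txt f F
git merge --squash branch >/dev/null
git commit -q -m "CDE"
commit g.txt g G
# Simulate the result of "git fetch" by writing the remote-tracking ref directly.
git update-ref refs/remotes/origin/branch upstream

git checkout -q branch
git merge -q --no-edit origin/branch >/dev/null   # creates merge commit M

# git diff --quiet exits 0 when the two trees are identical.
if git diff --quiet origin/branch HEAD; then
    verdict="M matches G: safe to delete branch"
else
    verdict="M differs from G: inspect the diff first"
fi
echo "$verdict"
```

In a real repository only the last two steps are needed, after a git fetch. A non-empty diff is not proof that something is wrong, only a prompt to look at what differs.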
Method 3: squash
Instead of making regular merge M, we could just squash. This uses the merge machinery in the same way, but then:
- Forces us to make the final commit ourselves, as if we had run git merge --no-commit.
- Makes that last commit, once we run git commit, as a regular, non-merge commit.
That is, we get the exact same tree as with Method 2 (merge), but a different commit graph:
...--A--B---???---G <-- origin/branch
         \
          C--D--E--S <-- branch
As before, we just want to git diff the two commits (the two branch tips) and see if there is extra stuff like our README.txt change (now we must think about it) or not (now we can safely delete branch).
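A sketch of the squash variant, this time against an upstream that copied our commits as C'-D'-E', to show that the same check covers the rebased case too. The local branch named upstream, the commit helper, and the file names are again hypothetical, and the remote-tracking ref is written by hand.

```shell
#!/bin/sh
# Throwaway demo of Method 3 against a rebased upstream: B--F--C'--D'--E'--G.
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email "you@example.com"
git config user.name "You"
git symbolic-ref HEAD refs/heads/upstream   # hypothetical stand-in for the branch on origin

# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$3"; }

commit base.txt base B
git checkout -q -b branch                   # our work: C, D, E
commit c.txt c C
commit d.txt d D
commit e.txt e E

git checkout -q upstream                    # upstream: F, copies of C D E, then G
commit f.txt f F
git cherry-pick HEAD..branch >/dev/null
commit g.txt g G
# Simulate the result of "git fetch" by writing the remote-tracking ref directly.
git update-ref refs/remotes/origin/branch upstream

git checkout -q branch
git merge --squash origin/branch >/dev/null # stages the merge result, makes no commit
git commit -q -m "S: squash of origin/branch"

# S is an ordinary commit, so its header has exactly one "parent" line.
parents=$(git cat-file -p HEAD | grep -c '^parent ')
if git diff --quiet origin/branch HEAD; then same_tree=yes; else same_tree=no; fi
echo "parents=$parents same_tree=$same_tree"
```

Because S exists only to be compared, it can be thrown away afterwards, for example by deleting branch once the diff comes back empty.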
Conclusion
There's no particular reason to prefer any of these, except that the merge or squash method makes everything happen in one step, and works when the people running the upstream repository also squashed. Use whatever works best for you.
None will make the entire problem go away, because, well, "Woohoo! D'oh!" What will make the whole problem go away is if your upstream people, the ones running the repository on origin, tell you when they have copied or squashed your commits (assuming, of course, you trust / can believe them).