Delete all git branches which don't add diff over master

Question

To take this question a level further: is there a way of deleting all branches that would have no diff if rebased on master?

While the answer to that question works for a standard merge workflow, it doesn't pick up branches that have been 'Squashed & Merged' upstream. Because those branches have been squashed before merging, they don't have the same commit hashes, and so git doesn't realize that they have 'effectively' been merged:

$ git branch feature -d
error: The branch 'feature' is not fully merged.
If you are sure you want to delete it, run 'git branch -D feature'.

But if we rebase the branch first, Git deletes it without complaint:

$ git checkout feature
Switched to branch 'feature'
Your branch is based on 'origin/feature', but the upstream is gone.
  (use "git branch --unset-upstream" to fixup) 

$ git rebase develop
First, rewinding head to replay your work on top of it...

$ git checkout develop
Switched to branch 'develop'
Your branch is up-to-date with 'origin/develop'.

$ git branch -d feature
Deleted branch feature (was 72214c7).

So, is there a way of a) scanning the branches and checking which ones are safe to delete, and b) deleting those?

That question seems to supply your answer. What do you mean by "squashed & merged" upstream? Could you give an example? — Schwern, Nov 23 '16 at 18:52
If you do [this](https://github.com/blog/2141-squash-your-commits) upstream, the local branch doesn't identify as being merged, even though when rebased it would have no diff. Does that make sense? — Maximilian, Nov 23 '16 at 19:09
@Schwern: I think Maximilian means using a GitHub button that does the equivalent of `git merge --squash && git commit ...`. — torek, Nov 23 '16 at 21:10
@Schwern - thanks for your initial responses and apologies if I wasn't clearer prior. I've added some detail now — Maximilian, Nov 23 '16 at 22:23

score 2 · Accepted Answer · answered Aug 01 '18 at 01:11

This library offers a way of doing this: https://github.com/not-an-aardvark/git-delete-squashed

...using this bash script:

git checkout -q master &&
git for-each-ref refs/heads/ "--format=%(refname:short)" |
while read -r branch; do
    mergeBase=$(git merge-base master "$branch") &&
    [[ $(git cherry master "$(git commit-tree "$(git rev-parse "$branch^{tree}")" -p "$mergeBase" -m _)") == "-"* ]] &&
    git branch -D "$branch"
done
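The script above is dense, so here is a sketch of the same check unpacked into functions. The names `is_squashed_into` and `list_squashed_branches` are invented for illustration and are not part of git-delete-squashed. The idea is to build a throwaway commit that has the branch tip's tree and the merge base as its parent (in effect, the branch squashed into one commit), then ask `git cherry` whether the base branch already contains a commit with an equivalent patch.

```shell
# is_squashed_into BASE BRANCH  (hypothetical helper name)
# Succeeds when BASE already contains a commit whose patch is equivalent
# to the whole of BRANCH squashed into a single commit.
is_squashed_into() {
    local base="$1" branch="$2" merge_base tree squashed
    merge_base=$(git merge-base "$base" "$branch") || return 1
    tree=$(git rev-parse "$branch^{tree}") || return 1
    # Throwaway commit: the branch tip's tree on top of the merge base.
    # No ref points at it, so it never shows up in the history.
    squashed=$(git commit-tree "$tree" -p "$merge_base" -m _) || return 1
    # git cherry prefixes a commit with "-" when BASE has an equivalent patch.
    case "$(git cherry "$base" "$squashed")" in
        "-"*) return 0 ;;
        *)    return 1 ;;
    esac
}

# list_squashed_branches BASE  (hypothetical helper name)
# Dry run: prints the candidate branches instead of deleting them.
list_squashed_branches() {
    local base="$1" branch
    git for-each-ref --format='%(refname:short)' refs/heads/ |
    while read -r branch; do
        [ "$branch" = "$base" ] && continue
        if is_squashed_into "$base" "$branch"; then
            echo "$branch"
        fi
    done
}
```

Running `list_squashed_branches master` first and reading the list before passing anything to `git branch -D` keeps the destructive step separate from the detection step.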

score 1 · Answer 2 · answered Nov 23 '16 at 22:22

You specifically mentioned using rebase, which makes the problem more difficult in the presence of squash "merges". There is an easier way to try it, which may suffice, or may not.

Let's take a look at a workflow that sometimes uses squash "merge"¹ instead of actual merges or rebasing, and maybe sometimes uses rebasing.

We start with some chain of commits in some repository, probably a centralized one on some big server we'll call origin:

...--A--B   <-- branch

If it's centralized on the server, we clone it and wind up calling this origin/branch in our own repository:

...--A--B   <-- origin/branch

Now we git checkout branch and start working, and we make some new commit(s) of our own:

...--A--B   <-- origin/branch
         \
          C--D   <-- branch

Maybe ours aren't enough, or we get feedback when we push commit-chain C--D and make a pull request out of it, so we add another commit or two:

...--A--B   <-- origin/branch
         \
          C--D--E   <-- branch

While all this is going on, the repository on origin is potentially also acquiring new commits, so that by the time we're really ready and have pushed C-D-E as a pull request, we might even have this on origin:

...--A--B--F   <-- branch

but in any case what happens now is that whoever controls origin (perhaps directly, as through a clicky GUI interface on GitHub, or perhaps indirectly through their own repository) eventually takes our C-D-E chain and puts it into the repository on origin, but does so by making a new, single commit—let's call this CDE to show that it does what C-D-E does—and putting that into the sequence on origin, so that they now have:

...--A--B--CDE--F--G   <-- branch

or:

...--A--B--F--CDE--G   <-- branch

or similar.

We now git fetch this to bring our repository up to date, giving us:

...--A--B--F--CDE--G   <-- origin/branch
         \
          C--D--E   <-- branch

or similar.

On the other hand, maybe the keeper of origin keeps our individual commits but rebases them him- or her-self, so that the upstream now has:

...--A--B--F--C'--D'--E'--G   <-- branch

and we wind up with:

...--A--B--F--C'--D'--E'--G   <-- origin/branch
          \
           C--D--E   <-- branch

¹I like to put quotes around "merge" for git merge --squash since this uses the merge machinery, but does not make a merge commit. Using git revert, git cherry-pick, and even git apply, we often wind up using the merge machinery, but people don't call those "merges"! There seems to be something about the fact that the top level command is spelled git merge --squash that leads people to call this "merging". Perhaps if the top level command were git gimme people would call this "gimming"? :-)

Since "squash" is a perfectly good verb of its own, though, I think it would be nice to just call this "squashing", and refer to these as commits that have been "squashed".

The ultimate goal

The goal here is to delete our branch branch if and only if there's some commit sequence CDE or C'-D'-E' or some such, in our upstream origin/branch, that means that our original C-D-E chain is no longer needed.

The problem is that we don't know what the person or people controlling the upstream have done, because they never told us. (How rude! :-) )

There are any number of things we can try.

Method 1: rebase

We could try just running git rebase, rebasing our branch onto our origin/branch. If—this is a big "if"—they, whoever they are, actually copied our C-D-E chain to a C'-D'-E' chain, our Git will probably² find that the upstream origin/branch has our three commits, and will therefore drop them from our rebase. If it does, we will get this:

...--A--B--F--C'--D'--E'--G   <-- branch, origin/branch

and we will know it is safe to delete our label branch. But if they squashed instead of merging, our Git won't drop our C-D-E. It will instead try to apply them (with git apply or with git cherry-pick) one at a time. If we are lucky, it will discover that each one reduces to nothing at all, and after three manual "skip" steps we will get this:

...--A--B--F--CDE--G   <-- branch, origin/branch

If we are not lucky, we will get merge conflicts and have to live through Merge Hell until we realize that, oh hey, commit CDE equals the summary of our three commits and we should just drop them.

²Our Git will figure this out on its own if and only if the patch IDs (see the git patch-id documentation) match.
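The patch-ID condition in this footnote can be checked before attempting the rebase. `git cherry` compares patch IDs and marks each of our commits with `-` if the upstream has an equivalent patch, or `+` if it does not. A minimal sketch; the function name `fully_applied_upstream` is made up for illustration:

```shell
# fully_applied_upstream UPSTREAM BRANCH  (hypothetical helper name)
# Succeeds when every commit in UPSTREAM..BRANCH has a patch-equivalent
# commit in UPSTREAM, which is the case where a rebase would drop them all.
fully_applied_upstream() {
    local upstream="$1" branch="$2" marks
    marks=$(git cherry "$upstream" "$branch") || return 1
    # Any line starting with "+" is a commit the upstream does not have.
    if printf '%s\n' "$marks" | grep -q '^+'; then
        return 1
    fi
    return 0
}
```

This only covers the copied case (C'-D'-E'). A squashed CDE has a different patch ID from each of C, D and E, so this check reports the branch as not applied.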

Method 2: merge

We could try merging. This relies on merge bases and endpoints. The merge base of our branch and our origin/branch is commit B. Our Git will diff B vs E, and B vs G. Then our Git will try to combine the two sets of changes.

The resulting merged files will either match G, in which case everything in our C-D-E is already included, or will not, in which case ... well, there are two possibilities.

Maybe G deliberately undid something we did—perhaps G is a revert of D, for instance, or a partial revert. Let's say we added the line Woohoo! in the middle of file README.txt. Someone took it back out because it was inappropriate, so now README.txt in commit G matches README.txt in commit B. D'oh!

Well, when Git compares B vs G, it won't see the added line. When Git compares B vs E, it will see the added line. So Git will put it in, thinking we want it. Woohoo! But maybe we did not want it after all. D'oh!

All in all, though, it looks like merging is a better strategy than rebasing, because it handles both the squashed case (C-D-E in branch becomes CDE in origin/branch) and the rebased case (C-D-E in branch becomes C'-D'-E' in origin/branch). But it leaves us with an annoying merge commit. If ??? is the mystery sequence that may include CDE or C'-D'-E', we go from this:

...--A--B---???---G   <-- origin/branch
          \
           C--D--E   <-- branch

to this:

...--A--B---???---G   <-- origin/branch
          \        \
           C--D--E--M  <-- branch

which leaves us a merge commit M. We can now compare M vs G to see if there's an extra line in README.txt or whatever (woohoo!), and based on that, decide whether to delete branch. If the two match exactly, it's safe enough to delete branch, as long as we don't care about the precise details of the C-D-E sequence. If not, we must think about the difference.
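One way to run this comparison without disturbing the current checkout is to do the merge in a throwaway worktree and compare the staged result with the upstream tip. This is a sketch under assumptions: it needs a Git that supports `git worktree`, and the function name `merge_adds_nothing` is invented here. It stops before creating M, so no merge commit is left behind.

```shell
# merge_adds_nothing UPSTREAM BRANCH  (hypothetical helper name)
# Succeeds when BRANCH merges cleanly into UPSTREAM and the merge result
# is identical to UPSTREAM, i.e. the branch contributes no changes.
merge_adds_nothing() {
    local upstream="$1" branch="$2" dir status
    dir=$(mktemp -d) || return 1
    # Detached checkout of UPSTREAM in a scratch directory.
    if ! git worktree add --detach "$dir/wt" "$upstream" >/dev/null 2>&1; then
        rm -rf "$dir"
        return 1
    fi
    status=1
    # --no-commit stops before creating M and leaves the merge result staged.
    if git -C "$dir/wt" merge --no-commit --no-ff "$branch" >/dev/null 2>&1; then
        # Exit status 0 means the staged tree equals UPSTREAM's tree.
        if git -C "$dir/wt" diff --cached --quiet "$upstream"; then
            status=0
        fi
    fi
    # Throw the scratch checkout away and drop its bookkeeping entry.
    rm -rf "$dir"
    git worktree prune
    return "$status"
}
```

A conflicting merge counts as "not safe" here, which matches the advice above: if the two do not match exactly, a human has to think about the difference.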

Method 3: squash

Instead of making regular merge M, we could just squash. This uses the merge machinery in the same way, but then:

Forces us to make the final commit, as if we had run git merge --no-commit.
Makes that last commit, once we run git commit, as a regular, non-merge commit.

That is, we get the exact same tree as with Method 2 (merge), but a different commit graph:

...--A--B---???---G   <-- origin/branch
          \
           C--D--E--S  <-- branch

As before, we just want to git diff the two commits—the two branch tips—and see if there is extra stuff like our README.txt change (now we must think about it) or not (now we can safely delete branch).

Conclusion

There's no particular reason to prefer any of these, except that the merge or squash method makes everything happen in one step, and works when the people running the upstream repository also squashed. Use whatever works best for you.

None will make the entire problem go away, because, well, "Woohoo! D'oh!" What will make the whole problem go away is if your upstream people, the ones running the repository on origin, tell you when they have copied or squashed your commits (assuming, of course, you trust / can believe them).

This is awesome and I really appreciate all the effort! I'm guessing the answer is 'no', because of how complicated it is to do it for a single branch - but is it possible to script a test and run through all the branches? 1&2 seem the safest — Maximilian, Nov 23 '16 at 22:27
You could write a script that does a test rebase, squash, or merge, and tests the result (and optionally uses `git reset --hard` to undo it all if the result is "this looks iffy"), then write a second script that loops over specified branches and runs the first one. (Or embed the first part in a shell function, and write a second function / script that invokes the first function.) "Undoing" a merge or squash is easy: reset to `HEAD^`. Undoing a rebase is harder unless it was very recent, in which case the previous value of the branch is available as `ORIG_HEAD` or `branch@{1}`. — torek, Nov 23 '16 at 22:45

Maximilian · Answer 3 · 2018-02-02T06:15:06.423

I'm neither a git nor a bash expert, so I imagine there are bad practices here, but I have a partial solution. This works for the case where the branches have the same tree (even if not the same commits):

git for-each-ref --shell --format='if [[ -z $(git diff master..%(refname)) ]]; then echo %(refname:short); fi' refs/heads/ | bash | grep -vx 'master' | xargs git branch -D

It runs through each branch and, if the diff to master is empty, it deletes that branch. That works even if they don't share the same commits.

Open to suggestions on how to improve this.
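The same idea can be written as a plain loop, which avoids generating shell code from `git for-each-ref` and makes a dry run easy. A sketch, with `branches_identical_to` as an invented name; it prints candidates rather than deleting them. It only catches branches whose tip tree is exactly identical to the base branch's tree, so it stops matching once the base gains further commits after the squash.

```shell
# branches_identical_to BASE  (hypothetical helper name)
# Prints every local branch, other than BASE, whose tip has exactly the
# same tree as BASE. Nothing is deleted.
branches_identical_to() {
    local base="$1" branch
    git for-each-ref --format='%(refname:short)' refs/heads/ |
    while read -r branch; do
        [ "$branch" = "$base" ] && continue
        # --quiet makes git diff exit 0 when the two trees do not differ.
        # The trailing -- keeps branch names from being read as paths.
        if git diff --quiet "$base" "$branch" --; then
            echo "$branch"
        fi
    done
}
```

After reviewing the output of `branches_identical_to master`, the list can be piped to `xargs git branch -D`.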

score -1 · Answer 4 · answered Nov 24 '16 at 07:35

If I understand what's going on, you're using git merge --squash. That's a problem. There's a way better alternative.

git merge --squash loses important archeological history. Instead of being able to examine a chain of small changes with precise log messages, the code archeologist now has the entire diff of the branch with all log messages smashed together. You can't tell whether the change on line 185 was for "fixing a Unicode bug" or "changing how we do refunds". This will make working on your code in the future much harder.

The primary reason to use git merge --squash is to have a nice linear history. This is nice for people who don't really understand Git's graph nature. You can achieve this while still retaining complete history of each branch.

Assuming you're merging feature into master...

A - B - C - G - H [master]
         \ 
          D - E - F [feature]

git checkout feature
git rebase master

A - B - C - G - H [master]
                 \ 
                  D1 - E1 - F1 [feature]

Now feature is up-to-date with master. You can do this as many times as you want; it's the "update" step. Rebasing instead of merging avoids having a lot of uninteresting update merge commits that make the history confusing to read.

When you're done with feature and it's ready to merge...

Test feature.
git checkout master
git merge --no-ff feature

git merge --no-ff prevents a fast-forward and always creates a merge commit. The result is this.

A - B - C - G - H ------------- I [master]
                 \            /
                  D1 - E1 - F1 [feature]

git log remains linear: it will read I, F1, E1, D1, H, G, C, B, A. That will make the non-Git people happy. Archeologists will be happy because the full history of each change in the branch is retained. QA and other devs will be happy because the feature branch is fully tested in its final form before it's merged into master, which avoids inflicting possible bugs from the merge on everyone.

There's no need to retest master because F1 and I have the same content... but your CI server can go ahead and do it anyway.

git branch -d feature

Now you can safely and immediately delete the feature branch. Merge commit I will retain the name of the branch and any extra information you might want to keep about it like a link to the issue tracker.

OK, those arguments make sense, and I appreciate you taking the time. But GitHub still offer 'Squash & Merge' (maybe they're really foolish, maybe not), and so the question still stands. — Maximilian, Nov 27 '16 at 22:25
@Maximilian Just cuz there's a button doesn't mean you have to press it. Delete the branch immediately after you merge it. Rely on `git reflog` in case of a mistake in the merge. That's good general advice. — Schwern, Nov 27 '16 at 23:57