26

In a prior question someone provided an answer for finding branches that contained an EXACT commit:

How to list branches that contain a given commit

The accepted answer highlighted that this only works for an EXACT commit id, and not for an identical commit. It was stated further that Git Cherry can be used to solve this.

Git cherry SEEMS to be geared for the reverse; finding commits NOT pushed upstream. This is useless if I don't know which branch created it and what is upstream of what. So I don't see how it's going to help solve this problem.

Can someone explain / provide an example of how to use git cherry to find all branches that contain the 'equivalent' of a specific commit?

Community
UpAndAdam
  • 2
    I would recommend writing a script that uses `git rev-list` and `git patch-id` to determine that. You might additionally also want to parse the annotations that `git cherry-pick` leaves in commit messages, as the patch-id (also the basis of `git cherry`) is not perfect and will break if you resolved any conflicts. – Chronial Apr 30 '13 at 17:11
  • Didn't mean to +1 that comment: I have no idea what you said or how it helps me. What do I use them to generate a list of? Let's assume someone has resolved conflicts and that my teammates aren't smart enough to use cherry-pick -x, since I had to point it out to them. – UpAndAdam May 01 '13 at 00:30
  • 1
    Well, then you’re screwed. If you want to know exactly which branches contain which commits, only use merges and never cherry-pick. You could assume that commits with the same commit message are probably the same and do some funky heuristics on their diff to validate that assumption. But that process will be error-prone and you would have to code that yourself. – Chronial May 01 '13 at 00:35
  • Sounds like the poster from the other page is mistaken then. This was mostly out of curiosity and to validate my decision to force teammates to rebase before requesting a merge so that all of our merges are FF and thus no cherry picks and no conflicts etc etc. I appreciate your post! Also now I mean both +1's :-) – UpAndAdam May 01 '13 at 00:40
  • Thanks, but why would they need to rebase before merging? Just merging will do just fine and requires a lot less conflict resolution. Also note that if you rebase published branches, you will of course end up with a similar problem to the one you have with cherry-picking. – Chronial May 01 '13 at 01:42
  • I don't rebase published branches; I have them rebased before a merge pull request. That puts responsibility for ensuring the merge is clean on the person contributing code, not on the gatekeeper. That way I can focus on reviewing the actual changes they are making to our code. Secondly, it results in a simple linear tree and history where commit ids are consistent everywhere. Branches are cheap; new ones can be created at will. – UpAndAdam May 01 '13 at 03:02
  • 3
    This is OT, but I think still important: This seems to be a common misconception about rebase. If you want the branch to be easy to merge, just have the owner merge master into it. When you rebase, you don’t just end up with a linear history, you also end up with a lie. The in-between commits might not even work anymore, because nobody tests during rebase. That might make your history quite useless. A real merge on the other hand tells you which parts were developed independently, which is also useful to know. Git is not svn, git does not need linear at all. – Chronial May 01 '13 at 12:08
  • See also here: http://geekblog.oneandoneis2.org/index.php/2013/04/30/please-stay-away-from-rebase – Chronial May 01 '13 at 12:39
  • that's actually an extremely important point – UpAndAdam May 01 '13 at 14:43
  • @Chronial fwiw I now agree with you completely; the rebase model becomes an utter nightmare for everyone and just gets in the way. Requiring people to regularly force push is a recipe for disaster. I only really like rebase for when you have to 'split' a branch apart, or if you want to condense and organize commits before review to make them more digestible. The real problem was poor decisions about branch flow combined with poor user skills of others, which led to repeated mistakes that rebase avoided. But it doesn't fix the problem, and it causes more problems than it solves. – UpAndAdam Sep 23 '16 at 17:57
  • One possibility is to compare by only the subject of commits, so that "nearly-equivalent" commits show up (that wouldn't even with `git cherry`). `git log` with a diff viewer is an option in that case: https://stackoverflow.com/a/46127413/1959808 – Ioannis Filippidis Sep 09 '17 at 05:34
  • @IoannisFilippidis That is not an option I would suggest to anyone. Relying solely upon the commit message and ignoring actual content is 100% wrong here. The whole point here is to find identical patches regardless of what the message is. People will cherry pick, write an entirely different commit message, or unknowingly write the same patch, or use the same commit message over and over. Content matters; commit messages can't be trusted for diffing content. So no, git log will not suffice at all. And since the MANY branches could vary wildly, there is nothing to look for in the diff. – UpAndAdam Sep 12 '17 at 01:59
  • @UpAndAdam I agree, thanks for pointing this out. I should restrict my suggestion to only two branches that the user knows differ only by rebase, and one of them has improved commits over the other. My use case was that I had rebased a branch onto `master`, and changed commits "won" over the old branch. What I was looking for were omitted commits (experiments) to salvage by cherry-picking, before deleting the old branch. So commit messages sufficed in my case. In fact, I *wanted* to use the commit message as defining an equivalence relation, ignoring minor edits inside the commit. – Ioannis Filippidis Sep 12 '17 at 02:17
  • @IoannisFilippidis I have no idea what you mean by "only differ by rebase" to be honest, but it's very context sensitive because it requires knowing where you branched from etc etc. I see remotely what you are getting at, but to me it seems much simpler to just do the rebase, and the minor edits / duplicate commits will get discarded or yield a conflict where you can skip them or take them... good luck with whatever you are trying to do :-) – UpAndAdam Sep 13 '17 at 21:04

4 Answers

20

Before you can answer the question of which branches contain an equivalent commit you have to determine "which commits are equivalent". Once you have that, you simply use git branch --contains on each of the commits.

Unfortunately, there is no 100% reliable way to determine equivalent commits.

The most reliable method is to check the patch id of the changeset introduced by the commit. This is what git cherry, git log --cherry, and git log --cherry-mark rely on: they all use the same patch id that git patch-id computes. A patch id is just the SHA1 of the normalized diff of changes.

Any commit that introduces identical changes will have the same patch id. Additionally, any commit that introduces mostly identical changes that differ only in whitespace or the line number where they apply in the file will have the same patch id. If two commits have the same patch id, it is almost guaranteed that they are equivalent - you will virtually never get a false positive via the patch id.

False negatives occur frequently though. Any time you do git cherry-pick and have to manually resolve merge conflicts you probably introduced differences in the changeset. Even a one-character (non-whitespace) change will cause a different patch id to be generated.

Checking patch ID requires scripting as Chronial advises. First calculate the patch id of the Original Commit with something like

(note - scripts not tested, should be reasonably close to working though)

origCommitPatchId=$(git diff ORIG_COMMIT^! | git patch-id | awk '{print $1}')

Now you are going to have to search through all the other commits in your history and calculate the Patch IDs for them, and see if any of them are the same.

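# Compare the patch id of every commit in the repository with the original's.
for rev in $(git rev-list --all)
do
   # Root commits have no parent, so the diff fails; hide the error and
   # let the patch id come out empty.
   testPatchId=$(git diff "${rev}^1..${rev}" 2>/dev/null | git patch-id | awk '{print $1}')
   if [ -n "${testPatchId}" ] && [ "${origCommitPatchId}" = "${testPatchId}" ]; then
      echo "${rev}"
   fi
done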

Now you have the list of SHAs, and you can pass each of them to git branch -a --contains.
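Putting the steps together, a minimal sketch might look like the following. It has only been tried on a toy repository. The function name branches_with_equivalent is made up for this example, and it uses git diff-tree -p instead of git diff so that root and merge commits simply yield no patch id. The --format option of git branch needs a reasonably recent Git.

```shell
# Sketch: print every branch that contains a commit with the same patch id
# as the commit passed in $1. (Hypothetical helper name; requires bash.)
branches_with_equivalent() {
    local orig=$1 origId rev testId
    origId=$(git diff-tree -p "$orig" | git patch-id | awk '{print $1}')
    # No patch id means no diff (root, merge, or empty commit): give up.
    [ -n "$origId" ] || return 1
    for rev in $(git rev-list --all); do
        testId=$(git diff-tree -p "$rev" | git patch-id | awk '{print $1}')
        if [ "$origId" = "$testId" ]; then
            # Plain branch names, without the "* " marker of the default output.
            git branch -a --contains "$rev" --format='%(refname:short)'
        fi
    done | sort -u
}
```

Called as branches_with_equivalent ORIG_COMMIT, it also lists the branches that contain ORIG_COMMIT itself, since that commit trivially matches its own patch id.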

What if the above doesn't work for you though, because of merge conflicts?

Well, there are a few other things you can try. Typically when you cherry-pick a commit the original author name, email, and date fields in the commit are preserved. So you will get a new commit, but the authorship information will be identical.

So you could get this info from your original commit with

git log -1 --pretty="%an %ae %ad" ORIG_COMMIT

Then as before you would have to go through every commit in your history, print that same information out and compare. That might give you some additional matches.
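A rough sketch of that comparison is below. The function name same_author_commits is invented for this example, and the match is only a heuristic: one author can easily create unrelated commits with the same timestamp (for example, by splitting a commit during a rebase), so treat the results as candidates to inspect rather than as proof.

```shell
# Sketch: print the SHAs of all commits whose author name, email and author
# date are identical to those of the commit passed in $1. (Requires bash.)
same_author_commits() {
    local orig=$1 want sha info
    want=$(git log -1 --pretty='%an %ae %ad' "$orig")
    git log --all --pretty='%H %an %ae %ad' |
    while read -r sha info; do
        # $info holds everything after the SHA on the line.
        if [ "$info" = "$want" ]; then
            echo "$sha"
        fi
    done
}
```

The output includes ORIG_COMMIT itself; every other SHA is a candidate that can be passed to git branch -a --contains.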

You could also use git log --all --grep=ORIG_COMMIT, which would find any commits that reference ORIG_COMMIT in the commit message, such as the "(cherry picked from commit ...)" line that git cherry-pick -x appends.

If none of those work, you could look for a particular line that the commit introduced using the pickaxe (git log -S), or use git log --grep to search for something else that might have been unique in the commit message.
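As an illustration of the pickaxe route, here is a small sketch. The helper name branches_touching_string is hypothetical, and it assumes you can pick a string that the commit introduced and that is rare elsewhere in the history. Note that -S matches commits that change the number of occurrences of the string, so it reports removals as well as additions.

```shell
# Sketch: print every branch containing a commit that added or removed
# the string passed in $1. (Hypothetical helper name; requires bash.)
branches_touching_string() {
    local needle=$1 rev
    for rev in $(git log --all --format=%H -S"$needle"); do
        git branch -a --contains "$rev" --format='%(refname:short)'
    done | sort -u
}
```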

If this all sounds complicated, well, it is. That's why I tell people to avoid using cherry-pick whenever possible. git branch --contains is incredibly valuable and easy to use and 100% reliable. None of the other solutions even come close.

fcrick
Andrew C
8

The following seems to work (but hasn't been tested much). It runs git cherry for each local git branch, and prints the branch name if git cherry doesn't list the commit as missing from the branch.

# USAGE: git-cherry-contains <commit> [refs]
# Prints each local branch containing an equivalent commit.
git-cherry-contains() {
    local sha; sha=$(git rev-parse --verify "$1") || return 1
    local refs; refs=${2:-refs/heads/}
    local branch
    while IFS= read -r branch; do
        if ! git cherry "$branch" "$sha" "$sha^" | grep -qE "^\+ $sha"; then
            echo "$branch"
        fi
    done < <(git for-each-ref --format='%(refname:short)' $refs)
}

See Andrew C's post for a great explanation of how git cherry actually works (using git patch-id).

odinho - Velmont
John Mellor
  • 1
    Thanks! The script works well, updated it to also allow specifying the ref folder to search (so we can search through upstream refs): `git-cherry-contains 0adbcd refs/remotes/origin/`. – odinho - Velmont Dec 19 '17 at 13:07
  • Just what I was looking for. Here's an alias equivalent for your .gitconfig file: `cherry-contains = "!f(){ local sha=$(git rev-parse --verify \"$1\") || return 1; local refs=${2:-refs/heads/}; local branch; git for-each-ref --format='%(refname:short)' $refs | while IFS= read -r branch; do if ! git cherry \"$branch\" \"$sha\" \"$sha^\" | grep -qE \"^\\+ $sha\"; then echo \"$branch\"; fi; done; };f"` – Andy Sep 02 '20 at 18:53
1

Command

Use the following Bash command (replace <COMMIT HASH> with the commit hash you are searching for):

PATCH_ID=$(git show <COMMIT HASH> | git patch-id | cut -d' ' -f1) \
&& ALL_MATCHING_COMMIT_HASHES=$(git log --all -p | git patch-id | grep $PATCH_ID | cut -d' ' -f2) \
&& for HASH in $ALL_MATCHING_COMMIT_HASHES; do echo "$(git branch -a --contains $HASH) (commit $HASH)"; done 

Example output

user@host test_cherry_picking $ PATCH_ID=$(git show 59faabb91cfc8e449737f93be8c7df3825491674 | git patch-id | cut -d' ' -f1) \
&& ALL_MATCHING_COMMIT_HASHES=$(git log --all -p | git patch-id | grep $PATCH_ID | cut -d' ' -f2) \
&& for HASH in $ALL_MATCHING_COMMIT_HASHES; do echo "$(git branch -a --contains $HASH) (commit $HASH)"; done

* hotfix (commit 59faabb91cfc8e449737f93be8c7df3825491674)
master (commit bb5fa0d16931fa1d5fa9f5e9ee5c27634fad7da8)

user@host test_cherry_picking $

Description

Calculates the PATCH ID for a given GIT REVISION PARAMETER (e.g. the hash of a commit). Then finds all commits with the calculated PATCH ID. Finally all branch names which contain these commits are printed to the console.

This of course only works as long as the PATCH ID is the same for all (cherry-picked) commits. Any time you cherry-pick and have to manually resolve merge-conflicts you probably introduce differences in the changeset. This will lead to different PATCH IDs.

  • I have found that `git show ` does NOT yield identical patch-id output for the same cherry-picked change. Using advice from https://stackoverflow.com/a/45373500/150447 I tried `git diff-tree -p` as a substitute, and I get deterministic identical results. – Chris Cleeland Jan 23 '19 at 16:00
0
$ for i in $(git rev-list --all --grep="something unique in the commit message"); do git branch --all --contains "$i"; done | sort -u
solstice333