Hat tip to Randy Fay's "Gory Story" about git, which made me remember the case below (which is based on a true story) and the unresolved question it leads to.

The question is: Is there any git command (or shell script) to find merge commits which discard all changes to a file from one of the branches and include only the changes from the other?

Note: "The other" branch might or might not change the file in question. If there is no solution for the general case where both branches change the file, I'd also be happy with a solution which works if "the other" branch didn't change the file content (i.e. the file content is taken from the common ancestor of both branches), as shown in the example below.

For clarification, please read the following scenario.

What happened so far...

Here we have a simple git history. You can clone it from GitHub.

       e---f
      /     \
-a---b---c---d--...-x-y-z    <-master <-bob
          \
           g---h  <-alice

Both Alice* and Bob* are working on the project. They are not git professionals but had a simple git introduction by their company.

This is how the history came to pass:

  • Alice pushes c which changes main.c, feature.c and feature_test.c. Then she continues working on feature.c in g and h (locally).
  • Bob commits e and f with conflicting changes in main.c.
  • Bob does git pull and gets a merge conflict, so he talks to Alice.
  • They agree to go with Bob's change and discard Alice's.
  • Bob has heard somewhere that ours does this, so he enters git merge ours into his favorite search engine and finds an SO page about git merge -s ours.
  • Bob executes git merge -s ours master. The merge applies cleanly and main.c looks right. He didn't touch feature.c and doesn't know it, so he leaves it alone. So he's happy and pushes the merge to upstream/master.
    • Note, Bob has just undone Alice's work on feature.c. He has also deleted the test cases, which Alice added in feature_test.c, so he won't even get a failing test to alert him to the problem.
  • Development continues from there on, people commit, push and merge happily, until...
  • Alice discovers the problem when she tries to merge commits g and h into master, because then her new changes to feature.c will not apply cleanly onto the old feature.c from before commit c.

Diagnosis

I can tell you, Alice won't have an easy time finding out where her merge conflicts came from. And when she finds it, she's not going to be pleased about Bob.

It's not easy to diagnose where her merge conflict came from, because as far as Alice can see in the history, nobody but her ever touched these files. For example, git log -- feature.c will only show commit b by Alice (which created the file), but neither c nor d. In this example, Alice might notice the absence of c in that list, but in a real project, spotting a single missing commit among dozens is not that easy.
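
The reason c drops out of that list is presumably git's history simplification: by default, git log -- <path> follows only the parent that a merge is identical to for that path. The --full-history option disables this pruning. The following is only a minimal sketch, assuming bash (for process substitution); hidden_commits is a hypothetical helper name, not a git command.

```shell
#!/bin/bash
# Hypothetical helper: print the commits that `git log --full-history`
# reports for a path but that the default (simplified) `git log` omits.
hidden_commits() {
    local file="$1"
    # comm -13 keeps only the lines unique to the second (full-history) list.
    comm -13 \
        <(git log --format=%H -- "$file" | sort) \
        <(git log --format=%H --full-history -- "$file" | sort)
}
```

Running hidden_commits feature.c in the example repository should list c among the commits that the plain git log hides.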

Her next step might be git bisect which will tell her that commit d introduced the problem. Thus, she inspects the commit:

$ git show --patch d
commit a69d49a390b9e92a0fcd60f0396d08a4b839a8c1
Merge: fb1a69b 6ad85a1
Author: Bob <bob@x.com>
Date:   Tue May 17 15:20:37 2016 +0200

    Merge branch 'master' into bob

$

What? How does commit d break feature.c if it doesn't even contain any changes and much less touches the file feature.c? However, Alice remembers that she can show the difference between two commits, so she decides to look at the changes that d introduced relative to its two parents d^1 (aka f) and d^2 (aka c).

$ git diff d^1 d
$ git diff d^2 d
diff --git a/feature.c b/feature.c
index 2a8efa8..621e63a 100644
--- a/feature.c
+++ b/feature.c
@@ -1,4 +1,2 @@
 feature 1
-modified feature 2
-feature 3
-feature 4
+feature 2
diff --git a/feature_test.c b/feature_test.c
index fbc719a..3d14bd1 100644
--- a/feature_test.c
+++ b/feature_test.c
@@ -1,4 +1,2 @@
 test feature 1
-modified test feature 2
-test feature 3
-test feature 4
+test feature 2
diff --git a/main.c b/main.c
index e2f8683..2cd1198 100644
--- a/main.c
+++ b/main.c
@@ -1,6 +1,8 @@
 line 1
 line 2
-alice 3
-alice 4
+bob 3
+bob 4
 line 5
 line 6
+bob 7
+non 8
$ 

Aha! So this way you can see what a merge commit actually did. It looks like d is actually exactly the same as f, and all changes in all files from c are undone.
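
The two diffs above can also be collected in one step. This is only a sketch: merge_parent_stats is a hypothetical helper name, and it simply runs git diff --stat against every parent reported by git log --pretty=%P.

```shell
#!/bin/bash
# Hypothetical helper: print one diffstat per parent of a merge commit,
# showing what the merge result changed relative to that parent.
merge_parent_stats() {
    local merge="$1" parent i=0
    for parent in $(git log --pretty=%P -n 1 "$merge"); do
        i=$((i + 1))
        echo "== parent $i ($(git rev-parse --short "$parent")) =="
        # An empty section means the merge kept this parent's tree unchanged.
        git diff --stat "$parent" "$merge"
    done
}
```

Calling it with the hash of d would print an empty section for the parent whose content the merge kept as-is, and the dropped files under the other parent.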

Now, one might argue that this is all Bob's fault for using git commands without completely understanding them. However, neither is he the first to do that, nor does blaming him really help anyone.

What matters is: Are there other such cases in the project's history where an inconspicuous merge commit silently undoes the work from a whole branch? This would be especially "great" for a branch which fixed a subtle bug that occurs only once every few months and took ages to find and fix.

In the case above, the file feature.c was changed in only one of the branches. So after the merge, feature.c is reset to the version from the common ancestor (git merge-base f c). The above might be even harder to detect if Bob had also done a non-conflicting change to feature.c, so that it is different from the common ancestor. This is what I referred to as the "general case" in the introduction.

* Names changed.

Fritz
  • Duplicate of http://stackoverflow.com/questions/27683077/how-do-you-detect-an-evil-merge-in-git/27804868 – jthill May 19 '16 at 20:20

1 Answer

First, let's assume for now that the "bad" merges we are looking for have exactly two parents (octopus merges can have more than two).

If a merge commit clobbers the changes of one of the branches, then we know that the head of the branch which has kept its changes will be in exactly the same state as the merge commit, since no other changes were added in the merge commit. This means the diff between the merge commit and the "successful" parent of the merge commit will be empty.

However this can also be the case when you merge branch A into branch B, where branch B contains no extra commits not contained in branch A (which means HEAD^1 == merge-base). We can check for this condition and ignore such merges.

Here is a script which will print out a list of commit refs of merge commits which clobber either side of differing branches, out of all commits in the current checked-out branch:

#!/bin/bash

# Loop through each commit
for ref in $(git rev-list HEAD); do
    # Get a list of the commit's parents
    parents="$(git log --pretty=%P -n 1 "$ref")"

    # Merge commits have 2 or more parents.  We'll ignore merge commits with
    # more than 2 parents for now.
    if [ $(echo "$parents" | wc -w) -eq 2 ]; then
        merge_base="$(git merge-base $ref^{1..2})"
        merge_diverging=1
        merge_unchanging=0

        # Loop through ref^1 and ref^2 (the two parents)
        for i in 1 2; do
            parent_ref="$(git rev-parse "$ref^$i")"
            diff_lines="$(git diff "$parent_ref" "$ref" | wc -l)"

            # Check for merge commits where one parent is the base of the
            # merge so we can ignore them, since a merge which doesn't diverge
            # (i.e. made with `git merge --no-ff`) will have an empty diff on
            # one side
            if [ "$merge_base" == "$parent_ref" ]; then
                merge_diverging=0
            fi

            # Find if either side of the merge has an empty diff
            if [ $diff_lines -eq 0 ]; then
                merge_unchanging=1
            fi
        done

        if [ $merge_diverging -eq 1 ] && [ $merge_unchanging -eq 1 ]; then
            echo $ref
        fi
    fi
done
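
The same idea can be narrowed to individual files, for the case where a merge resets a file to its merge-base version although one parent had changed it. The following is only a rough sketch under several assumptions: dropped_files is a hypothetical helper name, only two-parent merges with a single merge base are considered, and the result is a list of candidates which still need a manual look (a deliberate revert in the merge would be reported too).

```shell
#!/bin/bash
# Hypothetical helper (not a git command): for a two-parent merge, print the
# files which a parent changed relative to the merge base, but whose content
# in the merge commit is identical to the merge base again.
dropped_files() {
    local merge="$1" base i
    # Assumption: a single merge base; criss-cross merges are not handled.
    base="$(git merge-base "$merge^1" "$merge^2")" || return 1
    for i in 1 2; do
        # Files changed by parent i, minus the files that differ between
        # the merge base and the merge commit.
        comm -23 \
            <(git diff --no-renames --name-only "$base" "$merge^$i" | sort) \
            <(git diff --no-renames --name-only "$base" "$merge" | sort)
    done | sort -u
}

# Usage sketch:
#   for m in $(git rev-list --merges --max-parents=2 HEAD); do
#       files="$(dropped_files "$m")"
#       [ -n "$files" ] && printf '%s\n%s\n' "$m" "$files"
#   done
```

For the merge d from the question, this should report feature.c and feature_test.c, since main.c differs from the merge base in the merge result.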
Candy Gumdrop
  • Instead of taking diff you could compare tree hashes (take them as `git rev-parse "$ref^{tree}"`). Should be faster. – max630 May 19 '16 at 19:00
  • @max630 That wouldn't work, since the hashes are hashes of more than just the current state of the tree, otherwise the merge commit and its identical parent would be indistinguishable. When you do a `--no-ff` merge to bring a branch which doesn't diverge, you still have a merge commit with a different hash. – Candy Gumdrop May 19 '16 at 19:17
  • Great answer! Do you also have an idea how to find commits which drop only specific files from one branch, instead of the whole branch? Or, to go even further, which drop only some diff hunks from one branch (as explained in the question which is linked as a possible duplicate above)? – Fritz May 25 '16 at 20:25