
On the master branch I have a file (let's call it file.txt) which uses the fields of a structure with the old names (let's call them old_field1, old_field2 and old_field3). On my development branch I have another version of the same file, which uses the same fields but with different names (let's call them new_field1, new_field2 and new_field3).

I merged the master branch into my development branch and I had a git conflict due to the different names of those fields. I fixed it by keeping the new names. Later on I merged the development branch into the master branch and I had the same git conflict.

My questions are :

  1. Why did I have the same conflict if I fixed it? How should I avoid it in the future?
  2. What is a git conflict?
  3. How does the 'merge' process work ? (e.g. : it applies patches in a certain order etc)
vishnu narayanan
  • Could you share more details about your branches ? Was the dev branch checked out from master? Was other branches merged to master before you tried to merge master onto your dev ? – vishnu narayanan Aug 04 '19 at 12:31
  • Yes, my development branch was checked out from master and for sure other branches were merged to master. – Henrik Joe Aug 04 '19 at 14:16
  • Here is a useful tutorial on git merge conflicts: https://www.atlassian.com/git/tutorials/using-branches/merge-conflicts – joanis Aug 04 '19 at 17:28
  • And here is a question on this site on the topic: https://stackoverflow.com/questions/4920885/what-constitutes-a-merge-conflict-in-git – joanis Aug 04 '19 at 17:29
  • Possible duplicate of [What constitutes a merge conflict in Git?](https://stackoverflow.com/questions/4920885/what-constitutes-a-merge-conflict-in-git) – joanis Aug 04 '19 at 17:31
  • Finally, I wrote up an answer on explaining unexpected merge results in the past: https://stackoverflow.com/a/54481842/3216427 – joanis Aug 04 '19 at 17:37

1 Answer


You're starting from a fairly bad place in your question, or maybe chose a particularly unlucky example, as shown by #3 here:

How does the 'merge' process work ? (e.g. : it applies patches in a certain order etc)

because git merge does not (quite) apply patches. But you want to know what makes a conflict occur, so you need to understand a lot of things about Git.

Commits are snapshots, plus a bit more

First, let's look as briefly as possible at commits. You already know that to make new commits, you git checkout a branch, do some work on it, git add, and git commit. You may or may not know, though, that each commit holds a full and complete snapshot of all of your files.

This might seem odd, given that git show shows a commit as a patch or change-set, and git log -p shows a patch for each commit.1 But that's because a commit doesn't just store a snapshot. If you run git log, with or without -p, you get more information for each commit. For instance:

$ git log | head -7 | sed 's/@/ /'
commit 7c20df84bd21ec0215358381844274fa10515017
Author: Junio C Hamano <gitster pobox.com>
Date:   Fri Aug 2 13:12:24 2019 -0700

    Git 2.23-rc1

    Signed-off-by: Junio C Hamano <gitster pobox.com>

So a commit stores not only the snapshot, but also some name and email address stuff and so on.

Note the big ugly hash ID, 7c20df84bd21ec0215358381844274fa10515017. This is, in effect, the true name of the commit. This one particular commit will always be 7c20df84bd21ec0215358381844274fa10515017. It will be 7c20df84bd21ec0215358381844274fa10515017 in my clone of this Git repo, and in your clone of this same repo (https://github.com/git/git), and in the GitHub clone, and in the Git peoples' clones, and so on. We can also look directly at the raw contents of this commit:

$ git cat-file -p 7c20df84bd21ec0215358381844274fa10515017 | sed 's/@/ /'
tree 8858576e734aa4f1cd9b45e207e7ee2937488d13
parent 14fe4af084071803ab4f16e6841ff64ba7351071
author Junio C Hamano <gitster pobox.com> 1564776744 -0700
committer Junio C Hamano <gitster pobox.com> 1564776744 -0700

Git 2.23-rc1

Signed-off-by: Junio C Hamano <gitster pobox.com>

That's actually the entire internal Git commit object, right there: the snapshot is held indirectly, through the first line that says tree and has another big ugly hash ID. The rest of the lines (parent, author, committer, and the log message that Junio Hamano typed in when he made this commit) make up the rest of this commit.

Note closely the parent line. This has another big ugly hash ID. You can look at that commit directly if you like: clone the repository and git cat-file -p that hash ID. You'll see that this one has another tree line—that's this other commit's snapshot—and more parent and author and so on lines. This next commit—which is really the previous commit—actually has two parent lines, because commit 14fe4af084071803ab4f16e6841ff64ba7351071 is a merge commit.

These various parent lines string commits together, by their hash IDs, in backwards order. Each commit has a true-name hash ID, and each commit has some number of parent lines.2 Most commits have exactly one of these lines. That one line gives the hash ID (the true name) of the commit's parent commit.

Once a commit is made, it is frozen forever. So it's impossible to reach back up into the parent and add the hash ID of the child, when you later make a new child commit. That's why each commit only knows its parents: they exist at the time the commit is born, and as soon as the commit is born, it's frozen for all time. The commit gets its hash ID by being born, and the uniqueness of the hash ID is determined in part by the time, down to the second, that you make the commit (encoded into those author and committer lines—the date and time stamp on the one shown above is 1564776744 -0700, for both author and committer).
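To see why a commit cannot change after it is made, it may help to compute a hash ID by hand. The sketch below assumes a SHA-1 repository, where an object's ID is the SHA-1 of a small header plus the object's contents. The helper names git_object_id and commit_body, and the author identity, are made up for illustration and are not part of Git.

```python
import hashlib


def git_object_id(kind, body):
    """Hash an object the way SHA-1 Git repositories do: header, NUL byte, body."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, who, when, message):
    """Build commit text in the same layout that `git cat-file -p` prints."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines += ["author %s %s" % (who, when), "committer %s %s" % (who, when)]
    lines += ["", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


EMPTY_TREE = git_object_id(b"tree", b"")      # a tree with no files in it
who = "A U Thor <author@example.com>"         # hypothetical identity
first = git_object_id(
    b"commit", commit_body(EMPTY_TREE, [], who, "1564776744 -0700", "initial"))
later = git_object_id(
    b"commit", commit_body(EMPTY_TREE, [], who, "1564776745 -0700", "initial"))
print(first)
print(later)   # same content committed one second later: a different hash ID
```

Since the parent hash IDs are part of the hashed text, changing anything in a parent would change its ID, and therefore the ID of every commit that comes after it.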

Note that commits, and their snapshots, are frozen forever. We can't get any work done with frozen stuff! So Git gives us a work area—a place that Git calls a work-tree or working tree or anything along these lines—where it expands out the frozen (and compressed) files from a commit. There's also a very important thing called the index or staging area (two names for the same thing) that I won't cover here, which sits "between" the commit you checked out, and the work-tree.


1Note that git log -p doesn't show a patch for merge commits, but git show does. There's a lot more to know here but for brevity we'll skip all of this.

2At least one commit has no parent, as we'll see in a moment. Most have one. Some—merge commits—have two or more; any commit with at least two parents is by definition a merge commit. More-than-two is rare and never, in a sense, necessary, though if you look through the Git repository for Git, you will find some. For instance, 89e4fcb0dd01b42e82b8f27f9a575111a26844df is one such.


Commits therefore form a particular kind of graph

Mathematically, a graph is defined by a pair of sets: G = (V, E) (see Wikipedia Article). In this case V—the set of vertices in the graph—is all of your commits, as found by their hash IDs, and E—the set of edges—comes from the parent lines. For the very simplest cases, though, we can just draw the graph, which I think is a lot more comprehensible. Let's use one-letter names for commits, to stand in for the big ugly hash IDs, and imagine we have a tiny repository with just three commits, all in a row:

A <-B <-C

Commit C is the last one we made. It remembers the hash ID for commit B, so B is C's parent. Meanwhile, B remembers the hash ID for commit A: A is B's parent. But commit A is the very first one we ever made, so it has no parent. In Git terms, it is a root commit. It has no parent because it can't have any parents: there were no commits before A existed.

Let's make a new commit now. It will get some random-looking hash ID, but we'll just call it D. The parent of D needs to be the commit that comes before D. But that's just commit C, of course. So D's parent will be C. The snapshot will be whatever we want it to be. The author and committer will be us, with "now" as the time-stamp, and we get to write up a log message. Git takes all of that stuff—the tree, the parent being C, our name and email and the times, and our log message—and writes them out as a new commit, acquiring some hash ID that we'll pretend is just D, and now we have:

A <-B <-C <-D

git show and git log -p use the graph to compare snapshots

Git can compare the snapshot in C to the snapshot in D. If we do that, we'll see what we changed. That's what git log -p and git show do: given some commit, they look at the parent of that commit as well as at that commit. Whatever is different, that's what they show as your patch.

You can also use git diff to compare any two commits. For instance, you can compare the first commit ever, A, to the last one here, D, using git diff hash-of-A hash-of-D. Git extracts both snapshots, compares them, and tells you what's different.
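Here is a rough model of "compare two snapshots", using Python's difflib in place of Git's own diff engine. The snapshots are plain dictionaries from file name to content. The file names and contents are invented for the example, borrowing the field names from the question.

```python
import difflib

# Two toy snapshots: every commit holds all of its files, not just the changes.
snapshot_c = {"file.txt": "use old_field1\nuse old_field2\n"}
snapshot_d = {"file.txt": "use new_field1\nuse old_field2\n",
              "notes.txt": "hello\n"}


def diff_snapshots(old, new):
    """Return unified-diff lines for every file that differs between snapshots."""
    out = []
    for path in sorted(set(old) | set(new)):
        before = old.get(path, "").splitlines(keepends=True)
        after = new.get(path, "").splitlines(keepends=True)
        if before != after:
            out.extend(difflib.unified_diff(before, after,
                                            "a/" + path, "b/" + path))
    return out


print("".join(diff_snapshots(snapshot_c, snapshot_d)))
```

The patch is computed on demand from two complete snapshots. Nothing patch-like is stored in either one.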

Branch names find commits

So far, none of this is hard at all. Each new commit gets some big ugly random-looking hash ID. Each commit points back to its parent. No problem, eh? But wait: How do we remember the actual big ugly hash ID of the last commit? We need a place to stash that hash ID, because in a big repository we won't be able to just glance at every commit and all of their parent lines and so on and figure it out. So what Git does is this: it saves the hash ID of the last commit—C, and then D—in a name. Let's use the name master:

A--B--C--D   <-- master

The name master, in this case, just holds the actual raw hash ID of commit D—the one we just made. From D, Git can use its parent line to find C, and then use C's parent to find B, and so on. The action stops when Git finds A, which has no parent.
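As a minimal sketch of that backwards walk, we can model the repository as two small tables: one mapping each commit to its parents, one mapping each branch name to a single commit. The one-letter names stand in for hash IDs, and the function name history is my own, not a Git command.

```python
# Each commit records only its parents; the letters stand in for hash IDs.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
branches = {"master": "D"}   # a branch name stores one hash ID: the last commit


def history(name):
    """Walk backwards from the commit a branch name holds, following first parents."""
    commit = branches[name]
    seen = []
    while commit is not None:
        seen.append(commit)
        links = parents[commit]
        commit = links[0] if links else None   # a root commit ends the walk
    return seen


print(history("master"))   # ['D', 'C', 'B', 'A']
```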

So a branch name just identifies the last commit in a branch. If we make new commit E, Git updates master by writing E's actual hash ID into the name master:

A--B--C--D--E   <-- master

and now we have five commits on master. We can keep going and eventually we have eight commits, all on master, like this:

...--F--G--H   <-- master

It's still easy, isn't it? Let's make it a little harder. :-) Let's create a new branch name, feature. How exactly do we do that? Well, we ask Git to do it using git branch or git checkout. Now, just like with master, Git has to store some hash ID into this new name. Which hash ID should it use? Git requires the hash ID of some existing commit.

Any of our eight commits, A through H, will do. We can pick one, but if we don't pick a hash ID, Git uses the latest hash ID—H—on our current branch. So now we have this:

...--F--G--H   <-- feature, master

One very interesting thing about this is that all eight commits are on both branches.

Another interesting thing is: suppose we add a new commit now. Let's call it I. Which branch name does Git update?

Your HEAD tells your Git which branch name to update

The answer to the question at the end of the last section is where a lot of this all really starts to come together. Git has a very special name, HEAD, written in all-capital letters like this.3 Normally, Git keeps HEAD attached to one of your branch names:

...--F--G--H   <-- feature (HEAD), master

This indicates that we have branch feature checked out. If we run git status, it will say on branch feature. If we git checkout master, we'll convert this to:

...--F--G--H   <-- feature, master (HEAD)

In both cases, the current commit will be commit H. But the current branch will change. We have two different names for the same commit: feature means commit H and master means commit H.

But now that we're in this slightly odd looking state, let's make a new commit or two. We'll call these I, and then J. Git will:

  • write out the tree-as-snapshot;
  • add the rest of the stuff that goes into a commit: name, email, etc., and of course the all-important parent line as well; and
  • update the current branch name.

So once we have made two new commits, we have:

             I--J   <-- master (HEAD)
            /
...--F--G--H   <-- feature

Now let's git checkout feature and make two more new commits, K and L. The first step, git checkout feature, results in this:

             I--J   <-- master
            /
...--F--G--H   <-- feature (HEAD)

We're back on commit H. Git will have changed the files in our work-tree to match commit H.4 Moreover, HEAD is now attached to feature, not to master. So now let's make commits K and L, which will update the name feature this time:

             I--J   <-- master
            /
...--F--G--H
            \
             K--L   <-- feature (HEAD)

We are now in a state where we can git merge and—the part you care about—get merge conflicts.
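The sequence of commands that got us here can be replayed in a toy model. This is only a sketch: the Repo class and its method names are invented, the index and work-tree are ignored entirely, and the history starts at F for brevity.

```python
class Repo:
    """Toy model: commits know parents, branch names know tips, HEAD knows a name."""

    def __init__(self):
        self.parents = {}        # commit -> list of parent commits
        self.branches = {}       # branch name -> last (tip) commit
        self.head = "master"     # HEAD is attached to a branch name

    def commit(self, new_id):
        tip = self.branches.get(self.head)
        self.parents[new_id] = [tip] if tip else []
        self.branches[self.head] = new_id    # only the current branch name moves

    def branch(self, name):
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        self.head = name


repo = Repo()
for c in "FGH":
    repo.commit(c)
repo.branch("feature")       # feature and master both name H
repo.commit("I")
repo.commit("J")             # HEAD is attached to master, so master moves
repo.checkout("feature")
repo.commit("K")
repo.commit("L")             # now feature moves instead
print(repo.branches)         # {'master': 'J', 'feature': 'L'}
```

Both I and K end up with H as their parent, which is exactly the fork in the drawing above.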


3On Windows and macOS (technically, on case-folding file systems) you can often spell it in lowercase and have it work. However, this starts to break if you start using git worktree add, so it's a bad habit to fall into. If you don't like typing four uppercase letters, consider using the @ synonym for HEAD.

4Again, the index / staging-area is very important too, and there are special corner cases where Git doesn't update (some of) the index and work-tree, but let's ignore all of them for now.


How git merge works

The git merge command seems like magic, but in fact, it's not magic at all. You type in:

git checkout master

which changes your view to this:

             I--J   <-- master (HEAD)
            /
...--F--G--H
            \
             K--L   <-- feature

The current commit is now J, so what you see in your files matches the frozen J. The current branch is now master: HEAD is attached to the name master. Note that commits A through J are all on master.

Now you run:

git merge feature

The name feature identifies commit L, but commits A through H and K and L are all on feature.

Some commits—A through H—are on both branches. One of these commits is the best common / shared commit. Git calls this best-common-commit the merge base. In this case, it's pretty clear which commit is the best one: that's commit H, which comes just before the two branches diverge. We could go further back, but why bother? Obviously, everything in commit H is the same as everything in commit H.

Let's think for a moment about the goal of git merge. The goal is to combine changes. How do we get changes, when all we have is snapshots? But wait—we already know how to do that! We use git diff. We can run git diff on any two snapshots, to compare them and see what changed.

We have three snapshots here: H, J, and L. We'll need to run two git diffs. Let's do that:

  • git diff --find-renames hash-of-H hash-of-J will compare H and J, and tell us what changed on master since the common starting point H.

  • git diff --find-renames hash-of-H hash-of-L will compare H and L, and tell us what changed on feature since the common starting point H.

We do not have to type in, or even find, any of these hashes ourselves. Git does that for us. It knows the hash ID for J because that's our current commit and is in the name master, and it knows the hash ID for L because that's in the name feature. Git finds H on its own, using the commit graph—which is no longer just one simple backwards chain, but still not too complicated. If we want, we can see which merge base commit(s) Git found using:5

git merge-base --all master feature

but we don't have to bother; git merge does all the hard work here.
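For the curious, here is a simple (and not at all optimized) way to find merge bases in the small graph drawn above. It assumes the definition "a common ancestor that is not itself an ancestor of another common ancestor". Git's real implementation is much smarter about how much of the graph it walks, and the function names here are my own.

```python
parents = {
    "F": [], "G": ["F"], "H": ["G"],
    "I": ["H"], "J": ["I"],      # commits only on master
    "K": ["H"], "L": ["K"],      # commits only on feature
}


def ancestors(commit):
    """Every commit reachable from `commit`, including the commit itself."""
    seen, todo = set(), [commit]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen


def merge_bases(a, b):
    """Common ancestors that are not ancestors of some other common ancestor."""
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(c in ancestors(o) for o in common if o != c)}


print(merge_bases("J", "L"))   # {'H'}
```

F and G are shared too, but H is reachable from neither of them, so H is the best one.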

Anyway, having made the two diff listings,6 git merge can now look at them and figure out what to do:

  • If you changed a file since H, and they didn't, use your file.
  • If they changed a file since H, and you didn't, use their file.
  • If neither of you changed the file, use any copy of the file: all three match.
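The three whole-file rules above fit in a few lines. This is a sketch with a made-up function name. It adds one case the list does not mention, where both sides made exactly the same change, and that case is treated as "no conflict" here.

```python
def pick_file(base, ours, theirs):
    """Apply the whole-file rules; return (content, needs_line_by_line_merge)."""
    if ours == theirs:        # nobody changed it, or both made the same change
        return ours, False
    if base == theirs:        # only we changed it since the merge base
        return ours, False
    if base == ours:          # only they changed it since the merge base
        return theirs, False
    return None, True         # both changed it, differently: harder work ahead


print(pick_file("v1", "v2", "v1"))   # ('v2', False)
print(pick_file("v1", "v1", "v3"))   # ('v3', False)
print(pick_file("v1", "v2", "v3"))   # (None, True)
```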

Only if both of you changed some file does git merge have to work hard. Now git merge has to actually combine your two sets of changes. Let's say both of you touched the file README.md:

  • If you touched line 3 and they didn't, Git can use your change here.
  • If they touched line 25 and you didn't, Git can use their change here.
  • But if you both changed line 42, to two different things, Git does not know which change is right. The result is a conflict!

When there are no conflicts, everything is easy for Git: it just combines all the changes into what amounts to (but isn't quite) one big combined patch and applies that to the copy of the file from the merge base. The effect is to keep your changes and at the same time, add their changes. It's all combined and all good. Or so Git thinks, at least: what if your change on line 3 breaks their change on line 25?

If there are conflicts, though, Git leaves you with a bit of a mess. It writes all three input file versions into the index / staging-area (which we aren't going to talk about here) and writes a bunch of conflict markers into the work-tree copy of README.md. Your job becomes: fix up the mess and put the right merge into place. The merge is sort of suspended: Git has recorded that there is a merge, and git status will tell you that you're in the middle of a merge. But the git merge command has exited. You'll start a new command later to really finish the job.
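The line-by-line rules, and the markers Git leaves behind, can be sketched as follows. This is a heavy simplification: it assumes all three versions have the same number of lines, so that lines can be matched by position, whereas Git works on diff hunks and handles insertions and deletions. The field names come from the question, and the labels HEAD and feature on the markers assume we are merging feature into the current branch.

```python
def merge_lines(base, ours, theirs):
    """Three-way merge of equal-length line lists (in-place edits only)."""
    merged, conflicts = [], 0
    for b, o, t in zip(base, ours, theirs):
        if o == t:             # untouched, or changed identically on both sides
            merged.append(o)
        elif b == t:           # only our side changed this line
            merged.append(o)
        elif b == o:           # only their side changed this line
            merged.append(t)
        else:                  # both changed it, to different things
            conflicts += 1
            merged += ["<<<<<<< HEAD", o, "=======", t, ">>>>>>> feature"]
    return merged, conflicts


base = ["int old_field1;", "int old_field2;", "int old_field3;"]
ours = ["int new_field1;", "int old_field2;", "int old_field3;"]
clean = ["int old_field1;", "int old_field2;", "long old_field3;"]
clash = ["int field_one;", "int old_field2;", "int old_field3;"]

print(merge_lines(base, ours, clean))   # both changes kept, no conflicts
print("\n".join(merge_lines(base, ours, clash)[0]))
```

In the second call both sides renamed the first field, each to something different, so the result contains both candidates between markers and it is up to a human to pick.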

You can also get what I call high level conflicts. Note the --find-renames in our sample git diff commands. If you have renamed some files, or added or deleted files, in your changes—the H-vs-J part on master—and they also renamed, added, or deleted files in their changes—the H vs L part on feature—it's possible that these whole-file changes conflicted with each other. In this case, git merge stops with a mess, leaving the files in the index as before, but often with no merge conflict markers in the work-tree copies of the files. Fortunately these high level conflicts are rare, as resolving them can be a lot harder.

Once you fix everything up, your job becomes: run git merge --continue (if your Git isn't too old) or git commit (if it is).7 Git will make a new snapshot as usual, collect a log message as usual, and write out a new commit. This new commit will have two parents.

If all goes well in the merge, Git will make the new commit on its own (collecting a log message as usual): you don't have to run git merge --continue because the merge never stopped. Either way—conflicted or not, resolved by hand or not—this is where the merge finishes, and this is the last bit of magic, because this new merge commit will have two parents:

             I--J
            /    \
...--F--G--H      M   <-- master (HEAD)
            \    /
             K--L   <-- feature

The first parent is all business as usual: M's first parent is J, the commit you were on a moment ago. The second parent is the commit you merged: L, the tip of feature. The fact that this is a merge commit is recorded in the commit graph. Commit M has two parents, J and L. A future git merge of a future feature will find a different merge base.


5The --all is for particularly complicated graphs, which we don't actually have here. This means --all won't do anything in this case, but git merge uses it, just in case we do have a complicated graph. If you get two hash IDs out of git merge-base, the merge process gets more complicated, so we'll just skip that. If you leave out the --all, git merge-base picks one of the however-many merge bases there might be at (apparent) random. But there's almost always just one merge base anyway.

6Internally, git merge doesn't make diff listings. It does run the two diffs, but in a special optimized-for-merge way. In many cases it can skip most of the file-by-file diffs entirely, and when it does need to do the actual comparing, it uses a bunch of internal data structures to find the various changed lines, rather than a textual git diff output. But the effect is the same, it's just more efficient.

7All git merge --continue does is check that there is a finished merge to commit, then run git commit. But this is a bit of a safety check, helping to make sure everything is the way you think it is, so it's a bit better to use git merge --continue even though you could just run git commit.


Merges prepare for future merges

I'm going to repeat this here because it's the source of all of your woes. As we saw above, git merge:

  1. computes a merge base;
  2. runs (in effect) two git diffs;
  3. combines the changes, applying those changes to the merge base commit;
  4. commits the result, if all goes well, or makes you clean up and commit if not.

The merge base commit found in step 1 is based on the commit graph. The commit graph in step 4 is your input to the next merge—the next "step 1".

When you repeatedly merge one branch into another, you get a sort of sewing stitch pattern:

...--o--o--o---M   <-- mainline
      \       /
       o--o--o   <-- topic

becomes:

...--o--o--o---M1----M2--P--M3   <-- mainline
      \       /     /      /
       o--o--T--o--U---o--V--o--W   <-- feature

where each M has two parents, one being a previous mainline commit (maybe even a previous merge) and the other being one of the commits that was, at the time, the tip commit of the feature or topic branch.

Consider what happens now if we git checkout mainline and then git merge feature. The name mainline identifies commit M3, which has parents P and V. The name feature identifies commit W. The merge base here is the best common commit, but which commit is that? Well, let's start at W and work backwards: we get some anonymous commit o, then V, then another anonymous commit, and so on.

If we start at M3 and work backwards, we get two commits: P and V. That's the magic of a merge commit: by having V as its second parent, it automatically includes commit V and all the earlier topic commits as part of the mainline branch. What this means is that commit V is now the merge base and the two git diff commands will:

  • compare V vs M3, to see what we changed, and
  • compare V vs W, to see what they changed.

These are the change-sets that git merge will attempt to combine. Conflicts, if there are any, occur because of overlapping changes in the two change-sets.
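We can check the claim about V with a small graph model. The sketch below encodes the stitch-pattern drawing above and uses a simple, unoptimized merge-base search. The names o1..o3 and f1..f5 are labels I made up for the anonymous o commits, and the function names are not Git's.

```python
def ancestors(parents, commit):
    """Every commit reachable from `commit`, following all parents of merges."""
    seen, todo = set(), [commit]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors that are not ancestors of some other common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c in ancestors(parents, o) for o in common if o != c)}


parents = {
    "o1": [], "o2": ["o1"], "o3": ["o2"],
    "f1": ["o1"], "f2": ["f1"], "T": ["f2"], "f3": ["T"], "U": ["f3"],
    "f4": ["U"], "V": ["f4"], "f5": ["V"], "W": ["f5"],
    "M1": ["o3", "T"], "M2": ["M1", "U"], "P": ["M2"], "M3": ["P", "V"],
}

print(merge_bases(parents, "o3", "T"))   # before the first merge: {'o1'}
print(merge_bases(parents, "M1", "U"))   # before making M2: {'T'}
print(merge_bases(parents, "M3", "W"))   # the next merge: {'V'}
```

Each merge commit moves the merge base forward along the feature branch, which is why a later merge only has to deal with what changed since the previous one.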

The content of a merge commit is up to you. The graph of a merge commit is implied by the graph at the time you ran git merge. One of the key inputs to git merge is the merge base, and Git finds this automatically, using the graph. To view the graph, see Pretty git branch graphs.

Takeaways

  • Commits are snapshots plus metadata.
  • Some of the metadata forms the commit graph. These are the parent links, which all point backwards: Git has to work backwards.
  • Branch names identify one specific commit. This one specific commit is the last commit of / in / on the branch. Git calls this the tip commit.
  • Making a new commit advances the current branch name to point to the new commit. The branch now has a new tip.
  • HEAD tells you which name is the current name, and that name then tells you which commit is the current commit—so HEAD gives Git two different pieces of information at the same time.
  • "The branch" can mean a whole series of commits, found by starting at the last commit given by the branch name and working backwards.
  • A commit is often on many branches at the same time. (See Think Like (a) Git.)
  • Working backwards through a merge commit means following both parents.
  • git diff can compare any given two commits' snapshots.
  • git show compares a commit to its parent; so does git log -p.
  • git merge walks as much of the graph as it needs to, to find the best merge base. It then makes two diffs and combines them, for a true merge.

Not part of the above, but important:

  • Git makes new commits from the files stored in the index, not those in the work-tree.
  • Since merge commits have at least two parents, git show has to do something special here (and it does), but git log -p is lazy and just doesn't bother to do anything to show a patch. Either way, both commands leave a lot out, on purpose: "patches" for a merge are inherently kind of faulty.
  • git merge is sometimes deliberately lazy: if a true merge isn't required, it will do a fast-forward instead. When git merge does a fast-forward, it doesn't make a new commit.
  • In "detached HEAD" mode, HEAD points directly to a commit, rather than being attached to a branch name. Everything else works the same, except that asking Git the question: which branch name is the current branch comes back with error: does not compute.
  • git checkout (or, in Git 2.23 and later, the new git switch) is how you change which branch name HEAD is attached to.
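The fast-forward case in the list above comes down to one graph question: is the current commit already an ancestor of the commit being merged? A rough sketch of that default decision follows. The function names are invented, and options such as --no-ff or --ff-only, which change the behavior, are ignored.

```python
def is_ancestor(parents, old, new):
    """True if `old` can be reached from `new` by following parent links."""
    todo, seen = [new], set()
    while todo:
        c = todo.pop()
        if c == old:
            return True
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return False


def merge_kind(parents, current, other):
    if is_ancestor(parents, other, current):
        return "already up to date"    # nothing new to bring in
    if is_ancestor(parents, current, other):
        return "fast-forward"          # just move the branch name; no new commit
    return "true merge"                # needs a merge base and two diffs


parents = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}
print(merge_kind(parents, "H", "L"))   # fast-forward
print(merge_kind(parents, "J", "L"))   # true merge
print(merge_kind(parents, "L", "H"))   # already up to date
```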
torek