You're starting from a fairly bad place in your question, or maybe chose a particularly unlucky example, as shown by #3 here:
How does the 'merge' process work ? (e.g. : it applies patches in a certain order etc)
because git merge
does not (quite) apply patches. But you want to know what makes a conflict occur, so you need to understand a lot of things about Git.
Commits are snapshots, plus a bit more
First, let's look as briefly as possible at commits. You already know that to make new commits, you git checkout
a branch, do some work on it, git add
, and git commit
. You may or may not know, though, that each commit holds a full and complete snapshot of all of your files.
This might seem odd, given that git show
shows a commit as a patch or change-set, and git log -p
shows a patch for each commit.1 But that's because a commit doesn't just store a snapshot. If you run git log
, with or without -p
, you get more information for each commit. For instance:
$ git log | head -7 | sed 's/@/ /'
commit 7c20df84bd21ec0215358381844274fa10515017
Author: Junio C Hamano <gitster pobox.com>
Date:   Fri Aug 2 13:12:24 2019 -0700

    Git 2.23-rc1

    Signed-off-by: Junio C Hamano <gitster pobox.com>
So a commit stores not only the snapshot, but also some name and email address stuff and so on.
Note the big ugly hash ID, 7c20df84bd21ec0215358381844274fa10515017
. This is, in effect, the true name of the commit. This one particular commit will always be 7c20df84bd21ec0215358381844274fa10515017
. It will be 7c20df84bd21ec0215358381844274fa10515017
in my clone of this Git repo, and in your clone of this same repo (https://github.com/git/git), and in the GitHub clone, and in the Git peoples' clones, and so on. We can also look directly at the raw contents of this commit:
$ git cat-file -p 7c20df84bd21ec0215358381844274fa10515017 | sed 's/@/ /'
tree 8858576e734aa4f1cd9b45e207e7ee2937488d13
parent 14fe4af084071803ab4f16e6841ff64ba7351071
author Junio C Hamano <gitster pobox.com> 1564776744 -0700
committer Junio C Hamano <gitster pobox.com> 1564776744 -0700

Git 2.23-rc1

Signed-off-by: Junio C Hamano <gitster pobox.com>
That's actually the entire internal Git commit object, right there: the snapshot is held indirectly, through the first line that says tree
and has another big ugly hash ID. The remaining lines (parent
, author
, committer
, and the log message that Junio Hamano typed in when he made this commit) make up the rest of this commit.
Note closely the parent
line. This has another big ugly hash ID. You can look at that commit directly if you like: clone the repository and git cat-file -p
that hash ID. You'll see that this one has another tree
line—that's this other commit's snapshot—and more parent
and author
and so on lines. This next commit—which is really the previous commit—actually has two parent
lines, because commit 14fe4af084071803ab4f16e6841ff64ba7351071
is a merge commit.
These various parent
lines string commits together, by their hash IDs, in backwards order. Each commit has a true-name hash ID, and each commit has some number of parent
lines.2 Most commits have exactly one of these lines. That one line gives the hash ID (the true name) of the commit's parent commit.
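If you'd like to see the parent link for yourself without cloning the Git repository, here is a minimal sketch using a throwaway repository. The file name, commit messages, and author identity are all made up for illustration; only the Git commands themselves are real.

```shell
set -e
# Scratch repository in a temporary directory (everything in it is hypothetical).
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first commit"
first=$(git rev-parse HEAD)

echo "two" >> file.txt
git add file.txt
git commit -q -m "second commit"

# The raw commit object: tree, parent, author, committer, then the log message.
git cat-file -p HEAD

# Pull out the tree and parent lines of the second commit.
tree=$(git cat-file -p HEAD | sed -n 's/^tree //p')
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "first commit: $first"
echo "parent line:  $parent"
```

The second commit's parent line should be the hash ID of the first commit, and the first commit (a root commit) should have no parent line at all.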
Once a commit is made, it is frozen forever. So it's impossible to reach back up into the parent and add the hash ID of the child, when you later make a new child commit. That's why each commit only knows its parents: they exist at the time the commit is born, and as soon as the commit is born, it's frozen for all time. The commit gets its hash ID by being born, and the uniqueness of the hash ID is determined in part by the time, down to the second, that you make the commit (encoded into those author
and committer
lines—the date and time stamp on the one shown above is 1564776744 -0700
, for both author and committer).
Note that commits, and their snapshots, are frozen forever. We can't get any work done with frozen stuff! So Git gives us a work area—a place that Git calls a work-tree or working tree or anything along these lines—where it expands out the frozen (and compressed) files from a commit. There's also a very important thing called the index or staging area (two names for the same thing) that I won't cover here, which sits "between" the commit you checked out, and the work-tree.
1Note that git log -p
doesn't show a patch for merge commits, but git show
does. There's a lot more to know here but for brevity we'll skip all of this.
2At least one commit has no parent, as we'll see in a moment. Most have one. Some—merge commits—have two or more; any commit with at least two parents is by definition a merge commit. More-than-two is rare and never, in a sense, necessary, though if you look through the Git repository for Git, you will find some. For instance, 89e4fcb0dd01b42e82b8f27f9a575111a26844df
is one such.
Commits therefore form a particular kind of graph
Mathematically, a graph is defined by a pair of sets: G = (V, E) (see Wikipedia Article). In this case V—the set of vertices in the graph—is all of your commits, as found by their hash IDs, and E—the set of edges—comes from the parent
lines. For the very simplest cases, though, we can just draw the graph, which I think is a lot more comprehensible. Let's use one-letter names for commits, to stand in for the big ugly hash IDs, and imagine we have a tiny repository with just three commits, all in a row:
A <-B <-C
Commit C
is the last one we made. It remembers the hash ID for commit B
, so B
is C
's parent. Meanwhile, B
remembers the hash ID for commit A
: A
is B
's parent. But commit A
is the very first one we ever made, so it has no parent. In Git terms, it is a root commit. It has no parent because it can't have any parents: there were no commits before A
existed.
Let's make a new commit now. It will get some random-looking hash ID, but we'll just call it D
. The parent of D
needs to be the commit that comes before D
. But that's just commit C
, of course. So D
's parent will be C
. The snapshot will be whatever we want it to be. The author and committer will be us, with "now" as the time-stamp, and we get to write up a log message. Git takes all of that stuff—the tree, the parent
being C
, our name and email and the times, and our log message—and writes them out as a new commit, acquiring some hash ID that we'll pretend is just D
, and now we have:
A <-B <-C <-D
git show
and git log -p
use the graph to compare snapshots
Git can compare the snapshot in C
to the snapshot in D
. If we do that, we'll see what we changed. That's what git log -p
and git show
do: given some commit, they look at the parent of that commit as well as at that commit. Whatever is different, that's what they show as your patch.
You can also use git diff
to compare any two commits. For instance, you can compare the first commit ever, A
, to the last one here, D
, using git diff hash-of-A hash-of-D
. Git extracts both snapshots, compares them, and tells you what's different.
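Here is a small sketch of both kinds of comparison, parent-vs-child and first-vs-last, in a throwaway repository. The four commits stand in for A through D; the file names and contents are invented for the example.

```shell
set -e
# Scratch repository; file names and contents are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

echo "hello" > a.txt;        git add a.txt; git commit -q -m "A"
A=$(git rev-parse HEAD)
echo "more" > b.txt;         git add b.txt; git commit -q -m "B"
echo "hello, world" > a.txt; git add a.txt; git commit -q -m "C"
echo "even more" >> b.txt;   git add b.txt; git commit -q -m "D"
D=$(git rev-parse HEAD)

# Parent vs child: only what commit D itself changed.
last_change=$(git diff --name-only "$D^" "$D")

# First commit vs last commit: everything that differs between the two snapshots.
all_changes=$(git diff --name-only "$A" "$D")

echo "changed in D:        $last_change"
echo "changed from A to D: $(echo "$all_changes" | tr '\n' ' ')"
```

Dropping `--name-only` gives the full patch in each case.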
Branch names find commits
So far, none of this is hard at all. Each new commit gets some big ugly random-looking hash ID. Each commit points back to its parent. No problem, eh? But wait: How do we remember the actual big ugly hash ID of the last commit? We need a place to stash that hash ID, because in a big repository we won't be able to just glance at every commit and all of their parent
lines and so on and figure it out. So what Git does is this: it saves the hash ID of the last commit—C
, and then D
—in a name. Let's use the name master
:
A--B--C--D <-- master
The name master
, in this case, just holds the actual raw hash ID of commit D
—the one we just made. From D
, Git can use its parent line to find C
, and then use C
's parent to find B
, and so on. The action stops when Git finds A
, which has no parent.
So a branch name just identifies the last commit in a branch. If we make new commit E
, Git updates master
by writing E
's actual hash ID into the name master
:
A--B--C--D--E <-- master
and now we have five commits on master
. We can keep going and eventually we have eight commits, all on master
, like this:
...--F--G--H <-- master
It's still easy, isn't it? Let's make it a little harder. :-) Let's create a new branch name, feature
. How exactly do we do that? Well, we ask Git to do it using git branch
or git checkout
. Now, just like with master
, Git has to store some hash ID into this new name. Which hash ID should it use? Git requires the hash ID of some existing commit.
Any of our eight commits, A
through H
, will do. We can pick one, but if we don't pick a hash ID, Git uses the latest hash ID—H
—on our current branch. So now we have this:
...--F--G--H <-- feature, master
One very interesting thing about this is that all eight commits are on both branches.
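A quick way to convince yourself of this is the following sketch, which assumes a throwaway repository with just two commits standing in for G and H. The names and contents are made up.

```shell
set -e
# Scratch repository; everything in it is hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # call the first branch "master" whatever init.defaultBranch says
git config user.name "Example Author"
git config user.email "author@example.com"

echo "one" > file.txt;  git add file.txt; git commit -q -m "G"
echo "two" >> file.txt; git add file.txt; git commit -q -m "H"

# No starting commit given, so Git uses the current commit (the tip of master).
git branch feature

master_tip=$(git rev-parse master)
feature_tip=$(git rev-parse feature)
echo "master  -> $master_tip"
echo "feature -> $feature_tip"

# Walking backwards from either name finds exactly the same commits.
git log --oneline master
git log --oneline feature
```

Both names hold the same hash ID, so every commit reachable from one is reachable from the other.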
Another interesting thing is: suppose we add a new commit now. Let's call it I
. Which branch name does Git update?
Your HEAD tells your Git which branch name to update
The answer to the question at the end of the last section is where a lot of this all really starts to come together. Git has a very special name, HEAD
, written in all-capital letters like this.3 Normally, Git keeps HEAD
attached to one of your branch names:
...--F--G--H <-- feature (HEAD), master
This indicates that we have branch feature
checked out. If we run git status
, it will say On branch feature
. If we git checkout master
, we'll convert this to:
...--F--G--H <-- feature, master (HEAD)
In both cases, the current commit will be commit H
. But the current branch will change. We have two different names for the same commit: feature
means commit H
and master
means commit H
.
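You can ask Git directly which branch name HEAD is attached to. The sketch below assumes a throwaway repository with a single commit and two branch names pointing at it; the names and file contents are invented.

```shell
set -e
# Scratch repository; everything in it is hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "content" > file.txt; git add file.txt; git commit -q -m "H"
git branch feature

git checkout -q feature
branch_1=$(git symbolic-ref --short HEAD)   # the name HEAD is attached to
commit_1=$(git rev-parse HEAD)              # the commit that name selects

git checkout -q master
branch_2=$(git symbolic-ref --short HEAD)
commit_2=$(git rev-parse HEAD)

echo "$branch_1 -> $commit_1"
echo "$branch_2 -> $commit_2"
```

The current branch changes, but the current commit does not.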
But now that we're in this slightly odd looking state, let's make a new commit or two. We'll call these I
, and then J
. Git will:
- write out the tree-as-snapshot;
- add the rest of the stuff that goes into a commit: name, email, etc., and of course the all-important
parent
line as well; and
- update the current branch name.
So once we have made two new commits, we have:
             I--J   <-- master (HEAD)
            /
...--F--G--H   <-- feature
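To check that only the branch name HEAD is attached to moves, here is a sketch in a throwaway repository. The commits are labeled H, I, and J to match the drawing; the files are made up.

```shell
set -e
# Scratch repository; everything in it is hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "h" > h.txt; git add h.txt; git commit -q -m "H"
git branch feature                 # HEAD stays attached to master
feature_before=$(git rev-parse feature)

echo "i" > i.txt; git add i.txt; git commit -q -m "I"
echo "j" > j.txt; git add j.txt; git commit -q -m "J"

feature_after=$(git rev-parse feature)
ahead=$(git rev-list --count feature..master)   # commits on master that are not on feature

echo "feature moved? $([ "$feature_before" = "$feature_after" ] && echo no || echo yes)"
echo "master is $ahead commits ahead of feature"
```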
Now let's git checkout feature
and make two more new commits, K
and L
. The first step—git checkout feature
—results in this:
             I--J   <-- master
            /
...--F--G--H   <-- feature (HEAD)
We're back on commit H
. Git will have changed the files in our work-tree to match commit H
.4 Moreover, HEAD
is now attached to feature
, not to master
. So now let's make commits K
and L
, which will update the name feature
this time:
             I--J   <-- master
            /
...--F--G--H
            \
             K--L   <-- feature (HEAD)
We are now in a state where we can git merge
and—the part you care about—get merge conflicts.
3On Windows and macOS (technically, on case-folding file systems) you can often spell it in lowercase and have it work. However, this starts to break if you start using git worktree add
, so it's a bad habit to fall into. If you don't like typing four uppercase letters, consider using the @
synonym for HEAD
.
4Again, the index / staging-area is very important too, and there are special corner cases where Git doesn't update (some of) the index and work-tree, but let's ignore all of them for now.
How git merge
works
The git merge
command seems like magic, but in fact, it's not magic at all. You type in:
git checkout master
which changes your view to this:
             I--J   <-- master (HEAD)
            /
...--F--G--H
            \
             K--L   <-- feature
The current commit is now J
, so what you see in your files matches the frozen J
. The current branch is now master
: HEAD
is attached to the name master
. Note that commits A
through J
are all on master
.
Now you run:
git merge feature
The name feature
identifies commit L
, but commits A
through H
and K
and L
are all on feature
.
Some commits—A
through H
—are on both branches. One of these commits is the best common / shared commit. Git calls this best-common-commit the merge base. In this case, it's pretty clear which commit is the best one: that's commit H
, which comes just before the two branches diverge. We could go further back, but why bother? Obviously, everything in commit H
is the same as everything in commit H
.
Let's think for a moment about the goal of git merge
. The goal is to combine changes. How do we get changes, when all we have is snapshots? But wait—we already know how to do that! We use git diff
. We can run git diff
on any two snapshots, to compare them and see what changed.
We have three snapshots here: H
, J
, and L
. We'll need to run two git diff
s. Let's do that:
git diff --find-renames hash-of-H hash-of-J
will compare H
and J
, and tell us what changed on master
since the common starting point H
.
git diff --find-renames hash-of-H hash-of-L
will compare H
and L
, and tell us what changed on feature
since the common starting point H
.
We do not have to type in, or even find, any of these hashes ourselves. Git does that for us. It knows the hash ID for J
because that's our current commit and is in the name master
, and it knows the hash ID for L
because that's in the name feature
. Git finds H
on its own, using the commit graph—which is no longer just one simple backwards chain, but still not too complicated. If we want, we can see which merge base commit(s) Git found using:5
git merge-base --all master feature
but we don't have to bother; git merge
does all the hard work here.
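If you do want to see the merge base and the two diffs, the following sketch builds a small diverged history in a throwaway repository. One commit per side stands in for I--J and K--L, and all file names are invented.

```shell
set -e
# Scratch repository; everything in it is hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "shared" > shared.txt; git add shared.txt; git commit -q -m "H"
H=$(git rev-parse HEAD)
git branch feature

echo "master work" > master.txt; git add master.txt; git commit -q -m "J"

git checkout -q feature
echo "feature work" > feature.txt; git add feature.txt; git commit -q -m "L"
git checkout -q master

# The best common commit of the two branch tips.
base=$(git merge-base --all master feature)

# What we changed, and what they changed, since that common starting point.
ours=$(git diff --find-renames --name-only "$base" master)
theirs=$(git diff --find-renames --name-only "$base" feature)

echo "merge base: $base"
echo "changed on master:  $ours"
echo "changed on feature: $theirs"
```

Here the two sides touch different files, so a merge would have nothing to fight over.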
Anyway, having made the two diff listings,6 git merge
can now look at them and figure out what to do:
- If you changed a file since
H
, and they didn't, use your file.
- If they changed a file since
H
, and you didn't, use their file.
- If neither of you changed the file, use any copy of the file: all three match.
Only if both of you changed some file, does git merge
have to work hard. Now git merge
has to actually combine your two sets of changes. Let's say both of you touched the file README.md
:
- If you touched line 3 and they didn't, Git can use your change here.
- If they touched line 25 and you didn't, Git can use their change here.
- But if you both changed line 42, to two different things, Git does not know which change is right. The result is a conflict!
When there are no conflicts, everything is easy for Git: it just combines all the changes into what amounts to (but isn't quite) one big combined patch and applies that to the copy of the file from the merge base. The effect is to keep your changes and at the same time, add their changes. It's all combined and all good. Or so Git thinks, at least: what if your change on line 3 breaks their change on line 25?
If there are conflicts, though, Git leaves you with a bit of a mess. It writes all three input file versions into the index / staging-area (which we aren't going to talk about here) and writes a bunch of conflict markers into the work-tree copy of README.md
. Your job becomes: fix up the mess and put the right merge into place. The merge is sort of suspended: Git has recorded that there is a merge, and git status
will tell you that you're in the middle of a merge. But the git merge
command has exited. You'll start a new command later to really finish the job.
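Here is a sketch that provokes a conflict on purpose, in a throwaway repository, so you can look at the mess safely. Both sides change line 2 of a made-up README.md; the contents and the final resolution are invented for the example.

```shell
set -e
# Scratch repository; everything in it is hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'line 1\nline 2\nline 3\n' > README.md
git add README.md; git commit -q -m "H"
git branch feature

printf 'line 1\nline 2 (master)\nline 3\n' > README.md
git add README.md; git commit -q -m "J"

git checkout -q feature
printf 'line 1\nline 2 (feature)\nline 3\n' > README.md
git add README.md; git commit -q -m "L"
git checkout -q master

# The merge stops with a nonzero exit status; capture it instead of aborting the script.
merge_status=0
git merge feature >/dev/null 2>&1 || merge_status=$?

conflicted=$(git diff --name-only --diff-filter=U)    # unmerged paths
stages=$(git ls-files -u | wc -l | tr -d ' ')         # base, ours, theirs copies in the index
echo "exit status: $merge_status, conflicted: $conflicted, index stages: $stages"
cat README.md                                         # work-tree copy, with conflict markers
markers=$(grep -c '^<<<<<<< ' README.md)

# Resolve by hand, stage the result, and finish the suspended merge.
printf 'line 1\nline 2 (both)\nline 3\n' > README.md
git add README.md
git commit -q --no-edit
parents=$(git cat-file -p HEAD | grep -c '^parent ')
```

The final `git commit --no-edit` is used here only because it never opens an editor; it reuses the merge message Git already prepared.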
You can also get what I call high level conflicts. Note the --find-renames
in our sample git diff
commands. If you have renamed some files, or added or deleted files, in your changes—the H
-vs-J
part on master
—and they also renamed, added, or deleted files in their changes—the H
vs L
part on feature
—it's possible that these whole-file changes conflicted with each other. In this case, git merge
stops with a mess, leaving the files in the index as before, but often with no merge conflict markers in the work-tree copies of the files. Fortunately these high level conflicts are rare, as resolving them can be a lot harder.
Once you fix everything up, your job becomes: run git merge --continue
(if your Git isn't too old) or git commit
(if it is).7 Git will make a new snapshot as usual, collect a log message as usual, and write out a new commit. This new commit will have two parents.
If all goes well in the merge, Git will make the new commit on its own (collecting a log message as usual): you don't have to run git merge --continue
because the merge never stopped. Either way—conflicted or not, resolved by hand or not—this is where the merge finishes, and this is the last bit of magic, because this new merge commit will have two parents:
             I--J
            /    \
...--F--G--H      M   <-- master (HEAD)
            \    /
             K--L   <-- feature
The first parent is all business as usual: M
's first parent is J
, the commit you were on a moment ago. The second parent is the commit you merged: L
, the tip of feature
. The fact that this is a merge commit is recorded in the commit graph. Commit M
has two parents, J
and L
. A future git merge
of a future feature
will find a different merge base.
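To inspect the two parents of a merge commit yourself, here is a sketch with a conflict-free merge in a throwaway repository. Commits J and L stand in for the two branch tips, and the file names are made up.

```shell
set -e
# Scratch repository; everything in it is hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "shared" > shared.txt; git add shared.txt; git commit -q -m "H"
git branch feature

echo "master work" > master.txt; git add master.txt; git commit -q -m "J"
J=$(git rev-parse HEAD)

git checkout -q feature
echo "feature work" > feature.txt; git add feature.txt; git commit -q -m "L"
L=$(git rev-parse HEAD)

git checkout -q master
git merge -q --no-edit feature     # no conflicts, so Git makes merge commit M on its own

first_parent=$(git rev-parse 'HEAD^1')    # the commit we were on
second_parent=$(git rev-parse 'HEAD^2')   # the commit we merged
echo "M's first parent:  $first_parent"
echo "M's second parent: $second_parent"
```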
5The --all
is for particularly complicated graphs, which we don't actually have here. This means --all
won't do anything in this case, but git merge
uses it, just in case we do have a complicated graph. If you get two hash IDs out of git merge-base
, the merge process gets more complicated, so we'll just skip that. If you leave out the --all
, git merge-base
picks one of the however-many merge bases there might be at (apparent) random. But there's almost always just one merge base anyway.
6Internally, git merge
doesn't make diff listings. It does run the two diffs, but in a special optimized-for-merge way. In many cases it can skip most of the file-by-file diffs entirely, and when it does need to do the actual comparing, it uses a bunch of internal data structures to find the various changed lines, rather than a textual git diff
output. But the effect is the same, it's just more efficient.
7All git merge --continue
does is check that there is a finished merge to commit, then run git commit
. But this is a bit of a safety check, helping to make sure everything is the way you think it is, so it's a bit better to use git merge --continue
even though you could just run git commit
.
Merges prepare for future merges
I'm going to repeat this here because it's the source of all of your woes. As we saw above, git merge
:
1. computes a merge base;
2. runs (in effect) two
git diff
s;
3. combines the changes, applying those changes to the merge base commit;
4. commits the result, if all goes well, or makes you clean up and commit if not.
The merge base commit found in step 1 is based on the commit graph. The commit graph in step 4 is your input to the next merge—the next "step 1".
When you repeatedly merge one branch into another, you get a sort of sewing stitch pattern:
...--o--o--o---M   <-- mainline
      \       /
       o--o--o   <-- topic
becomes:
...--o--o--o---M1----M2--P--M3   <-- mainline
      \       /     /      /
       o--o--T--o--U---o--V--o--W   <-- feature
where each M
has two parents, one being a previous mainline commit (maybe even a previous merge) and the other being one of the commits that was, at the time, the tip commit of the feature or topic branch.
Consider what happens now if we git checkout mainline
and then git merge feature
. The name mainline
identifies commit M3
, which has parents P
and V
. The name feature
identifies commit W
. The merge base here is the best common commit, but which commit is that? Well, let's start at W
and work backwards: we get some anonymous commit o
, then V
, then another anonymous commit, and so on.
If we start at M3
and work backwards, we get two commits: P
and V
. That's the magic of a merge commit: by having V
as its second parent, it automatically includes commit V
and all the earlier topic
commits as part of the mainline
branch. What this means is that commit V
is now the merge base and the two git diff
commands will:
- compare
V
vs M3
, to see what we changed, and
- compare
V
vs W
, to see what they changed.
These are the change-sets that git merge
will attempt to combine. Conflicts, if there are any, occur because of overlapping changes in the two change-sets.
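The way a merge moves the merge base can be watched in a throwaway repository. This sketch uses a much smaller graph than the drawing above: one merge, with commits labeled P, V, and W to match. The file names are invented.

```shell
set -e
# Scratch repository; everything in it is hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example Author"
git config user.email "author@example.com"

echo "base" > base.txt; git add base.txt; git commit -q -m "o"
fork_point=$(git rev-parse HEAD)
git branch feature

echo "p" > main.txt; git add main.txt; git commit -q -m "P"

git checkout -q feature
echo "v" > feat.txt; git add feat.txt; git commit -q -m "V"
V=$(git rev-parse HEAD)

# Before any merge, the merge base is where the branches forked.
base_before=$(git merge-base mainline feature)

git checkout -q mainline
git merge -q --no-edit feature     # merge commit with parents P and V

git checkout -q feature
echo "w" >> feat.txt; git add feat.txt; git commit -q -m "W"

# After the merge, V is reachable from mainline too, so it becomes the new merge base.
base_after=$(git merge-base mainline feature)
echo "merge base before: $base_before"
echo "merge base after:  $base_after"
```

The next `git merge feature` on mainline would therefore only have to consider what changed since V.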
The content of a merge commit is up to you. The graph of a merge commit is implied by the graph at the time you ran git merge
. One of the key inputs to git merge
is the merge base, and Git finds this automatically, using the graph. To view the graph, see Pretty git branch graphs.
Takeaways
- Commits are snapshots plus metadata.
- Some of the metadata forms the commit graph. These are the parent links, which all point backwards: Git has to work backwards.
- Branch names identify one specific commit. This one specific commit is the last commit of / in / on the branch. Git calls this the tip commit.
- Making a new commit advances the current branch name to point to the new commit. The branch now has a new tip.
- HEAD
tells you which name is the current name, and that name then tells you which commit is the current commit—so HEAD
gives Git two different pieces of information at the same time.
- "The branch" can mean a whole series of commits, found by starting at the last commit given by the branch name and working backwards.
- Many commits are pretty often on many branches simultaneously. (See Think Like (a) Git.)
- Working backwards through a merge commit means following both parents.
- git diff
can compare any given two commits' snapshots.
- git show
compares a commit to its parent; so does git log -p
.
- git merge
walks as much of the graph as it needs to, to find the best merge base. It then makes two diffs and combines them, for a true merge.
Not part of the above, but important:
- Git makes new commits from the files stored in the index, not those in the work-tree.
- Since merge commits have at least two parents,
git show
has to do something special here (and it does), but git log -p
is lazy and just doesn't bother to do anything to show a patch. Either way, both commands leave a lot out, on purpose: "patches" for a merge are inherently kind of faulty.
- git merge
is sometimes deliberately lazy: if a true merge isn't required, it will do a fast-forward instead. When git merge
does a fast-forward, it doesn't make a new commit.
- In "detached HEAD" mode,
HEAD
points directly to a commit, rather than being attached to a branch name. Everything else works the same, except that asking Git the question: which branch name is the current branch comes back with error: does not compute.
- git checkout
(or, in Git 2.23 and later, the new git switch
) is how you change which branch name HEAD
is attached-to.
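Two of these last points, fast-forward and detached HEAD, are easy to try in a throwaway repository. The sketch below assumes a two-commit history where master has not moved since feature was created; the names and file contents are made up.

```shell
set -e
# Scratch repository; everything in it is hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "h" > h.txt; git add h.txt; git commit -q -m "H"
git checkout -q -b feature
echo "k" > k.txt; git add k.txt; git commit -q -m "K"
git checkout -q master

count_before=$(git rev-list --count --all)
git merge -q --ff-only feature     # master is behind feature, so no true merge is needed
count_after=$(git rev-list --count --all)
echo "commits before: $count_before, after: $count_after"

# Detached HEAD: HEAD holds a commit hash ID directly, so there is no current branch name.
git checkout -q --detach
if git symbolic-ref -q HEAD >/dev/null; then attached=yes; else attached=no; fi
echo "HEAD attached to a branch name? $attached"
```

The fast-forward just slides the name master forward to the commit that feature names, which is why the commit count does not change.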