Why the bloody git
is deleting file2
when master
is checked out when I already told it not to track file2
in master
. Is it a bug?
It's acting according to design. You might call the design a bug, but the design is not what I think you think it is, as revealed by your own phrase not to track file2
in master
. The state of "being tracked", for any given file, is not a function of a branch name like master
or branch2
. There is, for each file, only tracked or untracked. A file is tracked if and only if it is in the index right now.
(Because of the way git checkout
works, it is possible to extend this phrase to refer to specific commits, and branch names always refer to some specific commit. So it's not unreasonable to want to talk about a file being tracked in some branch, but if you do, you'll tend to mislead yourself.)
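If you want to see "tracked means in the index right now" for yourself, here is a minimal sketch. It assumes a throwaway repository made with mktemp, and the file names just mirror the ones in the question; git ls-files lists what is in the index, and --others lists work-tree files that are not.

```shell
# Throwaway repository; the location is a placeholder for this sketch.
repo=$(mktemp -d) && cd "$repo"
git init -q

echo one > file1
echo two > file2
git add file1                        # file1 goes into the index; file2 does not

# "Tracked" simply means "listed in the index right now".
tracked=$(git ls-files)              # entries currently in the index
untracked=$(git ls-files --others)   # work-tree files with no index entry
echo "tracked:   $tracked"
echo "untracked: $untracked"
```

Note that no branch name appears anywhere in the check: the only question asked is what the index holds at this moment.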
What the index is and does
To really nail this all down, we have to first talk about the index: what exactly is the index? To get a proper mental picture, start with the notion that all commits are read-only. Each commit contains every file that you had Git store with that commit. Those files are frozen (read-only), and stored in a special, compressed, Git-only form.
While storing the files forever is all well and good, we need some way to use and work on the files. For this, Git provides the work-tree. Files in the work-tree are read/write (well, generally; you can make specific files read-only if you like). They have their normal everyday computer-use form, whatever that is on your computer. You can work with them to do whatever you need. But Git itself pays relatively little attention to them.
Most version control systems stop here: they have the frozen files in commits, and the work-tree files that you can work on. Git, however, inserts this special index thing in between the commit and the work-tree. As for why, you'd really have to ask Linus Torvalds, but we can observe that this does a bunch of things, including making git commit
and git checkout
really fast compared to those other version control systems. But it also gives us this big headache you have just run into: it provides the notion of a tracked file.
What's in the index is, normally and initially, just an un-frozen—but still compressed and Git-only format—version of the file that came out of the commit. This means that the file is perfect for Git to freeze into a new commit. So a new commit doesn't require re-compressing the file at all: it's already there, in the index. Hence, for most files, there are three active copies:
commit            index               work-tree
----------------  ------------------  ------------------
frozen, Git-only  unfrozen, Git-only  unfrozen, ordinary
You can copy any of these copies onto any other, except into the commit of course, because that copy is frozen. Copying from a commit to the index is straightforward but mostly invisible, because git checkout
does that but then also copies from the index (after writing to the index) to the work-tree, so what you see is "copy file from some commit to the work-tree", not realizing there's a "to index" step in the middle.
Copying from the work-tree to the index is also straightforward: that's what git add
does. The git add
step compresses and Git-ifies the file during the git add
, so that once it's in the index, it's ready to be frozen.
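To watch the index copy and the work-tree copy drift apart, here is a small sketch, again assuming a throwaway repository. The name :file1 (leading colon) is Git's way of naming the index copy of a path, and git cat-file -p prints it in ordinary form.

```shell
# Throwaway repository; the location and file name are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q

echo "version 1" > file1
git add file1                          # copy work-tree -> index
echo "version 2" > file1               # change only the work-tree copy

index_copy=$(git cat-file -p :file1)   # ":file1" names the index copy
worktree_copy=$(cat file1)
echo "index:     $index_copy"
echo "work-tree: $worktree_copy"
```

The index keeps "version 1" until you run git add again, which is exactly why Git says the file has unstaged changes.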
Perhaps the most important property of the index is that it is always the source for a new commit.[1] When you run git commit
, Git simply freezes the files that are in the index, without looking at the work-tree at all, and uses those to make the new commit. The new commit then becomes the current commit, so that the index copy and the commit copy match, in the same way that the index and commit copies of each file match right after your first git checkout
of some commit.
This, then, gives us the one-line summary of what the index is: it's the set of files that you propose to put into your next commit. If you remove a file from the index, using git rm
, you're proposing that your next commit won't include that file.
[1] Commands like git commit -a
, that seem to commit from the work-tree, really work by adding the files to the index first, then committing. When required, they make a special temporary index, add the files to the temporary index, and commit from the temporary index. This makes it look like Git is somehow committing from the work-tree, but it's not: it's committing from an index somewhere, even if it's a special temporary one.
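The claim that a commit is built from the index, not from the work-tree, is easy to check. This is a sketch under the same assumptions as before: a throwaway repository, with a placeholder identity so that git commit will run.

```shell
# Throwaway repository with a placeholder identity for committing.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo "staged" > file1
git add file1                            # the index now holds "staged"
echo "edited later" > file1              # work-tree only; never re-added
git commit -q -m "snapshot of the index"

committed=$(git cat-file -p HEAD:file1)  # what the new commit froze
status=$(git status --short file1)       # " M file1": work-tree differs from index
echo "committed: $committed"
echo "status:   $status"
```

The commit contains "staged", even though the work-tree said "edited later" at the moment you ran git commit.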
git checkout branch-or-commit fills the index and the work-tree
Whenever you git checkout
a commit, Git has to extract that commit's files into both the index and the work-tree. It needs the files to go into the index so that the index will match the commit. It needs the files to go into the work-tree so that you can see them. Once these are all in place, git checkout
will update HEAD
—which is where Git keeps track of the current commit and/or current branch—as appropriate, so that the current commit is the commit you just checked out, and you're on the branch or in "detached HEAD" mode as appropriate.
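If you're curious what HEAD holds in each of those two modes, here is a short sketch (throwaway repository, placeholder identity; the symbolic-ref line at the top only forces the initial branch to be named master regardless of local defaults). git symbolic-ref succeeds when HEAD names a branch and fails when HEAD is detached.

```shell
# Throwaway repository with a placeholder identity for committing.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # name the initial branch master
git config user.email "you@example.com"
git config user.name "Example"

echo one > file1
git add file1
git commit -q -m "first commit"

attached=$(git symbolic-ref --short HEAD)   # HEAD names a branch
git checkout -q --detach                    # same commit, but HEAD now holds a raw commit ID
if git symbolic-ref -q HEAD >/dev/null; then
    mode="on a branch"
else
    mode="detached HEAD"
fi
echo "was on $attached, now in $mode mode"
```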
But note what just happened:
- git checkout filled the index from the commit.
- The contents of the index determine which files are tracked.
This means the set of tracked files changes. If you are on master
and file2
is not in the index, then either file2
does not exist at all (so there's no question about it) or it exists in the work-tree and is therefore untracked. But as soon as you git checkout branch2
, the commit at the tip of branch2
does have file2
in it, so file2
goes into the index and the work-tree. Now the file is tracked. If you then git checkout master
, Git sees that file2
is currently tracked, but isn't in the commit you want to get to, so Git removes file2
from both the index and the work-tree.
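Here is a minimal reproduction of that sequence, as a sketch in a throwaway repository with a placeholder identity. To keep it simple, master never commits file2 at all, which gives the same end state as a master commit that dropped it. The branch and file names follow the question.

```shell
# Throwaway repository with a placeholder identity for committing.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
git config user.email "you@example.com"
git config user.name "Example"

echo one > file1
git add file1
git commit -q -m "master: file1 only"

git checkout -q -b branch2
echo two > file2
git add file2
git commit -q -m "branch2: add file2"
on_branch2=$(git ls-files)                # file1 and file2 are in the index

git checkout -q master                    # file2 is tracked, but absent from master's commit
on_master=$(git ls-files)                 # only file1 remains in the index
if [ -e file2 ]; then worktree="file2 still present"; else worktree="file2 removed"; fi
echo "index on master: $on_master"
echo "work-tree: $worktree"
```

Nothing here says "do not track file2 in master". The file vanishes simply because it was in the index before the checkout and is not in the commit being checked out.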
This is the terrible danger of removing a file with git rm --cached
: it leaves a copy in the work-tree, while taking it out of the index right now. But if it's in the index right now, there is a good chance it is also in some other commit(s). If you ever check those commits out, the file goes back into the index; if you then move away from that commit, the file gets removed from both the index and the work-tree, and now it's gone.
git update-index --skip-worktree is no help
What this does is set one of the two special control bits—the other one is --assume-unchanged
—on an index entry for some file. The index entry only exists if the file is actually in the index: removing the file from the index removes the entry, and therefore removes the control bits.
When the control bits are set, various Git commands that would compare the index copy of the file to the work-tree copy of the file will skip their comparison. This means that Git won't suggest that you use git add
to copy a work-tree update back into the index, and won't suggest that a file that is in the index, but for some reason is missing from the work-tree, is missing.
None of this affects what's actually in the index. The index copy of the file remains unchanged, and every new commit you make continues to snapshot the index copy of the file. It's just that git add -A
doesn't update the index copy (if you've changed the work-tree copy), and git status
doesn't complain that the index copy is stale (if you've changed the work-tree copy).
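A quick way to see the bit in action, sketched in a throwaway repository with a placeholder identity and without sparse checkout enabled: git ls-files -v marks skip-worktree entries with the letter S, and the index copy stays put while the work-tree copy changes.

```shell
# Throwaway repository with a placeholder identity for committing.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo two > file2
git add file2
git commit -q -m "add file2"

git update-index --skip-worktree file2
flag=$(git ls-files -v file2)            # "S file2": the bit lives on the index entry
echo "local edit" >> file2               # change only the work-tree copy

status=$(git status --short)             # empty: the index/work-tree comparison is skipped
index_copy=$(git cat-file -p :file2)     # still the committed content
echo "flag: $flag"
echo "status: [$status]"
echo "index copy: $index_copy"
```

The file is still in the index the whole time, so it is still tracked and still goes into every new commit in its old form.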
A side note about directories
Git never tracks a directory. Specifically, you cannot add a directory to the index. What Git does instead is that if there's some tracked file in the index that Git wants to create in the work-tree, and the file's name includes a directory that does not currently exist, Git will make the directory on its own, so that it can put the file into it.
That's mostly it, except that Git will also sometimes remove a directory once it has removed the last file from that directory. There seem to be some odd corner cases here (especially in very old versions of Git, pre-1.8) as I have had Git leave empty directories around when there was no obvious reason to do so. The git clean
command will, if requested, remove empty directories from the work-tree.
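A small sketch of both points, in a throwaway repository with made-up directory names: an empty directory contributes nothing to the index, and git clean with -d (plus -f, which the default configuration requires) removes it.

```shell
# Throwaway repository; directory and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q

mkdir empty_dir
mkdir sub
echo x > sub/file
git add .                        # only files can go into the index

in_index=$(git ls-files)         # "sub/file"; there is no entry for either directory
git clean -q -f -d               # -d: also remove untracked directories
if [ -d empty_dir ]; then cleaned="no"; else cleaned="yes"; fi
echo "index: $in_index"
echo "empty_dir removed: $cleaned"
```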