
I cloned a specific branch of a GitHub repo to my local machine, made some changes, then pushed to the GitHub website. Now I want to push the same change to another branch. But when I try to clone that branch, it says "fatal: destination path 'REPO' already exists and is not an empty directory."

So my next thought is that I should just make a new directory for each branch of the repo. But I suspect GitHub already has a system for this.

So I looked online and there does seem to be this.

Editing a branch in git?

But I am having a hard time following this.

First, it looks like I should just clone the whole repo, and not a specific branch from the repo.

Then the next step seems to be

git checkout newbranch

So I am wondering what's happening when I do this. I think one of two things happens:

A: When I cloned the repo, it also cloned all of the branches onto my computer, and they are all just hidden. When I do git checkout newbranch, the visible files on my computer change to that branch.

B: Only master is on my computer, and when I do git checkout newbranch, it replaces the files on my computer with the online branch I specified.

C: Something else.

SantoshGupta7
  • In your case, it is best to work in a separate folder, hence my suggestion. If you need to switch branch within the same cloned repo folder, use git switch. For details... see torek's answer. – VonC Nov 17 '19 at 08:18
  • You had selected the right answer before. What did I miss? – VonC Nov 17 '19 at 19:00
  • The other one was more thorough so I thought I was supposed to mark that one, for future people who look at this post and go to the selected answer. Should I avoid answer changes? – SantoshGupta7 Nov 17 '19 at 19:12
  • It depends: did you end up using git worktree, as I recommended? – VonC Nov 17 '19 at 19:37

2 Answers


You don't have to clone (again) your repository.

You can use the git worktree command, as I mentioned in "Multiple working directories with Git?"

That way, you can check out (switch to, actually) another branch in a separate folder, without cloning the repository again.

Example:

cd /path/to/my/cloned/repo

git worktree add -b emergency-fix ../temp master
pushd ../temp
# ... hack hack hack ...
git commit -a -m 'emergency fix for boss'
popd
git worktree remove ../temp
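In the question, the second branch already exists, so the -b option is not needed there. Here is a minimal sketch of that case, assuming a throwaway repository created just for the demonstration; the names newbranch and ../newbranch-tree and the placeholder identity are made up for illustration.

```shell
set -eu
base=$(mktemp -d)                        # scratch area, nothing real is touched
git init -q "$base/repo"
cd "$base/repo"
git symbolic-ref HEAD refs/heads/master  # force the branch name used in this post
git config user.name "Example User"      # placeholder identity (assumption)
git config user.email "user@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "first commit"
git branch newbranch                     # stands in for the branch that already exists

# Check out the existing branch in a second folder; no second clone is made.
git worktree add ../newbranch-tree newbranch

main_branch=$(git symbolic-ref --short HEAD)
other_branch=$(git -C ../newbranch-tree symbolic-ref --short HEAD)
echo "main folder: $main_branch, extra folder: $other_branch"
git worktree list
```

Both folders share one repository, so a commit made in either folder is immediately visible from the other.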
VonC

We'll get to your answer in a bit. It involves the use of what Git calls the index and the work-tree (or working tree or similar). But let's start with correcting some mis-impressions.

First, it looks like I should just clone the whole repo, and not a specific branch from the repo.

Generally, yes—but there is more, or maybe actually less, to it than that. The real problem here is the assumptions that people make about the word "branch". Git is not really about branches at all, but rather about commits. The fundamental unit of storage in Git is the commit.
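To make the "clone the whole repo" part concrete, here is a rough sketch using two throwaway local repositories, one playing the role of the GitHub copy. The directory names origin and copy, the branch name newbranch, and the placeholder identity are assumptions for the demonstration only.

```shell
set -eu
base=$(mktemp -d)
git init -q "$base/origin"               # plays the role of the online repository
cd "$base/origin"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"      # placeholder identity (assumption)
git config user.email "user@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "commit on master"
git checkout -q -b newbranch
echo two > file.txt
git commit -q -a -m "commit on newbranch"
git checkout -q master

# Clone everything rather than a single branch.
git clone -q "$base/origin" "$base/copy"
cd "$base/copy"
git branch -r                            # lists the origin/* names, including origin/newbranch

# The commits of newbranch are already local; checkout only creates the local name.
git checkout -q newbranch
local_tip=$(git rev-parse newbranch)
remote_tip=$(git rev-parse origin/newbranch)
content=$(cat file.txt)
echo "newbranch is at $local_tip, file.txt contains: $content"
```

The checkout needs no network access here, which suggests that the clone already brought the other branch's commits along.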

Each commit holds a snapshot: a full set of files. That's the data part of a commit. But each commit also contains metadata, such as the name and email address of the person who made the commit, a date-and-time stamp for when they made the commit, and their supplied log message by which they can tell you why they made the commit. There's one more thing that's crucial to each commit, but before we get to it, we should talk about how you can name a commit.

The true name of any commit is its hash ID. Each commit has its own unique hash ID, assigned to it at the time you (or whoever) create the commit. This hash ID is a big ugly random-looking string of letters and numbers, such as 08da6496b61341ec45eac36afcc8f94242763468. While it looks random, it's actually completely dependent on the data-and-metadata inside the commit: it is literally a cryptographic checksum of the commit's content.
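As a small illustration of the checksum claim, the sketch below makes one commit in a throwaway repository and then asks Git to hash that commit's raw content again. The file name and identity are placeholders.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity (assumption)
git config user.email "user@example.com"
echo hello > README.md
git add README.md
git commit -q -m "initial commit"

commit_id=$(git rev-parse HEAD)
# Feed the commit's raw data-and-metadata back through Git's hashing.
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)
echo "stored name: $commit_id"
echo "recomputed:  $recomputed"
git cat-file -p HEAD                     # the content that was hashed
```

The two printed IDs should match, since the name is derived from the content.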

This means that every commit is frozen forever, once made. Its name is a checksum of its content. If you were to attempt to change something about the commit, even just one single bit in one byte somewhere inside it, you'd get a new and different commit, with a new and different hash ID. All you really accomplished is to add a new commit. The original commit remains in the repository. So a repository is a big collection of commits that, for the most part, just keeps getting bigger over time.[1]

This brings us to the crucial item in each commit's metadata, and that is a list of parent hash IDs. Usually this list just has one entry—one parent hash—which is this commit's one and only parent. The parent is the commit that comes before the commit we're looking at now. Some commits—merge commits—have two or more parents, and at least one commit has no parent: the very first commit we ever make, in a new and initially empty repository, is created from nothing. There is no commit before it.

If we replace commit hash IDs with single uppercase letters, and allocate them in order starting from A, a repository with just three commits in it might have them arranged like this:

A <-B <-C

That is, commit C—the last of the three—holds the hash ID of commit B as its parent. We say that commit C points to commit B, because we can read the hash ID of B from C. We can use that to find commit B. Meanwhile commit B points to commit A, so now we can find A. Commit A has no parent—Git calls this a root commit—and we can stop walking backwards through history, having found all three commits.
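The backwards walk can be tried out with a sketch like the following, which builds the three-commit chain in a throwaway repository. The commit messages A, B and C stand in for the letters in the drawing; the real hash IDs will be different on every run.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity (assumption)
git config user.email "user@example.com"
for name in A B C; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done

parent_of_c=$(git rev-parse HEAD~1)          # one step back from C
parent_of_b=$(git rev-parse HEAD~2)          # two steps back from C
root=$(git rev-list --max-parents=0 HEAD)    # the commit that has no parent
count=$(git rev-list --count HEAD)           # commits reachable by walking back
git log --format='%h  parent: %p  (%s)'
```

In the log output, the parent column of the oldest commit is empty, which is what makes it a root commit.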


[1] It's possible, with some difficulty, to "forget" old commits, and if a commit can no longer be found by any means, it can be removed for real. Git will garbage collect these commits, and other useless Git objects, automatically. But for a commit or other object to be collectible like this, it must truly be garbage. In general, you can achieve this by "abandoning" some commits, replacing them with new-and-improved commits. The new ones get new and different hash IDs: again, it's impossible to change any existing commits. Then we use a feature of branch names to abandon the original commits in favor of the new and improved ones. We won't show how this works here, but with some logical reasoning, you can figure it out on your own.


How branch names come into this picture

When we look at this kind of diagram—showing commits, replacing their actual hash IDs with letters or dots (●) or whatever—we can see a sort of graphical representation of the various commits in the repository:

● ←● ←● ←●

for example shows four commits (with no names for them). These sequences can form branch-like structures:

        D--E
       /
A--B--C
       \
        F--G

But however many commits we have in the repository—often very many—their real names are big ugly random-looking hash IDs. How can we tell one from the next? Especially as mere humans, for whom 08da6496b61341ec45eac36afcc8f94242763468 is alarmingly similar to 08da96b61341ec45eac36afcc86f942427634684, even though these are totally different hash IDs, there's no good way to deal with the raw hash IDs. Git can do it; we can't.

Git needs a fast way to find the last commit in a branch, and we (humans) need to have a way to talk about a branch. So Git gives us branch names:

        D--E   <-- master
       /
A--B--C
       \
        F--G   <-- dev

The name, master or dev, simply holds the actual hash ID of the last commit in the branch. The last commit points back one step, to the second-to-last commit. This commit points back another step, and so on. In this case, both branches quickly point back to commit C, which points to B which points to A (and then we stop because we've run out of parents).

Note that commits A-B-C are on both branches, in Git. The set of branches that contain a commit changes dynamically as we add and remove branch names.
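Here is a sketch that rebuilds this drawing in a throwaway repository and asks Git what each name points to. The helper make_commit is made up for the demonstration, and the commit messages stand in for the letters.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # force the branch name used in the drawing
git config user.name "Example User"      # placeholder identity (assumption)
git config user.email "user@example.com"

# Hypothetical helper: one new file and one commit per letter.
make_commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

make_commit A; make_commit B; make_commit C
git branch dev                           # dev and master both point to C for now
make_commit D; make_commit E             # master moves on to E
git checkout -q dev
make_commit F; make_commit G             # dev moves on to G

git rev-parse master dev                 # each name is just one hash ID
master_tip=$(git log -1 --format=%s master)
dev_tip=$(git log -1 --format=%s dev)
fork_id=$(git merge-base master dev)
fork=$(git log -1 --format=%s "$fork_id")
containing_c=$(git for-each-ref --contains "$fork_id" --format='%(refname:short)' refs/heads | xargs)
echo "master -> $master_tip, dev -> $dev_tip, fork at $fork, branches containing it: $containing_c"
```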

To add a new commit to a branch, we first select that branch with:

git checkout master   # select commit `E`

or:

git checkout dev      # select commit `G`

which fills in a work area from the given commit, and also remembers which commit and branch name we're using. I like to draw this by adding the name HEAD to the selected branch name:

        D--E   <-- master (HEAD)
       /
A--B--C
       \
        F--G   <-- dev

This tells us that we're working right now with commit E and name master.

When you make a new commit, what Git does is package up a full snapshot of all of your files, and write out a new commit: your name, your email address, the current date-and-time, and your log message all go into this new commit. The parent of this new commit is the current commit, in this case E. The new commit goes into the repository:

             H
            /
        D--E   <-- master (HEAD)
       /
A--B--C
       \
        F--G   <-- dev

and, as the last step, git commit writes the actual hash ID of new commit H into the current branch name master:

        D--E--H   <-- master (HEAD)
       /
A--B--C
       \
        F--G   <-- dev

So now the name master selects commit H. We're still on branch master because the special name HEAD is still attached to the name master.
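The same sequence can be watched from the command line. This is a sketch in a throwaway repository with only two commits, labeled E and H after the drawing; everything else about the setup is an assumption.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"      # placeholder identity (assumption)
git config user.email "user@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "E"

before=$(git rev-parse master)           # hash ID stored in the name master
echo two > file.txt
git add file.txt
git commit -q -m "H"
after=$(git rev-parse master)            # git commit wrote the new hash ID here

parent=$(git rev-parse HEAD~1)           # parent of the new commit
attached=$(git symbolic-ref HEAD)        # the name HEAD is attached to
echo "master moved from $before to $after; HEAD is attached to $attached"
```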

This is why Git is about commits, but uses branch names

The above shows us how the branch names find the commits, by storing hash IDs into branch names. We say that the branch name points to the last commit in the branch. Git calls this the tip commit of the branch. The tip commit holds a full snapshot of all of your files, and also lets Git find one earlier commit, because the tip points back to its parent. Its parent holds a snapshot of all of your files, and points back one more step in history.

The history in the repository is nothing more than all the commits, as found by starting at all branch names and working backwards, one step at a time. Commits are history; history is (all, or any selected subset of) commits.

Commit snapshots are frozen

We already mentioned that everything inside every commit is frozen for all time. That includes all of the snapshot-files. Because they are frozen, it's possible for multiple commits to share files. If you didn't change a file—or in fact, if you did change it, but then changed it back—the new commit you make now can share the copy of the file from any previous commit that had it the same way.

And in fact, Git does exactly that. Every commit shares its frozen-format files with every other commit that uses the same file contents. So although Git repositories just keep growing over time, most commits are really tiny: they mostly just have one or two, or even no, new files, or new versions of files. All the old stuff gets re-used.
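One way to observe the sharing is to compare the internal object ID of a file in two commits. In the sketch below, the file names stable.txt and changing.txt are invented; an unchanged file should show the same object ID in both commits.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity (assumption)
git config user.email "user@example.com"
echo "never edited" > stable.txt
echo v1 > changing.txt
git add stable.txt changing.txt
git commit -q -m "first"
echo v2 > changing.txt
git add changing.txt
git commit -q -m "second"

# commit:path names the frozen copy of a file inside a commit.
stable_old=$(git rev-parse HEAD~1:stable.txt)
stable_new=$(git rev-parse HEAD:stable.txt)
changing_old=$(git rev-parse HEAD~1:changing.txt)
changing_new=$(git rev-parse HEAD:changing.txt)
echo "stable.txt:   $stable_old / $stable_new"
echo "changing.txt: $changing_old / $changing_new"
```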

But by the same token, the contents of a commit are frozen. They're also in a special, read-only, compressed, Git-only format that only Git can use. This means you can't actually do any new work at all with a commit. Git must extract each commit into a work area.

The work area

Each repository has one primary work area,[2] which Git calls the work-tree or working tree or some variation on this name. Since Git 2.5 (but with some nasty bugs not fixed until Git 2.15), Git has supported the option of adding additional work-trees, using git worktree add, but you may not need them, depending on how you do your work.

The work-tree is pretty simple: Git extracts all files from a commit into the work-tree, turning them back into ordinary everyday files that you can work with, and that all your computer programs can work with. When you git checkout some particular commit—as part of checking out a different branch, for instance—Git will remove files that are in your work-tree as a result of a previous checkout, and replace them with the versions of those files that come from the other commit.

Other than this, though, there's something very important to know, and that's something Git sticks in between the current commit, and your work-tree.


[2] The exception to this rule is a bare repository, which is simply one with no work-tree. Most server repositories that allow git push are bare, for reasons we won't cover here.
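For the curious, a bare repository can be sketched locally like this. The directory names server.git and work are assumptions; the bare repository ends up holding the pushed commit but no checked-out files.

```shell
set -eu
base=$(mktemp -d)
git init -q --bare "$base/server.git"    # no work-tree, like most servers
git init -q "$base/work"
cd "$base/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"      # placeholder identity (assumption)
git config user.email "user@example.com"
echo hello > README.md
git add README.md
git commit -q -m "initial commit"
git push -q "$base/server.git" master

is_bare=$(git -C "$base/server.git" rev-parse --is-bare-repository)
server_tip=$(git -C "$base/server.git" rev-parse master)
local_tip=$(git rev-parse master)
echo "bare: $is_bare, server master: $server_tip"
ls "$base/server.git"                    # Git's internal files only, no README.md
```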


The index

Git's index is crucially important to actually using Git, but it's hard to see directly.[3] The name index is kind of generic and not very useful or meaningful, so a lot of Git uses the phrase staging area instead of the word index. These two mean the same thing. Some (mostly older) parts of Git also use the word cache, which mostly also means this same thing: the index. So there are three words or phrases for this thing. But what exactly is it?

The answer to this is a bit complicated, but if we ignore the complications—mostly having to do with handling conflicted merge operations—Git's index is really pretty simple. It can be described in one phrase as where you build your next commit.

When you git checkout some particular commit, before Git updates your work-tree, Git has to update its index. The index has in it the current commit. In effect, the index holds a copy of each file from the current commit.[4] This is how Git knows which files Git put into your work-tree.

So when you git checkout some other commit, Git reads through the index and removes, from your work-tree, the files that Git put there. It reads through the commit you're switching to, and copies those versions of the file into the index instead, and puts those files—decompressed and usable—into your work-tree.

This means that, with some exceptions we won't get into here, right after git checkout, the index has all the same files in it as your current commit—the commit you selected with your git checkout master or git checkout dev, for instance. Meanwhile your work-tree also has these files. So there are three copies of each file. The key differences are:

HEAD        index      work-tree
---------   ---------  ---------
README.md   README.md  README.md
    ^           ^          ^
frozen      invisible  plain text;
in commit   but over-  yours to
            writable   work with

The index copy of the file is the one that git commit will use when you make a new commit. So, if you do make changes to the work-tree copy, you will need to copy those changes into the index copy. You do this with git add:

# edit README.md
git add README.md

The git add step reads the work-tree version, shrinks it into the frozen-and-compressed format, and overwrites the index copy with the new one. Now the new README.md is ready to go into the next commit.

Note that every other file that came out of the current commit is still in the index, still in the frozen format, and still ready to go. Running git commit will make the new snapshot using that frozen file.
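The effect of git add on the index copy can be checked with a sketch like this one, run in a throwaway repository. It compares the object ID recorded in the index with the one in the current commit, before and after git add; the file contents are arbitrary.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity (assumption)
git config user.email "user@example.com"
echo "first version" > README.md
git add README.md
git commit -q -m "initial"

echo "second version" > README.md        # edit the work-tree copy only
head_blob=$(git rev-parse HEAD:README.md)
index_before=$(git ls-files --stage README.md | cut -d' ' -f2)

git add README.md                        # copy the work-tree version into the index
index_after=$(git ls-files --stage README.md | cut -d' ' -f2)
worktree_blob=$(git hash-object README.md)

echo "HEAD:         $head_blob"
echo "index before: $index_before"
echo "index after:  $index_after"
```

Before git add, the index entry still matches the commit; afterwards it matches the edited work-tree file, while the commit itself is unchanged.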


[3] To see it directly, try git ls-files --stage. Be aware that this prints out one line for every file in the index, which is usually a lot of lines!

[4] In fact, the index holds instead a reference to the frozen-format copy that is in the current commit. However, the index lets you overwrite this, without changing the frozen-format copy. The effect is as if the index holds a full copy of the frozen-format file, where you can overwrite the index's copy without affecting the commit itself. Unless you start getting into the stuff that --stage shows, you can just think of the index as having a frozen-format copy of each file.


Untracked files

Because your work-tree is yours to work with, Git lets you put files in it that aren't in the index. Git calls these files untracked. By definition, an untracked file is any file that is in the work-tree but not in the index. Because it is not in the index, the new commit you make won't have a copy of the file.

If you want a totally new file, that you have created in the work-tree, to be in the next commit, you must git add the file. This puts a copy of it (in the frozen format) into the index, ready to be committed. The next git commit will have the file. In the meantime, it is now in the index, so it is now tracked.

If you have an existing file that is tracked (is in the index, presumably because it got copied out of the commit) and you want the next commit to not have the file, you can use git rm on the file. This removes the file from both the index and the work-tree. Now the file simply does not exist, and won't be in the next commit. The important part of this for the next commit is that you removed the file from the index. You could leave the file in the work-tree, for instance. But remember, whatever you do here, the file is still in the existing commit. It just won't be in the next (new) commit you make.
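Here is a sketch of these three situations in a throwaway repository. The file names kept.txt and notes.txt are invented, and git rm --cached is used as one way to remove a file from the index while leaving it in the work-tree.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity (assumption)
git config user.email "user@example.com"
echo tracked > kept.txt
git add kept.txt
git commit -q -m "initial"

echo scratch > notes.txt                 # in the work-tree, not in the index
status_untracked=$(git status --porcelain notes.txt)   # "?? notes.txt"
git add notes.txt                        # now in the index, so now tracked
status_added=$(git status --porcelain notes.txt)       # "A  notes.txt"

git rm -q --cached kept.txt              # out of the index, still in the work-tree
in_index=$(git ls-files kept.txt)                       # empty: no longer tracked
in_head=$(git ls-tree --name-only HEAD kept.txt)        # still in the existing commit
echo "notes.txt: '$status_untracked' then '$status_added'"
echo "kept.txt in index: '$in_index', in HEAD: '$in_head'"
```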

Summary

When you use git checkout to switch from one branch name to another, Git:

  • figures out which commit you mean;
  • de-populates your index and work-tree as needed: files that are in the current commit, but aren't in the one you're about to switch to, must be removed;
  • re-fills your index and work-tree as needed: files that are in the commit you're about to switch to, but aren't in the index and work-tree, must be created;
  • updates your index and work-tree as needed: files that are in both commits, but are different in the old and new commits, must be swapped out. The wrong old one must be replaced with the right new one.

For speed and for various other useful effects, Git will check to see which files are the same in the old and new commits, and not touch them in the index and work-tree when you switch commits. This makes switching from one branch to another very fast in many cases (whenever most files are the same in the two different commits).
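The summary can be tried out with a sketch like the following. It sets up two branches in a throwaway repository with one identical file, one differing file, and one file unique to each branch, then switches back and forth. All file names are invented for the demonstration.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"      # placeholder identity (assumption)
git config user.email "user@example.com"
echo same > shared.txt
echo "master version" > differs.txt
echo only > only-on-master.txt
git add .
git commit -q -m "master commit"

git checkout -q -b dev
git rm -q only-on-master.txt
echo "dev version" > differs.txt
echo new > only-on-dev.txt
git add .
git commit -q -m "dev commit"

git checkout -q master
master_files=$(ls | xargs)
master_text=$(cat differs.txt)
git checkout -q dev
dev_files=$(ls | xargs)
dev_text=$(cat differs.txt)

# Files that differ between the two commits; shared.txt is absent from this list.
touched=$(git diff --name-only master dev | xargs)
echo "on master: $master_files"
echo "on dev:    $dev_files"
echo "differing: $touched"
```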

torek