
Is there a simple way to back up an entire git repo, including all branches and tags?

Oren Hizkiya
Daniel Upton
  • I guess you are referring to local git repos here. – Ztyx Jul 12 '12 at 13:41
  • possible duplicate of [Backup a Local Git Repository](http://stackoverflow.com/questions/2129214/backup-a-local-git-repository) – Martin Thoma Oct 07 '14 at 08:15
  • The correct answer is to do: git clone --mirror git@example.com/your-repo.git This will copy your entire repository, notes, branches, tracking, etc. – John May 14 '18 at 19:20
  • Some web searches I ran that didn't include this question in their results: "git clone absolutely everything branches tags notes"; "git clone everything in repository"; "git clone a repo with all tags notes". – Kenny Evitt Oct 03 '18 at 14:13

13 Answers

git bundle

I like that method, as it results in only one file, which is easier to copy around.
See ProGit: little bundle of joy.
See also "How can I email someone a git repository?", where the command

git bundle create /tmp/foo-all --all

is detailed:

git bundle will only package references that are shown by git show-ref: this includes heads, tags, and remote heads.
It is very important that the basis used be held by the destination.
It is okay to err on the side of caution, causing the bundle file to contain objects already in the destination, as these are ignored when unpacking at the destination.


To use that bundle, you can clone it into a non-existent folder (outside of any git repo):

git clone /tmp/foo-all newFolder
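As a self-contained sketch of the whole round trip (all paths and names below are throwaway examples), using a small repository created on the spot:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Build a tiny repository to back up
git init -q source
git -C source -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
git -C source tag v1.0

# One file now holds every branch, tag, and remote head
git -C source bundle create "$tmp/source.bundle" --all

# Sanity-check the bundle (run from inside a repository)
git -C source bundle verify "$tmp/source.bundle"

# Restore: clone the bundle into a fresh folder
git clone -q "$tmp/source.bundle" restored
git -C restored tag    # the v1.0 tag survived the round trip
```

The `bundle verify` step is cheap insurance before you rely on the file as a backup.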
Robert MacLean
VonC
  • add --all for complete backup – sehe Apr 07 '11 at 09:03
  • This, the `git bundle`, is the correct answer in my opinion, not the accepted one. I think he knows the clone command well if he can ask such a question, and it is clearly not enough for him (because it is a clone, not a dump). Dumps are different things from simple copies, for example: 1) they do not need to be optimal (or even usable) for normal work 2) they are required to have good resistance and repairability against data corruption 3) it is often useful if they are easily diff-able for incremental backups, which is a non-goal for copies. – peterh Apr 05 '16 at 09:39
  • Note that neither `git bundle` nor `git clone` gets _everything_, for example the hook scripts. – Zitrax Jun 29 '16 at 07:44
  • @Zitrax Yes, that is by design. Hooks can be dangerous or include sensitive information. – VonC Jun 29 '16 at 08:50
  • Can I use `git bundle` against a remote repo? – Ryan Shillington Jan 30 '20 at 15:44
  • @RyanShillington I have always seen that command used after a clone, not for a remote repository: https://stackoverflow.com/a/34158285/6309. This is different from an archive which does get a compressed version of the files, *without* history, and can operate on remote repo: https://stackoverflow.com/a/13751126/6309 – VonC Jan 30 '20 at 20:41

What about just making a clone of it?

git clone --mirror other/repo.git

Every repository is a backup of its remote.

Kenny Evitt
KingCrunch
  • @Daniel: If you clone a repository, you fetch every branch, but only the default one is checked out. Try `git branch -a`. Maybe it's more obvious this way: after cloning a repository you don't fetch every branch, you fetch every commit. Branches only reference an existing commit. – KingCrunch Apr 07 '11 at 12:14
  • I think he knows the clone command well if he can ask such a question, and it is clearly not enough for him (because it is a clone, not a dump). Dumps are different things from simple copies, for example: 1) they do not need to be optimal (or even usable) for normal work 2) they are required to have good resistance and repairability against data corruption. – peterh Apr 05 '16 at 09:38
  • @peterh Sure, but `git clone` covers all that. (1) is optional, not a requirement; if the result is still optimized, it's still a backup. (2) is already covered by git itself. The point I'd like to make is: if `git clone` already covers the relevant points, why do you need a different tool? Although I also prefer `git bundle`, I don't think my answer is wrong or invalid. You can see the two approaches as hot- vs cold-backup. – KingCrunch Apr 07 '16 at 07:01
  • What about file permissions? Does git clone necessarily copy those over? It depends on the options, I believe. – antirealm Jan 15 '18 at 06:16

Expanding on some other answers, this is what I do:

Set up the repo: git clone --mirror user@server:/url-to-repo.git

Then, when you want to refresh the backup, run git remote update from the clone location.

This backs up all branches and tags, including new ones that get added later, although it's worth noting that branches that get deleted do not get deleted from the clone (which for a backup may be a good thing).

This is atomic, so it doesn't have the problems that a simple copy would.

See http://www.garron.me/en/bits/backup-git-bare-repo.html
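A self-contained sketch of that workflow (the paths are throwaway examples; in practice the source would be your remote's URL):

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for the server-side repository
git init -q "$tmp/origin"
git -C "$tmp/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"

# One-time setup of the backup mirror
git clone -q --mirror "$tmp/origin" "$tmp/backup.git"

# Later, new work (and a new tag) appears upstream...
git -C "$tmp/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "second"
git -C "$tmp/origin" tag nightly

# ...and a single command refreshes the backup, branches and tags included
git -C "$tmp/backup.git" remote update
```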

fantabolous

Expanding on the great answers by KingCrunch and VonC, I combined them both:

git clone --mirror git@some.origin/reponame reponame.git
cd reponame.git
git bundle create reponame.bundle --all

After that you have a file called reponame.bundle that can be easily copied around. You can then create a new normal git repository from that using git clone reponame.bundle reponame.

Note that git bundle only copies commits that lead to some reference (branch or tag) in the repository, so dangling commits are not stored in the bundle.

Kimmo Ahokas

This thread was very helpful for getting some insight into how backups of git repos can be done. I think it still lacks some hints, information, or conclusions for finding the "correct way" (tm) for oneself, so I am sharing my thoughts here to help others and putting them up for discussion. Thanks.

So, starting by picking up the original question:

  • Goal is to get as close as possible to a "full" backup of a git repository.

Then enriching it with typical wishes and specifying some presets:

  • Backup via a "hot-copy" is preferred to avoid service downtime.
  • Shortcomings of git will be worked around by additional commands.
  • A script should do the backup to combine the multiple steps for a single backup and to avoid human mistakes (typos, etc.).
  • Additionally, a script should do the restore, adapting the dump to the target machine; e.g. even the configuration of the original machine may have changed since the backup.
  • Environment is a git server on a Linux machine with a file system that supports hardlinks.

1. What is a "full" git repo backup?

The point of view differs on what a "100%" backup is. Here are two typical ones.

#1 Developer's point of view

  • Content
  • References

git is a developer tool and supports this point of view via git clone --mirror and git bundle --all.

#2 Admin's point of view

  • Content files
    • Special case "packfile": git combines and compacts objects into packfiles during garbage collection (see git gc)
  • git configuration
  • Optional: OS configuration (file system permissions, etc.)

git is a developer tool and leaves this to the admin. Backup of the git configuration and OS configuration should be seen as separated from the backup of the content.

2. Techniques

  • "Cold-Copy"
    • Stop the service to have exclusive access to its files. Downtime!
  • "Hot-Copy"
    • Service provides a fixed state for backup purposes. On-going changes do not affect that state.

3. Other topics to think about

Most of them are generic for backups.

  • Is there enough space to hold the full backups? How many generations will be stored?
  • Is an incremental approach wanted? How many generations will be stored and when to create a full backup again?
  • How to verify that a backup is not corrupted after creation or over time?
  • Does the file system support hardlinks?
  • Put backup into a single archive file or use directory structure?

4. What git provides to backup content

  • git gc --auto

    • docs: man git-gc
    • Cleans up and compacts a repository.
  • git bundle --all

    • docs: man git-bundle, man git-rev-list
    • Atomic = "Hot-Copy"
    • Bundles are dump files and can be directly used with git (verify, clone, etc.).
    • Supports incremental extraction.
    • Verifiable via git bundle verify.
  • git clone --mirror

    • docs: man git-clone, man git-fsck, What's the difference between git clone --mirror and git clone --bare
    • Atomic = "Hot-Copy"
    • Mirrors are real git repositories.
    • Primary intention of this command is to build a full active mirror, that periodically fetches updates from the original repository.
    • Supports hardlinks for mirrors on same file system to avoid wasting space.
    • Verifiable via git fsck.
    • Mirrors can be used as a basis for a full file backup script.
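Both verification paths can be tried against a throwaway repository (all names below are examples):

```shell
set -e
tmp=$(mktemp -d)

# A small repository to exercise both backup styles
git init -q "$tmp/source"
git -C "$tmp/source" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# Dump-style backup: create a bundle, then verify the bundle file
git -C "$tmp/source" bundle create "$tmp/source.bundle" --all
git -C "$tmp/source" bundle verify "$tmp/source.bundle"

# Mirror-style backup: clone, then check object integrity
git clone -q --mirror "$tmp/source" "$tmp/mirror.git"
git -C "$tmp/mirror.git" fsck
```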

5. Cold-Copy

A cold-copy backup can always do a full file backup: deny all accesses to the git repos, do backup and allow accesses again.

  • Possible Issues
    • May not be easy - or even possible - to deny all accesses, e.g. shared access via file system.
    • Even if the repo is on a client-only machine with a single user, then the user still may commit something during an automated backup run :(
    • Downtime may not be acceptable on server and doing a backup of multiple huge repos can take a long time.
  • Ideas for Mitigation:
    • Prevent direct repo access via file system in general, even if clients are on the same machine.
    • For SSH/HTTP access use git authorization managers (e.g. gitolite) to dynamically manage access or modify authentication files in a scripted way.
    • Backup repos one-by-one to reduce downtime for each repo. Deny one repo, do backup and allow access again, then continue with the next repo.
    • Have planned maintenance schedule to avoid upset of developers.
    • Only backup when repository has changed. Maybe very hard to implement, e.g. list of objects plus having packfiles in mind, checksums of config and hooks, etc.

6. Hot-Copy

File backups cannot be done with active repos due to risk of corrupted data by on-going commits. A hot-copy provides a fixed state of an active repository for backup purposes. On-going commits do not affect that copy. As listed above git's clone and bundle functionalities support this, but for a "100% admin" backup several things have to be done via additional commands.

"100% admin" hot-copy backup

  • Option 1: use git bundle --all to create full/incremental dump files of content and copy/backup configuration files separately.
  • Option 2: use git clone --mirror, handle and copy configuration separately, then do full file backup of mirror.
    • Notes:
    • A mirror is a new repository, that is populated with the current git template on creation.
    • Clean up configuration files and directories, then copy configuration files from original source repository.
    • Backup script may also apply OS configuration like file permissions on the mirror.
    • Use a filesystem that supports hardlinks and create the mirror on the same filesystem as the source repository to gain speed and reduce space consumption during backup.
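A rough sketch of Option 2, with throwaway paths standing in for a real server layout (the `receive.denyDeletes` setting is just an invented example of repo configuration worth preserving):

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for the server-side bare repository, with one custom setting
mkdir -p "$tmp/srv"
git init -q --bare "$tmp/srv/project.git"
git -C "$tmp/srv/project.git" config receive.denyDeletes true

# Mirror on the same filesystem: the local clone hardlinks objects
mkdir -p "$tmp/work"
git clone -q --mirror "$tmp/srv/project.git" "$tmp/work/project.git"

# The mirror starts from the local git template; carry the real config over
cp "$tmp/srv/project.git/config" "$tmp/work/project.git/config"

# Full file backup of the prepared mirror
mkdir -p "$tmp/backup"
tar -czf "$tmp/backup/project.tar.gz" -C "$tmp/work" project.git
```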

7. Restore

  • Check and adapt the git configuration to the target machine and the latest "way of doing" philosophy.
  • Check and adapt the OS configuration to the target machine and the latest "way of doing" philosophy.
Maddes

Use git bundle or git clone.

Copying the git directory is not a good solution because it is not atomic. If you have a large repository that takes a long time to copy and someone pushes to it while it is being copied, that push will affect your backup. Cloning or making a bundle does not have this problem.

Sunil Khiatani

The correct answer IMO is git clone --mirror. This will fully back up your repo.

git clone --mirror clones the entire repository (notes, heads, refs, etc.) and is typically used to copy an entire repository to a new git server. It pulls down all branches and everything: the entire repository.

git clone --mirror git@example.com/your-repo.git
  • Normally cloning a repo checks out only the default branch (e.g. master), not every branch.

  • Copying the repo folder will only "copy" the branches that have been pulled in, so by default that is the master branch only, plus whatever other branches you have checked out previously.

  • The Git bundle command is also not what you want: "The bundle command will package up everything that would normally be pushed over the wire with a git push command into a binary file that you can email to someone or put on a flash drive, then unbundle into another repository." (From What's the difference between git clone --mirror and git clone --bare)

John
  • Does git clone --mirror create a consistent point-in-time backup? What if a user pushes a commit during the backup? Is it rejected, queued, or incorporated into the backup? – Benjamin Goodacre Jan 22 '19 at 06:44

Everything is contained in the .git directory. Just back that up along with your project as you would any file.

Oren Hizkiya
  • Does this mean just backing up ALL contents of the directory containing the Git project is sufficient? – Ravindranath Akila Jun 24 '13 at 07:33
  • Agreed with Sunil: this does not appear to be an atomic operation. – jia103 Sep 09 '14 at 12:58
  • And how do you ensure no changes are made to files in that directory while creating the backup? – Raedwald Oct 10 '15 at 15:42
  • As Raedwald hinted, this method can result in an inconsistent backup and hence lead to data loss. This answer should therefore be removed, or at the very least warn about the possibility of data loss. – Abhishek Anand Feb 21 '16 at 13:23
  • I think he knows the `copy` or `cp` commands very well and they don't suit his needs. And I also think he has a bare repository in mind (although that can be copied as well, I think it is not a full-featured backup). – peterh Apr 05 '16 at 09:41
  • Useful with e.g. the Ubuntu backup tool, which asks for folders to copy. I have it run daily, so if corruption occurs I lose at most one day of work. – Adversus Nov 21 '17 at 10:18

You can back up the git repo with git-copy in minimal storage space.

git copy /path/to/project /backup/project.repo.backup

Then you can restore your project with git clone:

git clone /backup/project.repo.backup project
Quanlong
  • https://github.com/cybertk/git-copy/blob/master/bin/git-copy#L8-L36: that seems like a lot of work for a simple `git clone --bare` + `git push --force`. – VonC Jun 03 '15 at 10:17
  • @VonC Yes, but it can offer some additional features during the repackaging, or it can mine the internal structure of the git repo and use that for some optimization (restructuring of the destination, speed increases, etc.). – peterh Apr 05 '16 at 09:46
cd /path/to/backupdir/
git clone /path/to/repo
cd /path/to/repo
git remote add backup /path/to/backupdir
git push --set-upstream backup master

This creates a backup and sets things up so that you can do a git push to update your backup, which is probably what you want. Just make sure that /path/to/backupdir and /path/to/repo are at least on different hard drives; otherwise it doesn't make much sense to do this.
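Note that pushing only master leaves other branches and all tags out of the backup. A variant of the same setup (all paths below are invented examples) pushes everything, using a bare repository as the target since a bare repo accepts pushes to any branch:

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for the working repository, with a second branch and a tag
git init -q "$tmp/repo"
git -C "$tmp/repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
git -C "$tmp/repo" branch feature
git -C "$tmp/repo" tag v1.0

# A bare repository is a safer push target than a checked-out clone
git init -q --bare "$tmp/backupdir"
git -C "$tmp/repo" remote add backup "$tmp/backupdir"

# --all pushes every branch, --tags every tag
git -C "$tmp/repo" push -q backup --all
git -C "$tmp/repo" push -q backup --tags
```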

Arne
  • I think he knows the clone command well if he can ask such a question, and it is clearly not enough for him (because it is a clone, not a dump). Dumps are different things from simple copies, for example: 1) they do not need to be optimal (or even usable) for normal work 2) they are required to have good resistance and repairability against data corruption 3) it is often useful if they are easily diff-able for incremental backups, which is a non-goal for copies. – peterh Apr 05 '16 at 09:44

Here are two options:

  1. You can directly take a tar of the git repo directory, as it has the whole bare contents of the repo on the server. There is a slight possibility that somebody may be working on the repo while the backup is being taken.

  2. The following command will give you a bare clone of the repo (just as it is on the server); then you can take a tar of the location you have cloned to without any issue.

    git clone --bare {your backup local repo} {new location where you want to clone}
    
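A self-contained sketch of option 2 (all paths here are invented for illustration):

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for the repository to back up
git init -q "$tmp/project"
git -C "$tmp/project" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# The bare clone is a consistent snapshot, unaffected by concurrent pushes
git clone -q --bare "$tmp/project" "$tmp/project.git"

# Archiving the snapshot is now safe
tar -czf "$tmp/project.tar.gz" -C "$tmp" project.git
```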
KyleMit
  • I think he knows the clone or tar commands well if he can ask such a question, and it is clearly not enough for him (because it is a clone, not a dump). Dumps are different things from simple copies, for example: 1) they do not need to be optimal (or even usable) for normal work 2) they are required to have good resistance and repairability against data corruption 3) it is often useful if they are easily diff-able for incremental backups, which is a non-goal for copies. – peterh Apr 05 '16 at 09:46
  • peterh, he definitely wasn't asking for the tar or clone commands. If you look closely, I wasn't explaining those commands either. What I was trying to explain is Git backup via different methods, which may include various Linux commands; that doesn't mean I am teaching those Linux commands. I am trying to put a few ideas here. – vishal sahasrabuddhe Apr 11 '16 at 10:07

If it is on GitHub, navigate to Bitbucket and use the "import repository" feature to import your GitHub repo as a private repo.

If it is on Bitbucket, do it the other way around.

It's a full backup, but it stays in the cloud, which is my ideal method.

Mohammad

As far as I know, you can just make a copy of the directory your repo is in; that's it!

cp -r project project-backup
Richard Tuin
  • Can anybody please confirm this? I feel this is the right approach for making a proper backup. – Ravindranath Akila Jun 24 '13 at 07:33
  • I think you could end up with an inconsistent snapshot when changes are committed/pushed to the repository during the copy operation. Using git commands like `git clone --bare` will give you a consistent snapshot. – Eelke Jul 18 '13 at 10:15
  • Agreed with Sunil: this does not appear to be atomic. – jia103 Sep 09 '14 at 12:58
  • @jia103 It is not always a problem if it is not atomic; you only need to know, and need to be able to guarantee, that nobody else can reach the repo while you are working on it. But I think the OP wants a specific tool optimized for git repos; simple file copy is probably well known to him. – peterh Apr 05 '16 at 09:48