33

I have a Git repository on BitBucket which is more than 4 GB.

I can't clone the repository using the normal Git command, as it fails (it looks like it's working for a long time but then rolls back).
I also can't download the repository as a zip from the BitBucket interface, as:

Feature unavailable: This repository is too large for us to generate a download.

Is there any way to download a Git repository incrementally?

VonC
Sebastian Gray
  • I just checked the BitBucket page for my repos, to make sure: They don't seem to offer a "Download zip" option like e.g. GitHub does... – Nicolas Miari Dec 21 '15 at 05:30
  • 1
    Your repository is too large. Do you have a lot of large binary files being versioned? Even if you _could_ download it, 4GB is too large to be user friendly. – Tim Biegeleisen Dec 21 '15 at 05:31
  • What specifically are you trying to achieve? Do you just need a copy of the latest version of the files? Do you need the full history, including presumably the large binary files that have caused this? – Steve Bennett Dec 21 '15 at 05:56
  • I don't need the full history but I need a full copy of all of my files. – Sebastian Gray Dec 21 '15 at 06:19
  • @TimBiegeleisen Yes the repository is large. It's usable when it was working but once we hit the imposed limits it stopped working. I now understand the inherent GIT limitations and that we should use Perforce or Subversion instead - however I still need to get a copy of the latest files in the repo. – Sebastian Gray Dec 21 '15 at 06:29
  • @SebastianGray: How did you get your repo that big anyway? I thought BitBucket automatically denies you from making big commits like that – Puddler Dec 21 '15 at 11:44
  • @Puddler It sort of just grew. I think it was because we setup this repository a number of years ago and they let us keep going. I finally got a copy of the repository overnight - it ended up being 10.8 GB on my disk :-) – Sebastian Gray Dec 23 '15 at 00:41
  • @SebastianGray - Good to hear you've got it in the end – Puddler Dec 23 '15 at 00:50

7 Answers

23

If you don't need to pull the whole history, you can specify the number of revisions to clone:

git clone <repo_url> --depth=1

Of course, this might not help if you have a particularly large file in your repository.
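If a particularly large file is the problem, a partial clone that skips big blobs may help as well. This is just a sketch, assuming Git 2.19 or later on the client and a server that supports partial clone; the 50 MB cutoff is an arbitrary example:

# Shallow clone that also omits any blob larger than 50 MB
git clone --depth=1 --filter=blob:limit=50m <repo_url>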

Puddler
  • Does this get all of the files though, or just those changed in the last check in? – Sebastian Gray Dec 21 '15 at 06:21
  • All the files from the master branch as of last commit. You can pull another branch if you don't want master but you need to choose the right flags https://git-scm.com/docs/git-clone – Puddler Dec 21 '15 at 10:49
  • Does this also mean that I will be only able to work on one branch and any new changes stashed will move along with every new branch created? @Puddler – Mansi Mar 09 '20 at 17:52
  • 1
    @Mansi Sorry I had not logged into StackOverflow in a long while. But to answer your question, the clone will get you exactly one commit from your selected branch. All the files will appear as it did at that commit, if you do further commits on top they will track as normal for that branch. You can do a pull after the clone to get other commits from other branches or older commits from the branch you cloned from. – Puddler Apr 15 '21 at 13:25
16

For me, the approach described in this answer worked perfectly: https://stackoverflow.com/a/22317479/6332374, but with one little improvement because of the big repo.

First:

git config --global core.compression 0

then clone just a part of your repo:

git clone --depth 1 <repo_URI>

and now "the rest":

git fetch --unshallow

But here is the trick: when you have a big repo, you sometimes must perform that step multiple times. So... again:

git fetch --unshallow

and so on.

Try it multiple times. You will probably see that each time you perform 'unshallow' you get more and more objects before the error.

And at the end, just to be sure:

git pull --all
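If the fetch keeps failing partway through, the retries can be automated with a small shell loop. This is only a sketch, assuming the fetch eventually completes and that blind retries are acceptable on your connection:

# Keep retrying the unshallow fetch until it exits successfully, then pull everything
until git fetch --unshallow; do
    echo "fetch failed, retrying..."
    sleep 10
done
git pull --all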

8

1) You can initially download a single branch with only the latest commit revision (depth=1); this will significantly reduce the size of the repo to download and still let you work on the code base:

git clone --depth <Number> <repository> --branch <branch name> --single-branch

example:
git clone --depth 1 https://github.com/dundermifflin/dwightsecrets.git --branch scranton --single-branch


2) Later you can get all the commits (after this your repo will be in the same state as after a git clone):

git fetch --unshallow

or, if it's still too much, get only the last 25 commits:

git fetch --depth=25


Another way: git clone is not resumable, but you can first git clone on a third-party server and then download the complete repo over HTTP/FTP, which is resumable.
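A rough sketch of that workaround, assuming you control an intermediate server that can serve a file over HTTP (the host and repository path below are hypothetical):

# On the intermediate server: mirror the repo and pack it into a single file
git clone --mirror git@bitbucket.org:username/reponame.git
tar czf reponame.tar.gz reponame.git

# On your machine: download resumably, unpack, and clone from the local mirror
wget -c https://your-server.example.com/reponame.tar.gz
tar xzf reponame.tar.gz
git clone reponame.git reponame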

GorvGoyl
4

One potential technique is just to clone a single branch. You can then pull in more later. Do git clone [url_of_remote] --branch [branch_name] --single-branch.

Large repositories seem to be a major weakness of Git. You can read about that at http://www.sitepoint.com/managing-huge-repositories-with-git/. That article mentions a Git extension called git-annex that can help with large files; check it out at https://git-annex.branchable.com/. It helps by allowing Git to manage files without checking the files into Git. Disclaimer: I've never tried it myself.

Some of the solutions at How do I clone a large Git repository on an unreliable connection? also may help.

EDIT: Since you just want the files, you may be able to try git archive. You'd use syntax something like:

git archive --remote=ssh://git@bitbucket.org/username/reponame.git --format=tar --output="file.tar" master

I tried to test on a repo in my AWS CodeCommit account but it doesn't seem to allow it; someone on BitBucket may be able to test. Note that on Windows you'd want to use zip rather than tar, and this all has to be done over an SSH connection, not HTTPS.

Read more about git archive at http://git-scm.com/docs/git-archive
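For reference, the zip variant mentioned above would look something like this (same hypothetical repository path as the tar example, and it still has to go over SSH):

git archive --remote=ssh://git@bitbucket.org/username/reponame.git --format=zip --output="file.zip" master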

James Jones
  • Their solution for cloning was over a http connection. BitBucket normally has an option for downloading the Repo as a single file but that option is unavailable for a large repo (>4GB) – Sebastian Gray Dec 21 '15 at 06:22
1

I got it to work by using the method from this question: fatal: early EOF fatal: index-pack failed.

But only after I set up SSL; this method still didn't work over HTTP.

The support at BitBucket was really helpful and pointed me in this direction.

Sebastian Gray
1

BitBucket should have a way to build an archive even for a large repo with Git 2.13.x/2.14 (Q3 2017).

See commit 867e40f (30 Apr 2017), commit ebdfa29 (27 Apr 2017), commit 4cdf3f9, commit af95749, commit 3c78fd8, commit c061a14, and commit 758c1f9, by Rene Scharfe.
(Merged by Junio C Hamano -- gitster -- in commit f085834, 16 May 2017)

archive-zip: support files bigger than 4GB

Write a zip64 extended information extra field for big files as part of their local headers and as part of their central directory headers.
Also write a zip64 version of the data descriptor in that case.

If we're streaming then we don't know the compressed size at the time we write the header. Deflate can end up making a file bigger instead of smaller if we're unlucky.
Write a local zip64 header already for files with a size of 2GB or more in this case to be on the safe side.

Both sizes need to be included in the local zip64 header, but the extra field for the directory must only contain 64-bit equivalents for 32-bit values of 0xffffffff.
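So once you have a local clone and a Git build that includes these patches (2.14 or later), producing a zip archive larger than 4 GB locally should work; a minimal example:

git archive --format=zip --output=repo.zip HEAD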

VonC
0

You can clone only the first commit, then the second commit, and so on. The pull will be easier if the difference between two commits is not very large. You can see more details in this answer.
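A rough sketch of that incremental approach, assuming Git 2.15+ for --is-shallow-repository and 2.11+ for --deepen (the step of 100 commits per fetch is arbitrary):

# Start with just the latest commit, then deepen the history a chunk at a time
git clone --depth=1 <repo_url> && cd <repo_dir>
while [ "$(git rev-parse --is-shallow-repository)" = "true" ]; do
    git fetch --deepen=100
done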

ramwin