
I have a large blob that I want to get rid of! I thought I had removed the file using this solution: http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/ (I used -- --all instead of HEAD so that the files are removed from all branches), and then ran:

rm -rf .git/refs/original/ && git reflog expire --all &&  
    git gc --aggressive --prune

I've inspected the pack files using the approach from the question "Why is my git repository so big?":

$ git verify-pack -v .git/objects/pack/pack-*.idx | sort -k3n
... last 4 lines:
bc7ae9801052180b283cd81880753549f0f92587 blob   19464809 749446 305054873
acd5f09a35846bec25ebc324738139e5caabc50f blob   294278199 71381636 39607483
986d152935434b56cf182d8a32e24cb57af75ac3 blob   480385718 108184804 110989119
ba9d1d27ee64154146b37dfaf42ededecea847e1 blob   761172819 27430741 277589990

The git-find-blob script is taken from the question "Which commit has this blob?":

$ ./git-find-blob ba9d1d27ee64154146b37dfaf42ededecea847e1

But it doesn't find anything.

Any ideas how to get rid of it from my repository?

EoghanM
  • Is the output of `git status` empty? It's possible that the blob has been added to the index, but never committed. – Mark Longair Sep 16 '11 at 08:03
  • It might be useful if you also included the output of `git fsck --cache --unreachable $(git for-each-ref --format="%(objectname)")` and the same command without the `--cache` – Mark Longair Sep 16 '11 at 08:18
  • Thanks for your continued attention Mark; the blob is listed in both variants of the command as 'unreachable blob'. There are 7 (other) extra unreachable blobs listed in the variant without the `--cache` flag. – EoghanM Sep 19 '11 at 16:51
  • Is the ref packed? Does it appear in `git show-ref`? – Josh Lee Sep 21 '11 at 19:32
  • @MarkLongair thanks! I tried everything to clean 2GB of unreferenced blobs out of my repo, without realising that they were in the index the whole time! (staged for deletion) – thenickdude Jul 09 '15 at 06:25
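Mark's first suggestion can be checked directly: git ls-files --stage lists every index entry together with its blob id, so a blob that was staged but never committed shows up there even though no commit contains it. Below is a minimal sketch in a throwaway repository, assuming bash and a reasonably recent git; the file names are illustrative and nothing in it touches your own repository.

```shell
#!/usr/bin/env bash
# Sketch: a blob that is staged but never committed is kept alive by the index.
set -euo pipefail

repo=$(mktemp -d)            # throwaway repository; your own repo is untouched
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "keep me" > small.txt
git add small.txt
git commit -q -m "initial commit"

printf 'pretend this is very large\n' > big.bin
git add big.bin              # staged, never committed
blob=$(git hash-object big.bin)

# No commit contains the blob, but the index does.
# Prints a line of the form "<mode> <blob id> <stage><TAB>big.bin".
git ls-files --stage | grep "$blob"

# Drop it from the index and the working tree, then prune unreachable
# loose objects now (git gc alone would keep them for two weeks).
git rm -q --cached big.bin
rm big.bin
git prune --expire=now

if git cat-file -e "$blob" 2>/dev/null; then
    echo "blob still present"
else
    echo "blob removed"
fi
```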

5 Answers


You can use git repack -Ad to force git to reconstruct your packs, and to unpack any unreachable objects into loose objects. At this point you can use git gc --prune=now to discard the unreachable objects.

You should also double-check that you actually expired your reflogs. I believe git reflog expire --all defaults to keeping entries for 90 days (or 30 days for entries that are no longer reachable from the current tip), so you may want to use git reflog expire --expire-unreachable=now --all instead. This needs to be done before the repack and gc.
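Putting those steps in order, here is a minimal sketch that reproduces the situation in a throwaway repository (assuming bash and a reasonably recent git; the branch and file names are illustrative). A blob from a deleted branch survives a plain git gc because HEAD's reflog still mentions the commit that added it, and it goes away once the unreachable reflog entries are expired, the packs rebuilt and the objects pruned.

```shell
#!/usr/bin/env bash
# Sketch: a blob kept alive only by HEAD's reflog, then removed.
set -euo pipefail

repo=$(mktemp -d)            # throwaway repository; your own repo is untouched
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "keep me" > small.txt
git add small.txt
git commit -q -m "initial commit"

# Commit a "large" file on a side branch, then throw the branch away.
git checkout -q -b side
printf 'pretend this is very large\n' > big.bin
git add big.bin
git commit -q -m "add big file"
blob=$(git rev-parse HEAD:big.bin)
git checkout -q -
git branch -q -D side

# A plain gc keeps the blob: HEAD's reflog still points at "add big file".
git gc -q
git cat-file -e "$blob" && echo "after plain gc: blob still present"

# Expire unreachable reflog entries now, rebuild the packs (unreachable
# objects become loose), then prune unreachable objects immediately.
git reflog expire --expire-unreachable=now --all
git repack -A -d -q
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then
    echo "blob still present"
else
    echo "blob removed"
fi
```

On a real repository you would only run the last three git commands; be aware that they permanently discard anything reachable solely through the reflogs.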

Lily Ballard
  • Thanks, that worked! The reflog expire with --expire-unreachable=now AND the gc --prune=now after repacking did the trick: the first cleared the last reference and the second got rid of the object itself. – Harald Schilly Jan 06 '12 at 19:05
  • Great! Worked for me as well. Repo went from 80 MiB to 4.5 MiB. – Leo Feb 09 '12 at 08:30

You want to use the BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch designed for removing large files from Git repos.

Download the Java jar (requires Java 6 or above) and run this command:

$ java -jar bfg.jar --strip-blobs-bigger-than 20M my-repo.git

Any blob over 20M in size (that isn't in your latest commit) will be totally removed from your repository's history. You can then expire the reflog and use git gc to clean away the dead data:

$ git reflog expire --expire=now --all && git gc --prune=now --aggressive

The BFG is typically 10-50x faster than running git-filter-branch and the options are tailored around these two common use-cases:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Roberto Tyley

Firstly, in your git gc invocation, you should use --prune=now, since the default is to keep objects which are less than 2 weeks old.

Secondly, the git-find-blob command you've used only searches the history of HEAD by default, so if the blob is on another branch, that script will miss it. Try invoking it as:

./git-find-blob ba9d1d27ee64154146b37dfaf42ededecea847e1 --all
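The bash version of git-find-blob appears to pass everything after the blob id on to git log, which is why appending --all widens the search from HEAD's history to every ref. Below is a self-contained sketch of that idea; find_blob is a hypothetical helper written in the same spirit, not the original script, and it runs in a throwaway repository (assuming bash and a reasonably recent git).

```shell
#!/usr/bin/env bash
# Sketch in the spirit of git-find-blob: list commits whose tree contains a blob.
# No pipefail: grep -q may stop reading early, which is harmless here.
set -eu

# find_blob BLOB [git-log options...]   (hypothetical helper name)
find_blob() {
    local obj_name="$1"
    shift
    # Extra arguments (e.g. --all) are handed straight to git log.
    git log "$@" --pretty=tformat:'%T %h %s' |
    while read -r tree commit subject; do
        if git ls-tree -r "$tree" | grep -qF "$obj_name"; then
            echo "$commit $subject"
        fi
    done
}

repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "keep me" > small.txt
git add small.txt
git commit -q -m "initial commit"

# Put the blob only on another branch, then return to the original branch.
git checkout -q -b side
printf 'pretend this is very large\n' > big.bin
git add big.bin
git commit -q -m "add big file"
blob=$(git rev-parse HEAD:big.bin)
git checkout -q -

echo "HEAD only:"
find_blob "$blob"            # finds nothing: the blob is not in HEAD's history
echo "all refs:"
find_blob "$blob" --all      # finds the "add big file" commit on side
```

Using tformat rather than format makes git log terminate the last line with a newline, so read does not silently drop the final commit.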
Mark Longair
  • I tried `--all` on both the perl and the bash versions of `git-find-blob`, but still no dice. I also tried `--prune=now` on `gc --aggressive` but the blob is still there! – EoghanM Sep 15 '11 at 17:20

The blob doesn't appear on the other side of a clean push, so this will be my solution: push to a new location, then clone from that location. Is there an easier way to do it?
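This workaround relies on a push only transferring objects reachable from the refs being pushed, so anything kept alive solely by reflogs, the index or refs you don't push is left behind. A minimal sketch, assuming bash and git 1.8.5 or later (for git -C); the directory names old, clean.git and fresh are illustrative.

```shell
#!/usr/bin/env bash
# Sketch of the push-then-clone workaround in a throwaway directory.
set -euo pipefail

work=$(mktemp -d)
cd "$work"
git init -q old
cd old
git config user.name "Demo"
git config user.email "demo@example.com"
echo "keep me" > small.txt
git add small.txt
git commit -q -m "initial commit"

# A blob that is unreachable from every ref but still mentioned in the reflog.
git checkout -q -b side
printf 'pretend this is very large\n' > big.bin
git add big.bin
git commit -q -m "add big file"
blob=$(git rev-parse HEAD:big.bin)
git checkout -q -
git branch -q -D side
cd ..

# Push all branches to a fresh bare repository (tags would need --tags too),
# then clone from it.
git init -q --bare clean.git
git -C old push -q --all ../clean.git
git clone -q clean.git fresh

git -C old cat-file -e "$blob" && echo "old: blob present"
if git -C fresh cat-file -e "$blob" 2>/dev/null; then
    echo "fresh: blob present"
else
    echo "fresh: blob absent"
fi
```

Config, hooks and reflogs are not carried over by a push, so check anything in .git/config you rely on before switching to the new clone.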

EoghanM

I'm having the same issue. I discovered that my troublesome blob is referenced by an unreachable tree, so I added this to the git-find-blob script:

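obj_name="$1"    # the blob's SHA-1, as in git-find-blob

git fsck --full --unreachable |
while read -r status type sha
do
    # fsck prints lines like "unreachable tree <sha>"; only trees can contain blobs
    if [[ $type != "tree" ]]; then
        continue
    fi
    # Recursively list the unreachable tree and look for the blob's id
    if git ls-tree -r "$sha" | grep -q "$obj_name"; then
        echo "$status $type $sha"
    fi
done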

I was able to remove the blob using BFG Repo-Cleaner but I'd be much happier solving the problem using native git commands.

Doug