
How I usually proceed:

To remove files from the complete history, I currently use the following script (which I call git-crunch):

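#!/bin/bash
#
# git crunch <filenames>
#
# Drop the given files from the index of every commit being rewritten.
# No rev-list arguments are passed, so only the current branch is rewritten.
git filter-branch --index-filter "git rm --cached --ignore-unmatch $*"
# Delete the backup refs that filter-branch leaves behind.
rm -rf .git/refs/original/
# Expire all reflog entries so the old commits become unreachable.
git reflog expire --expire=now --all
# Report what is now unreachable (informational only).
git fsck --full --unreachable
# Repack, then garbage-collect so the unreachable objects are actually deleted.
git repack -A -d
git gc --aggressive --prune=now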

It works perfectly to remove specified files from (complete) history.

The context is:

I have one project that splits into 4 branches. Here is a summary network diagram showing 2 of the 4 maintained branches:

┏ a937fd9 (1 year, 9 months ago) <new repo>
...
60 commits later, we create a branch "probe", which is displayed on the left of the network
...
┣━┓
┣ ┃ c483a22 (8 months ago)
┃ ┣ f7b402c (3 months ago)
...
┃ ┣━┓
┣ ┃ ┃ 38bb93d (11 days ago)
┣ ┃ ┃ 1ef8202 (11 days ago)
┃ ┃ ┣━[remotes/origin/S...H_adjust]──adb243f (8 days ago)
┃ ┣ ┃ cd02775 (8 days ago)
┃ ┣━┛
┃ ┣ f9e40a3 (8 days ago)
┃ ┣ a30eb6f (7 days ago)
┃ ┣━[remotes/origin/S...H_verif]──4a3fe66 (7 days ago)
┃ ┗━[remotes/origin/HEAD]──[remotes/origin/master]──b452f85 (7 days ago)
┣ 91477ae (4 days ago)
┗━[HEAD]──[probe]──[remotes/origin/probe]──366c890 (48 minutes ago)

My problem is:

This repository contains 3 huge files that could have been left out from the very beginning (the creation of the repository). If I use my script, it only runs on one branch, so I would have to run it once per branch, and that would recreate 4 separate branches from the first commit up to the latest one, which was made today.

My question:

How can I remove these 3 files from my whole history without separating my branches from the beginning? Or, is there a way to rewrite the entire history and remove the files from all branches at once, so that I keep my network and its "shared" commits intact (and don't get my first 61 commits duplicated four times)?

MensSana

1 Answer


The best tool for this is actually the BFG Repo-Cleaner, a simpler, faster alternative to git filter-branch. For instance:

$ bfg --strip-blobs-bigger-than 10M

...removes all blobs bigger than 10MB (that aren't in your latest commit), and works on all branches & tags in your repo.

Full disclosure: I'm the author of the BFG Repo-Cleaner.
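
Since the files in the question are identified by name rather than by size, here is a minimal sketch of the same workflow using '--delete-files' instead. It is not taken from the answer itself: the function name bfg_delete_files and the BFG_JAR variable are hypothetical, the jar location is an assumption, and joining several names into one '{a,b,c}' glob relies on the brace syntax shown in the BFG usage text ('*.{txt,log}'), so try it on a copy first.

```shell
#!/bin/bash
# Sketch only: BFG_JAR and bfg_delete_files are illustrative names,
# not something shipped with the BFG. Point BFG_JAR at your downloaded jar.
BFG_JAR="${BFG_JAR:-bfg.jar}"

# Usage: bfg_delete_files <bare-or-mirror-repo> <filename> [<filename>...]
# File names are matched on the name only, not on the path inside the repo.
bfg_delete_files() {
  local repo=$1; shift
  local glob
  if [ "$#" -eq 1 ]; then
    glob=$1
  else
    # Join the names with commas into a single glob: {AA.zip,ZButton.7z,...}
    glob="{$(IFS=,; echo "$*")}"
  fi
  # Rewrite every branch and tag in one pass.
  java -jar "$BFG_JAR" --delete-files "$glob" "$repo" || return
  # Physically drop the old objects once the refs have been rewritten.
  git -C "$repo" reflog expire --expire=now --all &&
    git -C "$repo" gc --prune=now --aggressive
}
```

A typical run would be to make a fresh 'git clone --mirror' of the repository, call 'bfg_delete_files some-repo.git AA.zip ZButton.7z CaRo.rar' on that clone, inspect the result, and only then push. Because the BFG protects whatever is in the latest commit, the unwanted files should be deleted in an ordinary commit first.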

Roberto Tyley
  • Thanks a lot. I have it, but was unsure whether it would duplicate my commits for each branch... – MensSana Sep 30 '14 at 18:31
  • I guess there is a way to specify filename(s) instead of a size? – MensSana Sep 30 '14 at 18:32
  • In general, just try it on a copy of your repo and see what happens - the BFG is fast enough that it won't take you any time. The BFG runs over your entire repo, for *all* branches - you won't have any more commits or branches than you started with. You can specify filenames (though not paths) if you like ('--delete-files'). In your case, with 3 huge files, I would just use '--strip-blobs-bigger-than'. – Roberto Tyley Sep 30 '14 at 21:10
  • In fact, in my case, there are 3 useless huge files and one very useful bigger one... which is why I need to specify filenames... Thanks a bunch! – MensSana Oct 01 '14 at 15:13
  • If the useful bigger one is in your latest commit, the BFG **won't delete it** anyway: http://rtyley.github.io/bfg-repo-cleaner/#protected-commits – Roberto Tyley Oct 01 '14 at 15:31
  • OK, not that I want this to become an endless comment thread, but I will clarify as best I can. My repo has contained 4 big files from the beginning. I want to delete 3 of them and keep one, which is the second biggest... 52MB, 54MB, 69MB and 124MB. I want to keep the 69MB one. The 3 others are named AA.zip, ZButton.7z and CaRo.rar, while the one I want to keep is named WantToKeep.bz2... – MensSana Oct 01 '14 at 15:47
  • So long as the 69MB file is part of your current file tree, the BFG won't delete it. You should ensure that the other 3 files /aren't/ part of your latest file-tree, and then the BFG will remove them just using the '--strip-blobs-bigger-than' flag. The BFG protects any content that is in your latest file tree. – Roberto Tyley Oct 01 '14 at 16:43
  • First, I want to thank you for your involvement and your patience... Next, I must be phrasing my questions very obscurely... The file I want to remove has, like the others, been part of the repository for many months (as mentioned in the first post). I don't want to use the "bigger than" option, even though you seem to stick to that approach, since it does not fit my needs. I'm happy if it fits yours, sincerely, but if I keep telling you it does not work, believe me, it does NOT... Thank you though for your good work and your patience, which is much appreciated. – MensSana Oct 15 '14 at 19:42
  • I suspect I'm not explaining very well, but somewhere between us there is a gap in understanding, so this is an interesting intellectual exercise for me! From your description so far and my understanding of the BFG's behaviour, I don't understand /why/ '--strip-blobs-bigger-than' would not work for you - and so that's interesting to me. I don't know if you've resolved your issue yet (possibly using '--delete-files', as mentioned above?), but have you actually /tried/ running --strip... on a copy of your local repo? If I've exhausted your patience, fair enough, don't feel you need to humour me! – Roberto Tyley Oct 15 '14 at 20:37
  • I guess you deserve some specs! :) What happened is the following: source files were transferred from a developer's HD to some git repositories without any cleanup beforehand. Among the big files, there were "absolutely inappropriate" files, which were not the biggest ones and not in the latest commit, but had been there from the beginning and were never supposed to be. Now that I want to clean up this mess (over a year after it was initially committed, and after many branches have been created and maintained), I need to figure out the right way to do so. Here we are, thanks a bunch! – MensSana Nov 10 '14 at 16:26
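
The "try it on a copy" advice from the comments can be checked mechanically. The sketch below uses only plain git commands; the two function names are hypothetical helpers, not part of git or the BFG. Comparing the commit count across all refs before and after a rewrite shows whether shared commits were duplicated, and scanning the object list shows whether a given file name still exists anywhere in history.

```shell
#!/bin/bash
# Sketch only: count_commits and history_contains are illustrative helper names.

# Print the number of distinct commits reachable from any ref in <repo>.
# If this number grows after a rewrite, shared commits have been duplicated.
count_commits() {
  git -C "$1" rev-list --all --count
}

# Succeed if any tree reachable from any ref in <repo> contains a file
# called <name>, in any directory.
history_contains() {
  local repo=$1 name=$2
  git -C "$repo" rev-list --all --objects |
    awk -v n="$name" '
      { sub(/^[^ ]+ /, "")               # strip the object id, keep the path
        m = split($0, parts, "/")        # compare the last path component only
        if (parts[m] == n) found = 1 }
      END { exit !found }'
}
```

For example, record 'count_commits copy.git' before the cleanup, run the cleanup on the copy, and then confirm that the count has not increased and that 'history_contains copy.git AA.zip' now fails while 'history_contains copy.git WantToKeep.bz2' still succeeds.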