
I wanted to remove a directory and its contents from the history of a git repository to reduce the repository's size. (The directory contained binary assets such as models and textures, and it accounted for by far the largest share of the repository's size.)

I used the following solution to a previous question:

git filter-branch --tree-filter 'rm -rf the_directory' --prune-empty HEAD
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
echo the_directory/ >> .gitignore
git add .gitignore
git commit -m 'Removing the_directory from git history'
git gc
git push origin master --force

This seemed to work, because I can no longer find any reference to this directory or its contents in my commit history on GitHub. (I have over 1500 commits, and the directory used to appear in all of them but no longer does. I can't even find the commit in which I explicitly deleted the directory from the repository, though not from the history.)

Unfortunately, according to GitHub, the size of the repository did not change. I still have a 450MB pack file, while the actual repository is now below 14MB.

I used the following git commands to find the largest files:

git verify-pack -v .git/objects/pack/pack-*.idx | sort -k 3 -g | tail -5
git rev-list --objects --all | grep the_id

Conclusion: the largest files are still the ones located in the directory I want to get rid of.
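A possible one-pass alternative to the `verify-pack`/`grep` combination above, sketched under the assumption of a reasonably recent Git whose `git cat-file --batch-check` accepts a custom format with `%(rest)`: feed every reachable object into `cat-file` and keep only the biggest blobs, each printed together with its path. `largest_blobs` is a hypothetical helper name. Note that `--all` follows every ref (branches, tags, `refs/original/*`) but not reflog entries, which can also keep objects in the pack.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref
# (branches, tags, refs/original/*, ...) as "blob <sha> <size> <path>".
largest_blobs() {
  local n="${1:-10}"
  # rev-list --objects prints "<sha> <path>" for every reachable object;
  # cat-file adds the type and uncompressed size of each one.
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | awk '$1 == "blob"' \
    | sort -k3,3 -n -r \
    | head -n "$n"
}
```

For example, `largest_blobs 5` run inside the repository prints up to five lines of the form `blob <sha> <size-in-bytes> <path>`, largest first. If the removed directory still shows up here, some ref (often a tag) still reaches it.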

I tried various approaches, but the pack file stays about the same size or even becomes larger (~500MB).

How can I reduce the size of the pack file, and thus of my git repository? More specifically, how can I remove the files of the deleted directory from the pack file?

Matthias
  • You ran `git filter-branch` on `HEAD`. What about other branches and tags? And in the local repository, the reflogs still reference the old commits. As long as these refs exist, the old commits are still there. – ElpieKay Oct 29 '17 at 23:54
  • You might want to consider alternatives, like leaving the server history alone and using [`--depth` to reduce the download size](https://stackoverflow.com/a/1210012/211627). – JDB still remembers Monica Oct 30 '17 at 00:02
  • @ElpieKay There is and was only one branch. I didn't create any tags manually, but now that you mention it, I created nearly 100 releases via GitHub's web interface (each of which is backed by a tag). – Matthias Oct 30 '17 at 07:20
  • @ElpieKay The cause of the problem is indeed related to the tags, as you mentioned. This [answer](https://stackoverflow.com/a/32886427/1731200) does the job (450MB -> 46MB) by cleaning the tags as well. Alternatively, one can use the BFG Repo-Cleaner, as mentioned below. – Matthias Oct 30 '17 at 08:24
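The fix Matthias reports (cleaning the tags as well) can be sketched with `git filter-branch` itself. This is not necessarily the exact command from the linked answer; it assumes the directory name contains no single quotes and that no `refs/original/` backups remain from an earlier run (without `--force`, filter-branch refuses to start if they do). Compared with the command in the question, the changes are `-- --all` (every branch and tag instead of only `HEAD`), `--tag-name-filter cat` (re-point each tag at its rewritten commit), and the reflog and gc steps that ElpieKay's comment hints at. `purge_directory_everywhere` is a hypothetical name.

```shell
# Hypothetical helper: purge a directory from every branch AND tag, then
# drop everything that would keep the old blobs alive in the pack.
# Assumes: no single quotes in the directory name, no leftover
# refs/original/* backups from an earlier filter-branch run.
purge_directory_everywhere() {
  local dir="$1"
  # --index-filter edits the index only (no checkout per commit);
  # -- --all rewrites every ref, and --tag-name-filter cat keeps each
  # tag's name while re-pointing it at the rewritten commit.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
    --index-filter "git rm -r -q --cached --ignore-unmatch '$dir'" \
    --prune-empty --tag-name-filter cat -- --all || return 1
  # Delete the refs/original/* backups that filter-branch leaves behind.
  git for-each-ref --format='%(refname)' refs/original/ \
    | xargs -r -n 1 git update-ref -d
  # Expire every reflog entry, then prune the now-unreachable objects.
  git reflog expire --expire=now --all
  git gc --prune=now --aggressive --quiet
}
```

Afterwards the rewritten history still has to reach GitHub, e.g. with `git push origin --force --all` followed by `git push origin --force --tags`. `--index-filter` is used instead of `--tree-filter` because it never checks files out, which is usually much faster on a history of 1500+ commits.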

You can try BFG Repo-Cleaner and its `--delete-folders` option
(run it on a bare clone, i.e. a copy of your repo, for testing):

bfg --delete-folders the_directory --delete-files the_directory --no-blob-protection my-repo.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive

By default, this rewrites your commits and updates all branches and tags.

VonC
  • If I have a local git repo, what should I put instead of `my-repo.git`, given that my remote URI is something like `ssh://git-******-user@host/volume/GitRepos/repo`? – Angel Todorov Nov 14 '19 at 05:41
  • @AngelTodorov You can run `git clone --bare` on your SSH remote URL to get a local repo.git (replace repo with the actual name of your repository). – VonC Nov 14 '19 at 05:43
  • OK, so I followed the [bfg steps](https://rtyley.github.io/bfg-repo-cleaner/): 1) `git clone --mirror ....`; 2) `bfg --delete-files ....`; 3) `cd some-big-repo.git`; 4) `git reflog expire --expire=now --all && git gc --prune=now --aggressive`; and finally 5) `git push`. However, I get this output: `remote: error: denying non-fast-forward refs/heads/master (you should pull first) To ssh://user@host/repo ! [remote rejected] master -> master (non-fast-forward) error: failed to push some refs to 'ssh://user@host/repo'` – Angel Todorov Nov 14 '19 at 07:42
  • @AngelTodorov Any kind of filtering (BFG, filter-branch or the [new filter-repo](https://stackoverflow.com/a/58251653/6309)) is bound to rewrite history, necessitating a `git push --force` . – VonC Nov 14 '19 at 07:45
  • OK, I replaced step 5 with `git push --force`, but I still get the same error. – Angel Todorov Nov 14 '19 at 07:48
  • @AngelTodorov Then it is on the server side: the server is set to deny any non-fast-forward push. Do you have access to that server? – VonC Nov 14 '19 at 07:50
  • You might be right. This is part of the output from `bfg`: `Found 237 objects to protect Found 2 commit-pointing refs : HEAD, refs/heads/master Protected commits ----------------- These are your protected commits, and so their contents will NOT be altered: * commit 60c18c35 (protected by 'HEAD')`. And yes, I have access to the server. Let me know what I should do on the server side. – Angel Todorov Nov 14 '19 at 07:54
  • @AngelTodorov See https://stackoverflow.com/a/11500342/6309: `git config --system receive.denyNonFastForwards false` on the server side. – VonC Nov 14 '19 at 08:05
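VonC's suggestion sets the option with `--system`, which affects every repository on the server. A narrower alternative, sketched here as an assumption rather than as part of the linked answer, is to override it only in the bare repository being cleaned: Git reads repository-level config after system-level config, so the local value wins. `allow_forced_pushes` is a hypothetical helper name, and its argument is whichever bare repository backs the `ssh://` remote.

```shell
# Hypothetical helper, run ON THE SERVER against the bare repository that
# backs the remote URL. Per-repository alternative to the --system setting.
allow_forced_pushes() {
  local bare_repo="$1"
  # Show the value currently in effect (system, global and local combined).
  git -C "$bare_repo" config --get receive.denyNonFastForwards \
    || echo "receive.denyNonFastForwards is not set"
  # Repository-level config is read last, so it overrides --system.
  git -C "$bare_repo" config receive.denyNonFastForwards false
}
```

After that, the `git push` from the mirror clone (which force-updates refs) should no longer be rejected as non-fast-forward. Setting the option back to `true` once the cleaned history is pushed restores the protection.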