
Possible Duplicate:
How can I remove a commit on github?

So a team member on our project has committed and pushed 700+ MB of nonsense to our git project repository... She thought she was adding only 2 images, but somehow ended up copying the entire contents of her desktop into the git folder and committing it. I don't know why she didn't think it was weird that 2 pictures took 20 minutes to upload...

Anyways, I'm in a predicament now as head of this project. I have 2 choices as far as I can see, and I don't like either one:

  1. I could delete the repository from Bitbucket and start it again with the files I want. This would discard all previous history, since only the current version of the files I want would be available.

  2. I could delete the erroneous data and push the changes. Only the files we want would be managed from then on, but all the extra crud she put up there would exist forever in the git history, inflating our project 100-fold.

Is there any way to ACTUALLY remove a commit forever, as if it never happened? What would be the best way to manage this hiccup, other than more remedial git training?

Community
Parad0x13
  • You can use github's advice - https://help.github.com/articles/remove-sensitive-data – AD7six Dec 30 '12 at 23:03
  • I've read that article, but it focuses on single files and on removing them from the repository. I have >9300 files that need to be removed, and it would be impractical to manually remove and .gitignore each one of them. Plus, some of the files that were committed have file names that we may want in the project later on, which is another reason I don't want to gitignore them. – Parad0x13 Dec 30 '12 at 23:20
  • Removing the commit from a branch's history is the easy/obvious bit - use `git rebase`. The rest of the article is the most relevant e.g. "If you [need to delete/obliterate the unwanted commits], you will have to delete the repo and recreate it." – AD7six Dec 31 '12 at 09:29

2 Answers


I would do

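# Create a temporary branch one commit behind master (before the bad commit)
$ git checkout -b newmaster master~1
# Delete the branch that still points at the bad commit
$ git branch -D master
# Expire reflog entries so nothing references the bad commit any more
$ git reflog expire --expire=now --all
# Garbage-collect and prune the now-unreachable objects immediately
$ git gc --prune=now
# Recreate master at the rewound position and remove the temporary branch
$ git checkout -b master
$ git branch -D newmaster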

`git gc` performs the garbage collection. I think you can also run this on the server.
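To check whether the sequence above really discards the commit, it can be tried in a throwaway repository first. The following is a minimal sketch, not part of the original answer: the temp directory, the file names (`app.txt`, `junk.bin`) and the identity settings are made up for illustration, and it assumes the bad commit is the tip of `master`.

```shell
#!/bin/sh
set -e

# Scratch repository in a temp directory (hypothetical, for the demo only)
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the answer
git config user.name "Example"
git config user.email "example@example.com"

# One good commit, then one "accidental" commit on top of it
echo "source" > app.txt
git add app.txt
git commit -q -m "good commit"

echo "pretend this is 700 MB of desktop files" > junk.bin
git add junk.bin
git commit -q -m "accidental commit"
bad=$(git rev-parse HEAD)

# The sequence from the answer
git checkout -q -b newmaster master~1
git branch -q -D master
git reflog expire --expire=now --all
git gc -q --prune=now
git checkout -q -b master
git branch -q -D newmaster

# cat-file -e exits non-zero when the object no longer exists
if git cat-file -e "$bad" 2>/dev/null; then
    result="still present"
else
    result="pruned"
fi
echo "accidental commit is $result"
```

This only cleans the local object store. The remote still has the commit until the rewound branch is force-pushed and the remote runs its own garbage collection.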

Sergiu Dumitriu
Luigi R. Viggiano
  • Note that this assumes that `master` currently points to the unwanted commit. This can't be done on the server if the server is BitBucket, but `git push -f origin master` after the changes will clean things up. – Michael Mior Dec 30 '12 at 22:57
  • I see, I'm going to check this out, but how does it actually remove the files from git? I thought git was supposed to remember every change that was made via commits, so won't the files be retained forever in the .git? I don't want this; I want all those files to have 'never been committed', as if I was pushing time back (in this case) 9 hours – Parad0x13 Dec 30 '12 at 23:03
  • Basically, the above instructions move the master branch back by 1 commit; running `git gc` then removes all unreferenced (that is, unreachable) commits. Since master is back by 1 commit, the last commit is no longer reachable from any branch, so it will be garbage collected (the files belonging to that commit will be pruned). This involves deleting the blob objects (compressed versions of all your committed files) as well as the tree object and the commit object. I don't know if you can run `gc` on Bitbucket though. – Luigi R. Viggiano Dec 30 '12 at 23:06
  • Okay... gonna do further research in git because I didn't think that was the way git functioned. By calling gc I didn't think it would actually remove the blobs from the server, it would only prune your local copy of the repository – Parad0x13 Dec 30 '12 at 23:17
  • You can also run gc (and all the above commands) on the server, if you can ssh into it. The easiest way may be to do `git push -f origin master~1:master` and then run `git gc` on the server. You still need to do a `gc` on the server, though; I don't know if Bitbucket has this feature. In git (unlike Subversion) the repository history is modifiable. – Luigi R. Viggiano Dec 30 '12 at 23:19
  • Thanks luigi and all for the advice and clarification on gc. I'll look into this right away and hopefully it'll fix my issue. Hopefully bitbucket has the ability to do this >.> – Parad0x13 Dec 30 '12 at 23:24
  • Also have a look at this question: http://stackoverflow.com/questions/11403985/remove-deleted-files-on-bitbucket – Luigi R. Viggiano Dec 30 '12 at 23:28
  • Exactly what I needed, thanks! – Parad0x13 Dec 31 '12 at 00:00
  • This won't actually remove the bad commit from the repository, because the previous commit ID will still be referenced in the `reflog`. Insert `git reflog expire --expire=now --all` before `git gc`. Also, `git gc` doesn't remove recent loose objects unless forced to do so: `git gc --prune=now`. – Sergiu Dumitriu Jan 02 '13 at 16:43
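The push-then-collect idea from the comments above can be sketched against a remote you control. This is an illustration under assumptions: the "server" is just a hypothetical bare repository in a temp directory where commands can be run directly, which is not the case on Bitbucket, and all paths and file names are invented.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
remote="$work/remote.git"

# Stand-in for a self-hosted server: a bare repository on local disk
git init -q --bare "$remote"

git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git remote add origin "$remote"

echo "source" > app.txt
git add app.txt
git commit -q -m "good commit"
good=$(git rev-parse HEAD)

echo "pretend this is 700 MB of desktop files" > junk.bin
git add junk.bin
git commit -q -m "accidental commit"
bad=$(git rev-parse HEAD)
git push -q origin master

# Rewind the remote branch by one commit, as suggested in the comments
git push -q -f origin master~1:master
remote_tip=$(git --git-dir="$remote" rev-parse refs/heads/master)

# The branch moved, but the object is still stored on the remote
if git --git-dir="$remote" cat-file -e "$bad" 2>/dev/null; then
    before="present"
else
    before="absent"
fi

# Garbage collection on the "server" side
git --git-dir="$remote" reflog expire --expire=now --all
git --git-dir="$remote" gc -q --prune=now

if git --git-dir="$remote" cat-file -e "$bad" 2>/dev/null; then
    after="present"
else
    after="absent"
fi
echo "before gc: $before, after gc: $after"
```

The force push alone only moves the branch pointer; it is the `gc` on the remote side that frees the space, which is why hosted services without shell access need their own mechanism for this.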

I don't know if bitbucket allows it, but you can do:

git reset HEAD~
# Or the SHA of the version before the huge commit
git push --force
Sergiu Dumitriu
  • Won't this keep the erroneous files as part of the git repository though, albeit hidden somewhere in the .git cache? – Parad0x13 Dec 30 '12 at 22:59
  • That won't remove the orphaned commits from the repository. – AD7six Dec 30 '12 at 22:59
  • Not from the local repository, but any further clone/pull won't get them. Indeed, a `gc` is going to be needed to completely remove them from the local repository. – Sergiu Dumitriu Dec 30 '12 at 23:37
  • Not true; every clone will get them. To avoid that, they need to be removed from the remote, and steps must be taken so that no other clone puts the commits back either. – AD7six Dec 31 '12 at 09:03
  • Why? A fetch, by [definition](http://www.kernel.org/pub/software/scm/git/docs/git-fetch.html), "fetches named heads or tags from one or more other repositories, along with the objects necessary to complete them". So, it only fetches the commits (and their blobs) reachable from branches. If the large commit is no longer on a branch, it will not be copied. – Sergiu Dumitriu Dec 31 '12 at 21:29
  • You are right about fetching. **However** [a clone will still fetch orphaned commits](https://gist.github.com/32bde40a0df2ea87f43e). You can't _just_ update references and forget that the repo has MBs of junk in it - it's just ignoring a problem in the repository. – AD7six Jan 02 '13 at 13:03
  • True, a clone will fetch loose objects, but I wonder how you can run `git gc` on a remote server. If they have their own git server, that's doable, but GitHub doesn't allow that. Fortunately, GitHub automatically hides/prunes loose objects when discarding commits, AFAIK. – Sergiu Dumitriu Jan 02 '13 at 16:56
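Whether a clone picks up the orphaned commit seems to depend on how the clone is made. As a hedged sketch (paths and file names are invented for the demo, and this is not taken from the linked gist): cloning from a plain local path copies or hardlinks the whole object directory, while a `file://` URL goes through the regular Git transport, which only sends objects reachable from refs.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "source" > app.txt
git add app.txt
git commit -q -m "good commit"

echo "pretend this is 700 MB of desktop files" > junk.bin
git add junk.bin
git commit -q -m "accidental commit"
bad=$(git rev-parse HEAD)

# Move the branch back; the bad commit is now orphaned but not yet pruned
git reset -q --hard HEAD~1

# Clone 1: plain path, which uses the local shortcut (copies/hardlinks objects)
git clone -q "$work/src" "$work/clone-local"

# Clone 2: file:// URL, which uses the normal transport
git clone -q "file://$work/src" "$work/clone-transport"

if git -C "$work/clone-local" cat-file -e "$bad" 2>/dev/null; then
    local_copy="present"
else
    local_copy="absent"
fi

if git -C "$work/clone-transport" cat-file -e "$bad" 2>/dev/null; then
    transport_copy="present"
else
    transport_copy="absent"
fi
echo "local-path clone: $local_copy, file:// clone: $transport_copy"
```

If this behaves as expected, the orphaned commit should show up only in the local-path clone. That would mean clones over the network do not download the junk, although it still occupies space in the remote repository until it is garbage collected there.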