git: why can I checkout to the commit I deleted?

Question

I have a branch with three commits:

mybranch: a -> b -> c

I pushed it to the remote repo. Then I decided that I didn't want to keep commits b and c, so I deleted them as described here:

git reset --hard HEAD~1
git reset --hard HEAD~1
git push origin mybranch -f

Afterwards, I checked git log and confirmed that only the commit a was visible. However:

I can still check out commits b and c. Why is that?
When I used SourceTree to look at my git repo, I could still see these commits in my branch (although commit a was correctly marked as the commit I was currently on). I used git bash to check that I was on the correct commit and that my HEAD was not in a detached state.

What is wrong with the procedure I used? Why did it keep the commits b and c? How can I remove them completely?

See http://think-like-a-git.net/sections/graph-theory.html – ElpieKay Aug 29 '17 at 13:27

score 2 · Accepted Answer · answered Aug 29 '17 at 12:53

Actually deleting a commit is fairly difficult in git, by design. Many commands that people think delete commits (like rebase or reset) actually just make those commits "unreachable", which causes the default output of various commands and tools to exclude them.

It's relatively rare that the reason to delete a commit warrants the cost. Sometimes a commit contains sensitive information (though, in that case, it's almost always best to consider the information compromised, whether or not you take efforts to scrub it from the repo). Maybe a commit contains excessively large binary files that are not present in any other commit, bloating the repo. If it just boils down to wanting to "hide" a "mistake" so the repo looks perfect, I wouldn't waste time on it.

But if you do want to remove the commit, here's what you need to know:

First, you have to remove all knowledge of the commit. Your reset commands have made it "unreachable" (by parent pointers) from the branch on which you did the resets. If there are other branches that can reach the commits, they need to be reset or rebased away from the commit (or deleted). If there are tags on the removed commits, they need to be moved or deleted. There are special cases when other refs could point to the commits, but I'll assume they don't apply. (It would be things like replacements, or backup refs from filter-branch... Basically if you can find the SHA for either commit in the .git/packed-refs file or in any file under refs, then some action is needed to remedy that.)
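One way to check for leftover refs without grepping through .git by hand is git for-each-ref --contains. The following is only a sketch: it builds a throwaway repo in a temp directory, and the tag name keep-c is made up for the demonstration.

```shell
# Sketch: list every ref (branch, tag, remote-tracking ref, ...) that can
# still reach a commit. Runs in a throwaway repo; "keep-c" is a hypothetical tag.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false
for msg in a b c; do
    git commit -q --allow-empty -m "$msg"
done
c_sha=$(git rev-parse HEAD)
git tag keep-c                 # a tag that still points at c
git reset -q --hard HEAD~2     # the branch now points at a

# Any ref printed here keeps c (and therefore b) alive.
refs_holding_c=$(git for-each-ref --contains "$c_sha" --format='%(refname)')
echo "refs that still reach c: [$refs_holding_c]"

# After deleting the tag, no ref reaches c any more.
git tag -d keep-c >/dev/null
refs_after=$(git for-each-ref --contains "$c_sha" --format='%(refname)')
echo "refs after deleting the tag: [$refs_after]"
```

In a real repo you would skip the setup and run only the for-each-ref line with the SHA of the commit you want gone. An empty result means only reflogs (if anything) still hold on to it.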

Once all refs are removed, the commit is "dangling"; but it still may be reachable via the reflog. You can try to expire the reflogs:

git reflog expire --expire=all --all

I've never had much luck with that (which probably just means I never remember the right arguments); I always end up doing something like

rm -r .git/logs

The downside in any event is that you lose all of your reflog information. You can be more selective about which reflogs you expire. (You probably need HEAD and any branch from which the commits are (or were) reachable.) You could even use delete instead of expire to hunt down individual reflog entries. Again it all depends how much effort you're wanting to put into this.
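As a rough sketch of the selective approach, you can pass specific refs to git reflog expire instead of --all. The example below runs in a throwaway repo, and the branch name other is a hypothetical stand-in for a branch whose reflog you want to keep.

```shell
# Sketch: expire only the reflogs that could still mention b and c
# (HEAD and the branch that was reset), leaving other reflogs alone.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false
for msg in a b c; do
    git commit -q --allow-empty -m "$msg"
done
branch=$(git symbolic-ref --short HEAD)
git reset -q --hard HEAD~2
git branch other               # hypothetical unrelated branch with its own reflog

# Only these two reflogs are emptied.
git reflog expire --expire=now HEAD "refs/heads/$branch"

head_entries=$(git reflog show HEAD | wc -l)
other_entries=$(git reflog show other | wc -l)
echo "HEAD reflog entries:  $head_entries"
echo "other reflog entries: $other_entries"
```

To go after single entries instead, git reflog delete takes a ref@{n} specifier such as HEAD@{2}. Check the entry numbers with git reflog first, since they shift after each deletion.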

So once there are no refs and no reflogs that can reach the commit, gc can be used to physically delete the commit from the local repo.

git gc --aggressive --prune=now
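To confirm that the local cleanup actually worked, you can ask git whether the object still exists with git cat-file -e. This is a sketch of the whole local sequence in a throwaway repo, assuming no other branch or tag points at the commits.

```shell
# Sketch: reset, expire reflogs, gc, then check that commit c is really gone.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false
for msg in a b c; do
    git commit -q --allow-empty -m "$msg"
done
c_sha=$(git rev-parse HEAD)
git reset -q --hard HEAD~2

# Right after the reset the object is still in the object database.
if git cat-file -e "$c_sha" 2>/dev/null; then before=present; else before=gone; fi

git reflog expire --expire=now --all
git gc -q --aggressive --prune=now

# cat-file -e exits non-zero once the object has been pruned.
if git cat-file -e "$c_sha" 2>/dev/null; then after=present; else after=gone; fi
echo "before gc: $before, after gc: $after"
```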

But now there's still an issue: If the commits were ever pushed, the remote still has them; and pushing now won't delete them from the remote. (Pushing updates remote refs and, as needed, adds objects to fill in history; but it doesn't delete objects from the remote.)

If the remote is just a repo on a file share (or web server you control, or whatever): you can log into the server and clean it up the same way you cleaned up your local. (If you've pushed refs, then that part's already done; but you may have to clean up reflogs and you will have to run gc.)

If the remote is hosted (GitHub, GitLab, TFS, Bitbucket...) then it depends on what access to gc is provided by the host. In TFS (at least versions I've used) you're up a tree; at best you could delete and recreate the repo. Other host servers may provide the ability to trigger gc, or may even run gc automatically after certain events; you'd have to consult the docs for the hosting service/software.

kowsky · Answer 2 · 2017-08-29T12:45:32.097

git reset does not delete commits; it resets your branch to the given commit (with HEAD~1, the direct predecessor of your branch's current HEAD commit). The successor commit is then no longer part of your branch. If no other branch has the commit in its history, the commit becomes a 'dangling' commit, not reachable by any branch (Edit: actually it becomes 'unreachable' at first, and 'dangling' only later when it's not even reachable by reflog; see comments to linked answer below). If it stays like that for a longer time, git's garbage collection will remove it eventually. Until it does, the commit can still be accessed by its SHA id.

This is, in fact, very handy if you mess up your branches' histories. With reflog or other means, you can obtain lost commits' SHA ids and restore your work if it hasn't been lost for too long.
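A minimal sketch of such a recovery, run in a throwaway repo. The branch name recovered is just an example, and HEAD@{1} is the right entry here only because a single reset was the last thing that moved HEAD.

```shell
# Sketch: get "lost" commits back through the reflog.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false
for msg in a b c; do
    git commit -q --allow-empty -m "$msg"
done
c_sha=$(git rev-parse HEAD)
git reset -q --hard HEAD~2     # b and c are now off the branch

# The reflog still records where HEAD used to be.
git reflog -n 3

# HEAD@{1} is where HEAD pointed before the reset, i.e. commit c.
git branch recovered "HEAD@{1}"
recovered_sha=$(git rev-parse recovered)
recovered_log=$(git log --format=%s recovered | tr '\n' ' ')
echo "recovered branch history (newest first): $recovered_log"
```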

SourceTree still shows the connection from b to a because every commit knows its predecessor. b and c, however, are no longer part of your branch, since its HEAD commit is a.
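You can see the same thing from the command line. In this sketch (throwaway repo again), the parent links of b and c are still intact after the reset, while the branch's own log shows only a.

```shell
# Sketch: after a reset, the old commits still know their parents,
# but the branch history no longer includes them.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false
for msg in a b c; do
    git commit -q --allow-empty -m "$msg"
done
c_sha=$(git rev-parse HEAD)
b_sha=$(git rev-parse HEAD~1)
a_sha=$(git rev-parse HEAD~2)
git reset -q --hard HEAD~2

# The commits are still addressable by SHA, and each one points at its parent.
parent_of_c=$(git rev-parse "$c_sha^")
parent_of_b=$(git rev-parse "$b_sha^")

# The branch itself only contains a.
branch_log=$(git log --format=%s)
echo "branch history: $branch_log"
```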

There's nothing wrong with what you did, and there is no need to try any further to delete the commits. If you keep working in the repository, they will be deleted eventually. See this answer for details on the deletion via garbage collection.
