
My git server crashed and now my repos are in an inconsistent state. Running git fsck --full shows the following output:

    error: refs/heads/data-8989 does not point to a valid object!
    dangling commit b8cfe9e3e58c64411795cf9676ff228b12607e95
    dangling commit a817ef9d4a423b6efee62b9af16979e6433943b1
    dangling commit 4f9d59b0dcfa34dd9592474fe487f568a20b07ea
    dangling commit 22af4033b6224d2b075db7138801fd7b8244eb37
    missing commit d2b142ca7e165429a47b6e303fad349f3ae51cc7

Is there any way to recover this?

station

2 Answers


From git-fsck documentation:

git-fsck tests SHA-1 and general object sanity, and it does full tracking of the resulting reachability and everything else. It prints out any corruption it finds (missing or bad objects), and if you use the --unreachable flag it will also print out objects that exist but that aren’t reachable from any of the specified head nodes (or the default set, as mentioned above).

Any corrupt objects you will have to find in backups or other archives (i.e., you can just remove them and do an rsync with some other site in the hopes that somebody else has the object you have corrupted).

You need a backup or another clone of the repository.


"Dangling" commits are normal even in a healthy repository: they are the tip of a set of unreachable commits, and generally arise from things like git rebase or git commit --amend (which deliberately abandon one or more "old" commits in favor of the "new and improved" copy or copies). However, in a faulty repository, some of these dangling commits—and other commits reachable "behind" them—might be ones you can and would want to recover.

The missing commit is a more serious problem. Given that only one such is shown, and exactly one reference, the branch data-8989, is invalid, it's likely that the missing commit is the one that was the tip commit of data-8989. In this case, one (but only one) of the dangling commits may be some commit further back in that chain.
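The guess in the paragraph above can be written as a small Python sketch. The guess is that one broken ref plus one missing commit means the missing commit was probably that ref's tip. The sketch assumes the git fsck lines look like the ones in the question. Real fsck output can contain other kinds of lines, and this ignores them. The function names are made up for illustration.

```python
import re

# Assumption: line formats copied from the question's git fsck output.
BROKEN_REF = re.compile(r"error: (\S+) does not point to a valid object!")
OBJECT_LINE = re.compile(r"\b(dangling|missing) (commit|tree|blob|tag) ([0-9a-f]+)")


def classify_fsck_output(text):
    """Group fsck findings into broken refs, missing objects and dangling objects."""
    report = {"broken_refs": [], "missing": [], "dangling": []}
    for line in text.splitlines():
        # Check both patterns on every line, so flattened output still parses.
        report["broken_refs"].extend(BROKEN_REF.findall(line))
        for kind, obj_type, object_id in OBJECT_LINE.findall(line):
            report[kind].append((obj_type, object_id))
    return report


def guess_lost_tips(report):
    """Heuristic: exactly one broken ref and one missing commit probably belong together."""
    missing_commits = [oid for obj_type, oid in report["missing"] if obj_type == "commit"]
    if len(report["broken_refs"]) == 1 and len(missing_commits) == 1:
        return {report["broken_refs"][0]: missing_commits[0]}
    return {}  # ambiguous: inspect by hand


if __name__ == "__main__":
    sample = """\
error: refs/heads/data-8989 does not point to a valid object!
dangling commit b8cfe9e3e58c64411795cf9676ff228b12607e95
dangling commit a817ef9d4a423b6efee62b9af16979e6433943b1
dangling commit 4f9d59b0dcfa34dd9592474fe487f568a20b07ea
dangling commit 22af4033b6224d2b075db7138801fd7b8244eb37
missing commit d2b142ca7e165429a47b6e303fad349f3ae51cc7"""
    print(guess_lost_tips(classify_fsck_output(sample)))
```

Given the question's output, this pairs refs/heads/data-8989 with the missing commit d2b142ca7e165429a47b6e303fad349f3ae51cc7. That is only a hint. Any of the four dangling commits might still lie further back on that branch, or might be unrelated.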

Expressed visually, a normal graph might look something like this:

             o--o--o         <-- feature
            /       \
...--o--o--o------o--*--o    <-- branch1
         \
          o--o-----------o   <-- branch2
              \
               o----o--o     <-- branch3

where each o represents a commit. Each commit "points back" to its immediate parent commit, or, in the case of the merge commit (marked *), points back to two parents, namely the two commits that were merged. The names, feature and branch1 and so on, are how Git finds the tip of the branch. Once Git has this tip commit, Git uses the internal backward-pointing arrows to find their parent commit(s), and then uses the parents to find the grandparent commits, and so on.
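Here is a toy Python model of that diagram, to make the backward-only arrows concrete. The labels are hypothetical:

- A, B, C, D, the merge M (marked * above) and E run along branch1. A stands in for the "..." history.
- F1 to F3 are feature.
- G1 to G3 are branch2.
- H1 to H3 are branch3.

It is a sketch of the idea, not how Git actually stores objects.

```python
# Toy model of the diagram (hypothetical labels, not real hash IDs).
# Each commit lists only its parents: Git's arrows point backwards.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],   # main line
    "F1": ["C"], "F2": ["F1"], "F3": ["F2"],       # feature
    "M": ["D", "F3"], "E": ["M"],                  # merge (*) and branch1's tip
    "G1": ["B"], "G2": ["G1"], "G3": ["G2"],       # branch2
    "H1": ["G2"], "H2": ["H1"], "H3": ["H2"],      # branch3
}
REFS = {"feature": "F3", "branch1": "E", "branch2": "G3", "branch3": "H3"}


def reachable_from(tip, parents):
    """Collect every commit found by starting at tip and following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


if __name__ == "__main__":
    # From branch1 we walk back through the merge M into feature's commits.
    print(sorted(reachable_from(REFS["branch1"], PARENTS)))
    # From feature we never reach M or E: there is no forward arrow to follow.
    print(sorted(reachable_from(REFS["feature"], PARENTS)))
```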

Running git fsck checks, among other things, which commits in the repository can be found (reached) this way. It's normal for some not to be. For instance, we might decide that branch3 is terrible and simply erase the label, i.e., delete the branch. The three commits that are only reachable from branch3 are now abandoned. The tip is reported as "dangling". Git doesn't bother mentioning the other two, since they can still be found by starting from that dangling tip.
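The next sketch uses a toy model shaped like the diagram, with hypothetical single-letter commits. It deletes the name branch3 and then does what git fsck does in spirit:

1. Walk back from every remaining name.
2. Call everything left over unreachable.
3. Report as dangling only the unreachable commits that no other unreachable commit points back to.

```python
# Toy graph shaped like the diagram (hypothetical labels); values are parents.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "F1": ["C"], "F2": ["F1"], "F3": ["F2"],
    "M": ["D", "F3"], "E": ["M"],
    "G1": ["B"], "G2": ["G1"], "G3": ["G2"],
    "H1": ["G2"], "H2": ["H1"], "H3": ["H2"],
}


def find_dangling(parents, refs):
    """Return (unreachable, dangling) commits, in the spirit of git fsck."""
    reachable, stack = set(), list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in reachable:
            reachable.add(commit)
            stack.extend(parents[commit])
    unreachable = set(parents) - reachable
    # Dangling: unreachable, and no other unreachable commit points back to it.
    pointed_to = {p for c in unreachable for p in parents[c]}
    return sorted(unreachable), sorted(unreachable - pointed_to)


if __name__ == "__main__":
    # Delete the label branch3, keeping the other three names.
    refs = {"feature": "F3", "branch1": "E", "branch2": "G3"}
    unreachable, dangling = find_dangling(PARENTS, refs)
    print("unreachable:", unreachable)  # H1, H2, H3
    print("dangling:", dangling)        # only the old tip, H3
```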

Note that a branch tip commit need not be the end of the chain. It's just treated as if it were the last one, whenever you start at that tip. This is the case for feature, for instance. Git does not (and cannot, at least not easily) look forward along a chain, because all of Git's internal arrows connecting commits go backwards only, from child to parent. (This is a major part of why git fsck is relatively slow: it has to do a whole lot of work to, in effect, reverse the internal arrows.) It's easy to start from the tip of branch1, work backwards, find the merge, work back-and-up, and find that you have the tip commit of feature. It's hard to go the other way. Git therefore normally works backwards.
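This sketch shows why reversing the arrows costs so much. It uses the same hypothetical toy graph and builds a list of children for each commit. The only way to learn a commit's children is to look at every commit's parents, so the function must scan the whole repository. That is roughly the kind of work the paragraph above attributes to git fsck.

```python
# Toy graph shaped like the diagram (hypothetical labels); values are parents.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "F1": ["C"], "F2": ["F1"], "F3": ["F2"],
    "M": ["D", "F3"], "E": ["M"],
    "G1": ["B"], "G2": ["G1"], "G3": ["G2"],
    "H1": ["G2"], "H2": ["H1"], "H3": ["H2"],
}


def children_map(parents):
    """Reverse every parent link. This needs a pass over every commit."""
    children = {commit: [] for commit in parents}
    for commit, commit_parents in parents.items():
        for parent in commit_parents:
            children.setdefault(parent, []).append(commit)
    return children


if __name__ == "__main__":
    children = children_map(PARENTS)
    # feature's tip F3 has a child: only the full scan reveals that it was merged.
    print(children["F3"])
    # G2 is where branch2 and branch3 split.
    print(children["G2"])
```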

Now, when a repository gets damaged, we may lose the labels themselves, or we might lose some commits or other internal objects (there are four object types total but we'll concentrate on just the commits here). The ones most likely to be damaged are the most recently created (for objects) or updated (for labels). This is because, once created, no object is ever changed.1 Computer crashes tend to lose or corrupt recently-touched files, rather than older less-active files.

Consider what happens if we lose, not the name branch2, but its tip commit:

             o--o--o         <-- feature
            /       \
...--o--o--o------o--*--o    <-- branch1
         \
          o--o           ?   <-- branch2
              \
               o----o--o     <-- branch3

In this case we'll get a complaint that there is a missing commit, because the name branch2 says "find commit 1234567" or whatever, and it's not there. It's the one we lost.

We won't get any dangling commits, though, because the parent commit of branch2 was also on branch3. Now that commit is only on branch3, rather than being on both branches, because the outgoing backwards arrow is part of the commit that we lost.
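The same kind of toy model can show the missing-tip case. In this sketch (hypothetical names again), G3, the tip of branch2, is deleted from the table of objects. The made-up mini_fsck function then reports branch2 as pointing at a missing commit. Because G2 is still reachable through branch3, nothing is dangling.

```python
# Toy graph shaped like the diagram (hypothetical labels); values are parents.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "F1": ["C"], "F2": ["F1"], "F3": ["F2"],
    "M": ["D", "F3"], "E": ["M"],
    "G1": ["B"], "G2": ["G1"], "G3": ["G2"],
    "H1": ["G2"], "H2": ["H1"], "H3": ["H2"],
}
REFS = {"feature": "F3", "branch1": "E", "branch2": "G3", "branch3": "H3"}


def mini_fsck(parents, refs):
    """Report refs whose tip is missing, plus dangling commits, for a toy graph."""
    broken = {name: tip for name, tip in refs.items() if tip not in parents}
    reachable = set()
    stack = [tip for tip in refs.values() if tip in parents]
    while stack:
        commit = stack.pop()
        if commit not in reachable:
            reachable.add(commit)
            # Skip parents that are themselves missing.
            stack.extend(p for p in parents[commit] if p in parents)
    unreachable = set(parents) - reachable
    pointed_to = {p for c in unreachable for p in parents[c]}
    return broken, sorted(unreachable - pointed_to)


if __name__ == "__main__":
    damaged = dict(PARENTS)
    del damaged["G3"]  # the crash lost branch2's tip commit
    print(mini_fsck(damaged, REFS))  # ({'branch2': 'G3'}, [])
```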

If we lose the tip of branch1 we do get a dangling commit:

             o--o--o         <-- feature
            /       \
...--o--o--o------o--*   ?    <-- branch1
         \
          o--o-----------o   <-- branch2
              \
               o----o--o     <-- branch3

The merge commit * no longer has any way to be found, so it's "dangling". One of its parents can be found under the name feature, and the other can be found if we restore the merge commit itself, but if we don't, and we let Git garbage-collect the unreachable commits, the graph strips down to this:

             o--o--o         <-- feature
            /
...--o--o--o            ?    <-- branch1
         \
          o--o-----------o   <-- branch2
              \
               o----o--o     <-- branch3
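This sketch shows the branch1 case with garbage collection, using the same hypothetical toy graph. E is lost, so the merge M and the commit D behind it become unreachable. Only M counts as dangling, because M still points back to D. The gc function here simply drops every unreachable commit. Real git gc is more cautious: it also honors reflogs and only prunes unreachable objects older than a grace period, which this toy ignores.

```python
# Toy graph shaped like the diagram (hypothetical labels); values are parents.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "F1": ["C"], "F2": ["F1"], "F3": ["F2"],
    "M": ["D", "F3"], "E": ["M"],
    "G1": ["B"], "G2": ["G1"], "G3": ["G2"],
    "H1": ["G2"], "H2": ["H1"], "H3": ["H2"],
}
REFS = {"feature": "F3", "branch1": "E", "branch2": "G3", "branch3": "H3"}


def reachable_commits(parents, refs):
    """Walk back from every ref whose tip still exists."""
    seen = set()
    stack = [tip for tip in refs.values() if tip in parents]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(p for p in parents[commit] if p in parents)
    return seen


def gc(parents, refs):
    """Drop every commit that no ref can reach (a simplified garbage collection)."""
    keep = reachable_commits(parents, refs)
    return {c: ps for c, ps in parents.items() if c in keep}


if __name__ == "__main__":
    damaged = dict(PARENTS)
    del damaged["E"]  # the crash lost branch1's tip commit
    print(sorted(set(damaged) - reachable_commits(damaged, REFS)))  # ['D', 'M']
    # After gc, D and M are gone; A, B and C survive via feature and branch2.
    print(sorted(gc(damaged, REFS)))
```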

Thus, given a damaged repository, if you don't have backups or other clones from which to recover "missing" items, it's wise to stop modifying it (don't add anything to it) and to run git fsck --lost-found to make Git save any "dangling" commits into .git/lost-found/commit/. You can then look at these (using git log by hash ID) to see if they are valuable. Git will also save unreachable blobs (files) into .git/lost-found/other/; you can look at the files' contents directly there with any file-viewer, and recover some lost files that way.
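Here is a small Python helper for that last step. It assumes you have already run git fsck --lost-found inside the damaged repository. It also assumes that each file in .git/lost-found/commit/ is named after a dangling commit's hash ID. The helper names are made up. The script only reads that directory and runs git log on each ID, so you can decide which ones are worth keeping.

```python
import subprocess
from pathlib import Path


def lost_found_commits(git_dir=".git"):
    """List the hash IDs that git fsck --lost-found saved as dangling commits."""
    commit_dir = Path(git_dir) / "lost-found" / "commit"
    if not commit_dir.is_dir():
        return []
    return sorted(entry.name for entry in commit_dir.iterdir() if entry.is_file())


def log_command(commit_id, limit=10):
    """Build a git log command listing up to `limit` commits, starting at commit_id."""
    return ["git", "log", "--oneline", f"--max-count={limit}", commit_id]


if __name__ == "__main__":
    # Run from the top of the damaged repository, after: git fsck --lost-found
    for commit_id in lost_found_commits():
        print(f"=== {commit_id}")
        subprocess.run(log_command(commit_id), check=False)
```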

Your best bet, though, is to have another clone (or proper backups).


1 But objects can get packed or repacked, which touches the way they are stored, so this is not a hard-and-fast rule.

torek