39

I know that the history in Git is stored in a data structure called a DAG. I've heard about DFS and know it's somewhat related.

I'm curious, how do programs such as git log --graph or hg graphlog draw the history? I always thought it's quite complicated to draw the lanes and everything in such a nice way.

Could someone write some pseudo code that demonstrates it?

note: I tried looking around Git or hg's code but it's very hard to follow and get a general idea of what's going on.

Emma
daniel.jackson
  • 5
    Here’s Git’s [graph.c](http://git.kernel.org/?p=git/git.git;a=blob;f=graph.c) for reference. – Josh Lee Feb 08 '11 at 14:10
  • 2
    Post a simplified (but well-specified) version of the "how to display a DAG as a textual graph" problem as an SO question and tag it as `code-golf`. You will get many clever solutions, in Python, Ruby, C, Perl... You might ask people to post their original non-golf-ified code as well as their "squeezing out every last character" version. – MatrixFrog Feb 13 '11 at 07:22
  • 1
    Also, Git’s [history graph API](http://www.kernel.org/pub/software/scm/git/docs/technical/api-history-graph.html) is useful. – Josh Lee Apr 25 '11 at 15:49
  • @Josh Lee answer provides api, usage and samples. With that you should understand how git log --graph operates. You can find api too in [api-history-graph.txt](https://github.com/git/git/blob/master/Documentation/technical/api-history-graph.txt). You need [asciidoc](http://www.methods.co.nz/asciidoc/) to get html from it. – albfan Apr 07 '13 at 01:04
  • With Git 2.18 (Q2 2018), a `git log --graph` now has a `commit-graph` file to use for speeding up the walk. See [my answer below](https://stackoverflow.com/a/50275501/6309) – VonC May 10 '18 at 14:41

4 Answers

7

First, one obtains a list of commits (as with git rev-list), and parents of each commit. A "column reservation list" is kept in memory.

For each commit then:

  • If the commit has no column reserved for it, assign it to a free column. This is how the branch heads will start.
  • Print the tree graphics according to the column reservation list, and then the commit message.
  • The reservation list entry for the current column is updated with the first parent of the current commit, so that the parent is printed in the same column.
  • Other parents get a new free column.
  • If this was a merge, the next line will try to link the second parent to a column where the commit is expected (this makes for the loops and the "≡ bridge").
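The steps above can be sketched in a few lines of Python. This is a simplified illustration and not the code of git-forest or Git: the function name `render_graph` and the input shape (a list of `(commit_id, parent_ids)` pairs, children before parents) are assumptions, and the sketch draws only one row per commit, without the diagonal connector rows or the merge bridges.

```python
def render_graph(commits):
    """Draw one text row per commit.

    commits: list of (commit_id, [parent_ids]), children listed before parents.
    """
    columns = []  # column reservation list: columns[i] is the commit expected in column i
    rows = []
    for commit_id, parents in commits:
        # Use the reserved column, or take a free one (this is how branch heads start).
        if commit_id in columns:
            col = columns.index(commit_id)
        elif None in columns:
            col = columns.index(None)
            columns[col] = commit_id
        else:
            columns.append(commit_id)
            col = len(columns) - 1

        # Print the graphics according to the reservation list, then the commit.
        cells = []
        for i, reserved in enumerate(columns):
            if i == col:
                cells.append("*")
            elif reserved is None:
                cells.append(" ")
            else:
                cells.append("|")
        rows.append(" ".join(cells) + " " + commit_id)

        # Other columns that were waiting for this same commit end here.
        for i, reserved in enumerate(columns):
            if reserved == commit_id and i != col:
                columns[i] = None

        # The first parent inherits the current column.
        columns[col] = parents[0] if parents else None
        # Other parents get a free column, unless one is already reserved for them.
        for parent in parents[1:]:
            if parent not in columns:
                if None in columns:
                    columns[columns.index(None)] = parent
                else:
                    columns.append(parent)

        # Drop unused columns on the right.
        while columns and columns[-1] is None:
            columns.pop()
    return rows


history = [("D", ["B", "C"]), ("C", ["A"]), ("B", ["A"]), ("A", [])]
print("\n".join(render_graph(history)))
```

For the small diamond-shaped history above, the merge `D` opens a second column for `C`, and both columns end at `A`.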

Example showing output of git-forest on aufs2-util (with an extra commit to have more than one branch).


With lookahead, one can anticipate how far down the merge point will be and squeeze the wood between two columns to give a more aesthetically pleasing result.

knpwrs
user611775
5

I tried looking around Git or hg's code but it's very hard to follow and get a general idea of what's going on.

For hg, did you try to follow the code in hg itself, or in graphlog?

Because the code of graphlog is pretty short. You can find it in hgext/graphlog.py, and really the important part is the top ~200 lines; the rest is the extension's bootstrapping and finding the revision graph selected. The function that generates the output is ascii; its last parameter is the result of a call to asciiedge (that call is made on the last line of generate, which receives the function from graphlog).

masklinn
4

This particular problem isn't that hard, compared to graph display in general. Because you want to keep the nodes in the order they were committed the problem gets much simpler.

Also note that the display model is grid based: rows are commits and columns are edges into the past/future.

I haven't read the Git source, but you probably just walk the list of commits, starting from the newest, and maintain a list of open edges into the past. Following the edges naturally leads to splitting/merging columns and you end up with the kind of tree git/hg display.
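Such a walk needs the commits in an order where every commit comes before its parents. Here is a minimal sketch of producing that order with Kahn's algorithm, preferring newer commits when there is a choice. The function name `children_first_order`, the `parents` dict and the `dates` dict are hypothetical inputs chosen for illustration; this is not how Git or hg implement their ordering.

```python
import heapq


def children_first_order(parents, dates):
    """Return commits so that every commit appears before its parents.

    parents: dict mapping each commit to the list of its parents
             (every parent must also be a key of the dict).
    dates:   dict mapping each commit to a numeric timestamp.
    """
    # Count, for every commit, how many of its children are not yet emitted.
    pending_children = {commit: 0 for commit in parents}
    for parent_list in parents.values():
        for parent in parent_list:
            pending_children[parent] += 1

    # Branch heads have no children; the heap pops the newest commit first.
    heap = [(-dates[c], c) for c, n in pending_children.items() if n == 0]
    heapq.heapify(heap)

    order = []
    while heap:
        _, commit = heapq.heappop(heap)
        order.append(commit)
        for parent in parents[commit]:
            pending_children[parent] -= 1
            # A parent becomes eligible once all of its children are emitted.
            if pending_children[parent] == 0:
                heapq.heappush(heap, (-dates[parent], parent))
    return order
```

The resulting list can then be drawn row by row while tracking the open edges, as described above.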

When merging edges you want to avoid crossing other edges, so you'll have to try to order your columns ahead of time. This is actually the only part that may not be straightforward. For example, one could do a two-pass algorithm, making up a column order for the edges in the first pass and doing the drawing in the second pass.

Zarat
  • 6
    The output of `git log --graph` frequently has edges crossing, and it's not in chronological order. I think it's a little less trivial than you're suggesting, even if it is a relatively simple case of graph display. – Cascabel Jan 19 '11 at 21:07
  • 1
    Well, by starting with the newest at top and following edges into the past, most of what I said still applies even without a strict ordering of commits. Having frequent edge crossings may be impossible to avoid depending on the commit graph, and they probably don't spend much on figuring out an ideal order. I didn't want to suggest it's trivial though, just straightforward to come up with a good solution. – Zarat Jan 20 '11 at 06:47
2

Note: Git 2.18 (Q2 2018) does now pre-compute and store information necessary for ancestry traversal in a separate file to optimize graph walking.

That notion of a commit graph changes how 'git log --graph' works.

As mentioned here:

git config --global core.commitGraph true
git config --global gc.writeCommitGraph true
cd /path/to/repo
git commit-graph write

See commit 7547b95, commit 3d5df01, commit 049d51a, commit 177722b, commit 4f2542b, commit 1b70dfd, commit 2a2e32b (10 Apr 2018), and commit f237c8b, commit 08fd81c, commit 4ce58ee, commit ae30d7b, commit b84f767, commit cfe8321, commit f2af9f5 (02 Apr 2018) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit b10edb2, 08 May 2018)

You now have the command git commit-graph: Write and verify Git commit graph files.

Write a commit graph file based on the commits found in packfiles.
Includes all commits from the existing commit graph file.

The design document states:

Git walks the commit graph for many reasons, including:

  1. Listing and filtering commit history.
  2. Computing merge bases.

These operations can become slow as the commit count grows. The merge base calculation shows up in many user-facing commands, such as 'merge-base' or 'status' and can take minutes to compute depending on history shape.

There are two main costs here:

  1. Decompressing and parsing commits.
  2. Walking the entire graph to satisfy topological order constraints.

The commit graph file is a supplemental data structure that accelerates commit graph walks. If a user downgrades or disables the 'core.commitGraph' config setting, then the existing ODB is sufficient.

The file is stored as "commit-graph" either in the .git/objects/info directory or in the info directory of an alternate.

The commit graph file stores the commit graph structure along with some extra metadata to speed up graph walks.
By listing commit OIDs in lexicographic order, we can identify an integer position for each commit and refer to the parents of a commit using those integer positions.
We use binary search to find initial commits and then use the integer positions for fast lookups during the walk.
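The idea of sorted OIDs plus integer parent positions can be illustrated with a small in-memory sketch. It does not read or write the real commit-graph file format; the class name `CommitGraphIndex` and its methods are hypothetical, and short strings stand in for real object IDs.

```python
from bisect import bisect_left


class CommitGraphIndex:
    """Toy model: sorted OIDs plus parent references stored as integer positions."""

    def __init__(self, parents_by_oid):
        # Listing OIDs in lexicographic order gives each commit an integer position.
        self.oids = sorted(parents_by_oid)
        position = {oid: i for i, oid in enumerate(self.oids)}
        # Parents are stored as positions, not as OIDs.
        self.parent_positions = [
            [position[p] for p in parents_by_oid[oid]] for oid in self.oids
        ]

    def find(self, oid):
        """Binary search for the position of an OID, or None if it is absent."""
        i = bisect_left(self.oids, oid)
        if i < len(self.oids) and self.oids[i] == oid:
            return i
        return None

    def ancestors(self, oid):
        """Walk the graph using integer positions only, after one initial lookup."""
        start = self.find(oid)
        if start is None:
            return []
        seen = {start}
        stack = [start]
        while stack:
            pos = stack.pop()
            for parent_pos in self.parent_positions[pos]:
                if parent_pos not in seen:
                    seen.add(parent_pos)
                    stack.append(parent_pos)
        seen.discard(start)
        return sorted(self.oids[i] for i in seen)
```

Only the starting commit needs a binary search; every later step of the walk is a list index.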

You can see the test use cases:

git log --oneline $BRANCH
git log --topo-order $BRANCH
git log --graph $COMPARE..$BRANCH
git branch -vv
git merge-base -a $BRANCH $COMPARE

This will improve git log performance.


Git 2.19 (Q3 2018) will take care of the lock file:

See commit 33286dc (10 May 2018), commit 1472978, commit 7adf526, commit 04bc8d1, commit d7c1ec3, commit f9b8908, commit 819807b, commit e2838d8, commit 3afc679, commit 3258c66 (01 May 2018), and commit 83073cc, commit 8fb572a (25 Apr 2018) by Derrick Stolee (derrickstolee).
Helped-by: Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit a856e7d, 25 Jun 2018)

commit-graph: fix UX issue when .lock file exists

We use the lockfile API to avoid multiple Git processes from writing to the commit-graph file in the .git/objects/info directory.
In some cases, this directory may not exist, so we check for its existence.

The existing code does the following when acquiring the lock:

  1. Try to acquire the lock.
  2. If it fails, try to create the .git/objects/info directory.
  3. Try to acquire the lock, failing if necessary.

The problem is that if the lockfile exists, then the mkdir fails, giving an error that doesn't help the user:

"fatal: cannot mkdir .git/objects/info: File exists"

While technically this honors the lockfile, it does not help the user.

Instead, do the following:

  1. Check for existence of .git/objects/info; create if necessary.
  2. Try to acquire the lock, failing if necessary.

The new output looks like:

fatal: Unable to create
'<dir>/.git/objects/info/commit-graph.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. 
Please make sure all processes are terminated then try again. 
If it still fails, a git process may have crashed in this repository earlier:
remove the file manually to continue.
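The corrected order of operations (create the directory first, then take the lock) can be sketched as follows. This is an illustration in Python, not Git's lockfile API: the function name `acquire_commit_graph_lock` and the error wording are assumptions.

```python
import os


def acquire_commit_graph_lock(objects_dir):
    """Create <objects_dir>/info if needed, then take an exclusive lock file."""
    info_dir = os.path.join(objects_dir, "info")
    # Step 1: make sure the directory exists; an existing directory is not an error.
    os.makedirs(info_dir, exist_ok=True)

    lock_path = os.path.join(info_dir, "commit-graph.lock")
    try:
        # Step 2: O_EXCL makes creation fail if another process holds the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # The failure now names the lock file instead of a confusing mkdir error.
        raise RuntimeError(f"Unable to create '{lock_path}': File exists.") from None
    return fd, lock_path
```

A caller would close the descriptor and remove the lock file once the write is finished.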

Note: The commit-graph facility did not work when in-core objects that are promoted from unknown type to commit (e.g. a commit that is accessed via a tag that refers to it) were involved, which has been corrected with Git 2.21 (Feb. 2019)

See commit 4468d44 (27 Jan 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit 2ed3de4, 05 Feb 2019)


That algorithm is being refactored in Git 2.23 (Q3 2019).

See commit 238def5, commit f998d54, commit 014e344, commit b2c8306, commit 4c9efe8, commit ef5b83f, commit c9905be, commit 10bd0be, commit 5af8039, commit e103f72 (12 Jun 2019), and commit c794405 (09 May 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit e116894, 09 Jul 2019)

Commit 10bd0be explains the change of scope.


With Git 2.24 (Q4 2019), the code to write commit-graph over given commit object names has been made a bit more robust.

See commit 7c5c9b9, commit 39d8831, commit 9916073 (05 Aug 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit 6ba06b5, 22 Aug 2019)


And, still with Git 2.24 (Q4 2019), the code to parse and use the commit-graph file has been made more robust against corrupted input.

See commit 806278d, commit 16749b8, commit 23424ea (05 Sep 2019) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 80693e3, 07 Oct 2019)

t/t5318: introduce failing 'git commit-graph write' tests

When invoking 'git commit-graph' in a corrupt repository, one can cause a segfault when ancestral commits are corrupt in one way or another.
This is due to two function calls in the 'commit-graph.c' code that may return NULL, but are not checked for NULL-ness before dereferencing.

Hence:

commit-graph.c: handle commit parsing errors

To write a commit graph chunk, 'write_graph_chunk_data()' takes a list of commits to write and parses each one before writing the necessary data, and continuing on to the next commit in the list.

Since the majority of these commits are not parsed ahead of time (an exception is made for the last commit in the list, which is parsed early within 'copy_oids_to_commits'), it is possible that calling 'parse_commit_no_graph()' on them may return an error.
Failing to catch these errors before de-referencing later calls can result in an undefined memory access and a SIGSEGV. One such example of this is 'get_commit_tree_oid()', which expects a parsed object as its input (in this case, the commit-graph code passes '*list').
If '*list' causes a parse error, the subsequent call will fail.

Prevent such an issue by checking the return value of 'parse_commit_no_graph()' to avoid passing an unparsed object to a function which expects a parsed object, thus preventing a segfault.


With Git 2.26 (Q1 2020), the code to compute the commit-graph has been taught to use a more robust way to tell if two object directories refer to the same thing.

See commit a7df60c, commit ad2dd5b, commit 13c2499 (03 Feb 2020), commit 0bd52e2 (04 Feb 2020), and commit 1793280 (30 Jan 2020) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 53c3be2, 14 Feb 2020)

commit-graph.h: store an odb in 'struct write_commit_graph_context'

Signed-off-by: Taylor Blau

There are lots of places in commit-graph.h where a function either has (or almost has) a full struct object_directory *, accesses ->path, and then throws away the rest of the struct.

This can cause headaches when comparing the locations of object directories across alternates (e.g., in the case of deciding if two commit-graph layers can be merged).
These paths are normalized with normalize_path_copy() which mitigates some comparison issues, but not all.

Replace usage of char *object_dir with odb->path by storing a struct object_directory* in the write_commit_graph_context structure.
This is an intermediate step towards getting rid of all path normalization in 'commit-graph.c'.

Resolving a user-provided '--object-dir' argument now requires that we compare it to the known alternates for equality.

Prior to this patch, an unknown '--object-dir' argument would silently exit with status zero.

This can clearly lead to unintended behavior, such as verifying commit-graphs that aren't in a repository's own object store (or one of its alternates), or causing a typo to mask a legitimate commit-graph verification failure.
Make this error non-silent by 'die()'-ing when the given '--object-dir' does not match any known alternate object store.
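The behavior described here (match the requested directory against the known object stores, and fail loudly otherwise) could look roughly like this. It is a sketch under assumptions: the function name `find_object_dir` is hypothetical, comparing with `os.path.realpath` is a stand-in for whatever comparison Git performs internally, and the error wording is illustrative.

```python
import os


def find_object_dir(requested, known_object_dirs):
    """Return the known object directory matching `requested`, or exit with an error."""
    wanted = os.path.realpath(requested)
    for object_dir in known_object_dirs:
        # Compare resolved paths so that "a/../a/objects" matches "a/objects".
        if os.path.realpath(object_dir) == wanted:
            return object_dir
    # An unknown directory is an error, not a silent success.
    raise SystemExit(f"no known object directory matches {requested}")
```

With this shape, a typo in the argument stops the command instead of hiding a verification failure.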


With Git 2.28 (Q3 2020), the commit-graph write --stdin-commits is optimized.

See commit 2f00c35, commit 1f1304d, commit 0ec2d0f, commit 5b6653e, commit 630cd51, commit d335ce8 (13 May 2020), commit fa8953c (18 May 2020), and commit 1fe1084 (05 May 2020) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit dc57a9b, 09 Jun 2020)

commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag

Helped-by: Jeff King
Signed-off-by: Taylor Blau

Since 7c5c9b9c57 ("commit-graph: error out on invalid commit oids in 'write --stdin-commits'", 2019-08-05, Git v2.24.0-rc0 -- merge listed in batch #1), the commit-graph builtin dies on receiving non-commit OIDs as input to '--stdin-commits'.

This behavior can be cumbersome to work around in, say, the case of piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if the caller does not want to cull out non-commits themselves. In this situation, it would be ideal if 'git commit-graph write' wrote the graph containing the inputs that did pertain to commits, and silently ignored the remainder of the input.

Some options have been proposed to the effect of '--[no-]check-oids' which would allow callers to have the commit-graph builtin do just that.
After some discussion, it is difficult to imagine a caller who wouldn't want to pass '--no-check-oids', suggesting that we should get rid of the behavior of complaining about non-commit inputs altogether.

If callers do wish to retain this behavior, they can easily work around this change by doing the following:

git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
awk '
  !/commit/ { print "not-a-commit:"$1 }
   /commit/ { print $1 }
' |
git commit-graph write --stdin-commits

To make it so that valid OIDs that refer to non-existent objects are indeed an error after loosening the error handling, perform an extra lookup to make sure that object indeed exists before sending it to the commit-graph internals.

This is tested with Git 2.28 (Q3 2020).

See commit 94fbd91 (01 Jun 2020), and commit 6334c5f (03 Jun 2020) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit abacefe, 18 Jun 2020)

t5318: test that '--stdin-commits' respects '--[no-]progress'

Signed-off-by: Taylor Blau
Acked-by: Derrick Stolee

The following lines were not covered in a recent line-coverage test against Git:

builtin/commit-graph.c
5b6653e5 244) progress = start_delayed_progress(
5b6653e5 268) stop_progress(&progress);

These statements are executed when both '--stdin-commits' and '--progress' are passed. Introduce a trio of tests that exercise various combinations of these options to ensure that these lines are covered.

More importantly, this is exercising a (somewhat) previously-ignored feature of '--stdin-commits', which is that it respects '--progress'.

Prior to 5b6653e523 ("[builtin/commit-graph.c](https://github.com/git/git/blob/94fbd9149a2d59b0dca18448ef9d3e0607a7a19d/builtin/commit-graph.c): dereference tags in builtin", 2020-05-13, Git v2.28.0 -- merge listed in batch #2), dereferencing input from '--stdin-commits' was done inside of commit-graph.c.

Now that an additional progress meter may be generated from outside of commit-graph.c, add a corresponding test to make sure that it also respects '--[no-]progress'.

The other location that generates progress meter output (from d335ce8f24 ("[commit-graph.c](https://github.com/git/git/blob/94fbd9149a2d59b0dca18448ef9d3e0607a7a19d/commit-graph.c): show progress of finding reachable commits", 2020-05-13, Git v2.28.0 -- merge listed in batch #2)) is already covered by any test that passes '--reachable'.


With Git 2.29 (Q4 2020), in_merge_bases_many(), a way to see if a commit is reachable from any commit in a set of commits, was totally broken when the commit-graph feature was in use, which has been corrected.

See commit 8791bf1 (02 Oct 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit c01b041, 05 Oct 2020)

commit-reach: fix in_merge_bases_many bug

Reported-by: Srinidhi Kaushik
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

Way back in f9b8908b ("[commit.c](https://github.com/git/git/blob/8791bf18414a37205127e184c04cad53a43aeff1/commit.c): use generation numbers for in_merge_bases()", 2018-05-01, Git v2.19.0-rc0 -- merge listed in batch #1), a heuristic was used to short-circuit the in_merge_bases() walk.
This works just fine as long as the caller is checking only two commits, but when there are multiple, there is a possibility that this heuristic is very wrong.

Some code moves since then have changed this method to repo_in_merge_bases_many() inside commit-reach.c. The heuristic computes the minimum generation number of the "reference" list, then compares this number to the generation number of the "commit".

In a recent topic, a test was added that used in_merge_bases_many() to test if a commit was reachable from a number of commits pulled from a reflog. However, this highlighted the problem: if any of the reference commits have a smaller generation number than the given commit, then the walk is skipped _even if there exist some with higher generation number_.

This heuristic is wrong! It must check the MAXIMUM generation number of the reference commits, not the MINIMUM.

The fix itself is to swap min_generation with a max_generation in repo_in_merge_bases_many().
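To see why the minimum is the wrong cutoff, here is a small Python sketch of the heuristic. It is not Git's C code: the function names, the `parents` dict and the `cutoff` parameter are hypothetical, and a generation number is taken to be one more than the largest generation number among the parents (root commits get 1).

```python
def generation_numbers(parents):
    """Generation number = 1 + max generation of the parents (roots get 1)."""
    gen = {}

    def visit(commit):
        # Recursive for brevity; fine for small examples.
        if commit not in gen:
            gen[commit] = 1 + max((visit(p) for p in parents[commit]), default=0)
        return gen[commit]

    for commit in parents:
        visit(commit)
    return gen


def reachable_from_any(parents, commit, references, cutoff=max):
    """Is `commit` reachable from at least one of `references`?

    cutoff=max is the corrected heuristic; cutoff=min reproduces the bug.
    """
    gen = generation_numbers(parents)
    # An ancestor never has a larger generation number than its descendant,
    # so the walk can be skipped only if `commit` is above ALL references.
    if gen[commit] > cutoff(gen[r] for r in references):
        return False
    stack = list(references)
    seen = set()
    while stack:
        current = stack.pop()
        if current == commit:
            return True
        # Commits with a smaller generation number cannot lead back to `commit`.
        if current in seen or gen[current] < gen[commit]:
            continue
        seen.add(current)
        stack.extend(parents[current])
    return False


history = {"a": [], "b": ["a"], "c": ["b"], "side": []}
print(reachable_from_any(history, "b", ["side", "c"]))              # corrected cutoff
print(reachable_from_any(history, "b", ["side", "c"], cutoff=min))  # buggy cutoff
```

In this example `b` is an ancestor of `c`, but the unrelated root `side` has a lower generation number than `b`, so the `min` variant skips the walk and wrongly reports that `b` is unreachable.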


Before Git 2.32, hopefully (Q1 2021): when certain features (e.g. grafts) used in the repository are incompatible with the use of the commit-graph, we used to silently turn commit-graph off; we now tell the user what we are doing.

See commit c85eec7 (11 Feb 2021) by Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit 726b11d, 17 Feb 2021)

That will show what was intended for Git 2.31, but it has been reverted, as it is a bit overzealous in its current form.

commit-graph: when incompatible with graphs, indicate why

Signed-off-by: Johannes Schindelin
Acked-by: Derrick Stolee

When gc.writeCommitGraph = true, it is possible that the commit-graph is still not written: replace objects, grafts and shallow repositories are incompatible with the commit-graph feature.

Under such circumstances, we need to indicate to the user why the commit-graph was not written instead of staying silent about it.

The warnings will be:

repository contains replace objects; skipping commit-graph
repository contains (deprecated) grafts; skipping commit-graph
repository is shallow; skipping commit-graph
VonC
  • See also https://github.com/git/git/commit/091f4cf3586957c3fd99d4c4c59c569d009137ad from https://github.com/git/git/commit/ca676b9bd354e846ac207e7879760719826517ce – VonC Sep 05 '18 at 11:44