
Let's say I have a CSS file in my git repo with the following ruleset:

.myClass {
    color: red;
    background: #0f0;
}

Now I've added another ruleset above it, like:

.myOtherClass {
    color: red;
    background: #000;
    font: "Comic Sans", Wingdings, sans-serif;
}

.myClass {
    color: red;
    background: #0f0;
}

When I go to commit that, I'd like Git to show that I've added the new ruleset above the existing one. Instead, Git seems to think I've changed myClass to myOtherClass and that, below color: red;, I've added:

    background: #000;
    font: "Comic Sans", Wingdings, sans-serif;
}

.myClass {
    color: red;

Both are accurate descriptions of the change, but it's much more understandable to say I've added a new ruleset.

Is there some way I can manually tell git that's how I want the change to be described?

nHaskins
  • Git does not traffic in "changes". It doesn't "think" anything about what you've "changed". It stores _the whole file_. Don't worry be happy. – matt May 20 '21 at 18:10
  • @matt That's not entirely true. By default, each commit is a snapshot of the entire tree. But `git diff` is still going to compute a diff like described in the question. (And at some point the pack files are going to crystalize some notion of diff, though I'm not sure that really affects this question beyond how efficiently the desired display can be calculated). – chepner May 20 '21 at 18:14
  • @nHaskins You can use `git difftool` to choose a particular language-aware tool to compute diffs between two files. – chepner May 20 '21 at 18:15
  • To the best of my knowledge, no. But, maybe use a detailed and descriptive commit message? – UdonN00dle May 20 '21 at 18:27
  • I can't reproduce with your example but this did occur to me in the past. Did you try the `--indent-heuristic` option (or alternatively the `--no-indent-heuristic` option)? Did you play with the different possible values for `--diff-algorithm`? – xhienne May 20 '21 at 18:37
  • @chepner Sure, but just as you say, `git diff` is merely computed. It is of no real importance with regard to how Git is functioning internally. OK, Git does have some "intelligence" in this regard (it knows about function declarations, for example) but that's all cosmetic. – matt May 20 '21 at 20:05
  • Possibly relevant: https://stackoverflow.com/questions/21096188/how-to-apply-diff-rules-of-the-languages-in-gitattributes – chepner May 20 '21 at 20:15

1 Answer


First, I'll mention that using your sample, and Git version 2.27.0, I'm already getting the kind of match you'd like. There are other samples that are more problematic though. For the rest of this answer, I'll assume you have one of those.

You can try using one of Git's other built-in diff algorithms. You can also fuss with the heuristics (as xhienne mentioned), but note that these just slide a window around on the diff after computing it: they won't change the set of diff hunks that come out. If none of these work then, short of adding a new diff algorithm or modifying the existing ones, no: there is nothing else left to do, at least not in Git itself. The `git difftool` method that chepner mentioned is a way to run some other command instead of Git's built-in diff.
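To compare the built-in algorithms side by side, a small helper along these lines may be handy. It is only a sketch: the file name `style.css` and the helper names are made up for illustration, while `--diff-algorithm`, `--indent-heuristic`, and `--no-indent-heuristic` are the `git diff` options already mentioned in the comments.

```python
import subprocess

# The four values accepted by git diff's --diff-algorithm option.
ALGORITHMS = ["myers", "minimal", "patience", "histogram"]


def diff_command(path, algorithm, indent_heuristic=True):
    """Build the argv for one `git diff` run (hypothetical helper)."""
    heuristic = "--indent-heuristic" if indent_heuristic else "--no-indent-heuristic"
    return ["git", "diff", f"--diff-algorithm={algorithm}", heuristic, "--", path]


def show_all(path):
    """Print the diff of `path` under every algorithm/heuristic combination."""
    for algorithm in ALGORITHMS:
        for indent in (True, False):
            cmd = diff_command(path, algorithm, indent)
            print("$", " ".join(cmd))
            # check=False: let the loop continue even if one run fails.
            subprocess.run(cmd, check=False)
```

Calling `show_all("style.css")` inside the repository prints each variant in turn. If one algorithm consistently reads better, it can be made the default with the `diff.algorithm` configuration setting.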

As matt noted, each Git commit really just stores a full snapshot of every file. When you use `git diff` or any other similar tooling to compare two commits, Git will figure out which files in the two commits are "the same file", yet with different content, and feed them to some diff engine. The diff engine's job is to come up with some set of actions that would modify the left-side file to produce the right-side file (I like to use left and right sides here, rather than old and new, because you can feed the files in backwards to get a "reverse patch").

The existing diff engines know nothing of code. They just match lines. The lines color: red; match each other, and the lines } match each other, and the blank lines match each other.
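As a rough illustration (not Git's code), the two descriptions from the question can be written as two different edit scripts over the same left-side file. Both produce the identical right-side file, which is why a line-matching engine has no reason to prefer the "added a ruleset" reading. The `apply_edits` helper and the `(start, end, replacement)` edit format are invented for this sketch.

```python
def apply_edits(lines, edits):
    """Apply (start, end, replacement) edits, given in ascending order."""
    out, pos = [], 0
    for start, end, replacement in edits:
        out.extend(lines[pos:start])   # copy untouched lines before the edit
        out.extend(replacement)        # emit the replacement lines
        pos = end                      # skip the lines that were replaced
    out.extend(lines[pos:])
    return out


left = [".myClass {", "    color: red;", "    background: #0f0;", "}"]
font = '    font: "Comic Sans", Wingdings, sans-serif;'

# Script A: insert the whole new ruleset (plus a blank line) at the top.
script_a = [
    (0, 0, [".myOtherClass {", "    color: red;", "    background: #000;", font, "}", ""]),
]

# Script B: rename the selector, keep "color: red;", then insert the rest.
script_b = [
    (0, 1, [".myOtherClass {"]),
    (2, 2, ["    background: #000;", font, "}", "", ".myClass {", "    color: red;"]),
]

right_a = apply_edits(left, script_a)
right_b = apply_edits(left, script_b)
print(right_a == right_b)  # both scripts yield the same right-side file
```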

There are two diff engines inside Git today: the default or `myers` diff, and the `patience` diff. Both have slight modifications: `minimal` takes away a shortcut that was inserted into `myers` to make it go faster, and `histogram` modifies `patience` to include lines that `patience` discards entirely.

For an overview of patience diff, see What is `git diff --patience` for? The actual algorithm works by stripping out "dummy" lines, like blank lines and close-brace lines, that match up too often. The histogram modification doesn't completely strip out these lines, but gives them low weighting in terms of deciding whether to match on them. In theory, histogram should work well on these cases, but there was a bug in Git's implementation for a long time.
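The "stripping out" step can be sketched as follows. This is a simplification of how patience diff is usually described, not Git's implementation: it keeps only the lines that occur exactly once on each side and treats them as candidate anchors. The function name is made up for this example.

```python
from collections import Counter


def unique_common_lines(left, right):
    """Return lines that occur exactly once on each side, in left-side order."""
    left_counts, right_counts = Counter(left), Counter(right)
    return [
        line for line in left
        if left_counts[line] == 1 and right_counts[line] == 1
    ]


left = [".myClass {", "    color: red;", "    background: #0f0;", "}"]
right = [
    ".myOtherClass {",
    "    color: red;",
    "    background: #000;",
    '    font: "Comic Sans", Wingdings, sans-serif;',
    "}",
    "",
    ".myClass {",
    "    color: red;",
    "    background: #0f0;",
    "}",
]

anchors = unique_common_lines(left, right)
print(anchors)
```

On the question's sample, `color: red;` and `}` appear twice on the right side, so they drop out, leaving `.myClass {` and `background: #0f0;` as anchors. Anchoring on `.myClass {` is what pushes the result toward "a new ruleset was inserted above".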

torek