32

I am trying to produce a list of the files that were changed in a specific commit. The problem is that every file has the version number in a comment at the top, and since this commit introduces a new version, every file has changed.

I don't care about the changed comments, so I would like git diff to ignore all lines that match ^\s*\*.*$, as these are all comments (part of /* */).

I cannot find any way to tell git diff to ignore specific lines.

I have already tried setting a textconv attribute so that Git passes the files through sed before diffing them, letting sed strip out the offending lines. The problem is that git diff --name-status does not actually diff the files; it only compares the hashes, and of course all the hashes have changed.
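The comparison the textconv filter was supposed to enable can be sketched in Python: strip every line matching ^\s*\*.*$ from both versions and check whether anything else differs. This only illustrates the idea; git diff --name-status never runs such a comparison, which is the problem described above. The function names are hypothetical.

```python
import re

# The pattern from the question: lines that belong to a /* ... */ block body.
COMMENT_LINE = re.compile(r"^\s*\*.*$")


def strip_comment_lines(text: str) -> list[str]:
    """Mimic a textconv filter (e.g. a sed script) that deletes matching lines."""
    return [line for line in text.splitlines() if not COMMENT_LINE.match(line)]


def changed_ignoring_comments(old: str, new: str) -> bool:
    """True if the two versions still differ once comment lines are removed."""
    return strip_comment_lines(old) != strip_comment_lines(new)


old = "/*\n * Version 1.0\n */\nint x = 1;\n"
new = "/*\n * Version 1.1\n */\nint x = 1;\n"
print(changed_ignoring_comments(old, new))  # False: only the version comment changed
```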

Is there a way to do this?

Peter Mortensen
Benubird
  • A wild guess... Did you try `git diff --name-status --textconv`? Or maybe `git diff --name-only`? – rodrigo May 13 '13 at 16:55
  • Yes, I am using --name-only, but it returns (like I said) every file, because every file has had its comments changed. --textconv does not work because, as I also said in the post, Git ignores it when not producing a full diff. – Benubird May 13 '13 at 17:04
  • possible duplicate of [ignoring changes matching a string in git diff](http://stackoverflow.com/questions/15878622/ignoring-changes-matching-a-string-in-git-diff) – richvdh Jun 23 '15 at 14:08
  • @richvdh I think the questions are similar enough to be considered a duplicate, BUT they have different correct answers, and this question has additional answers making suggestions that the other Q does not have, so I believe there is value in keeping both of them. – Benubird Jun 23 '15 at 15:23
  • Related: Git 2.30 (Q1 2021) will propose [`git diff -I`](https://stackoverflow.com/a/64758633/6309). – VonC Nov 09 '20 at 20:16

7 Answers


Here is a solution that is working well for me. I've written up the solution and some additional missing documentation on the git (log|diff) -G<regex> option.

It is basically the same approach as in previous answers, but specifically for comments that start with a * or a #, sometimes preceded by a space. It still needs to let changes to #ifdef, #include, etc. through.

Look-ahead and look-behind do not seem to be supported by the -G option, ? did not work for me either, and I have had problems with * too. + seems to work well, though.

(Note, tested on Git v2.7.0)

Multi-Line Comment Version

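git diff -w -G'(^[^\*# /])|(^#\w)|(^\s+[^\*#/[:space:]])'
  • -w ignore whitespace
  • -G only show files whose diff adds or removes at least one line matching the following regex
  • (^[^\*# /]) any line that starts with a character other than a star, a hash, a space, or a slash
  • (^#\w) any line that starts with # followed by a word character (e.g. #include, #ifdef)
  • (^\s+[^\*#/[:space:]]) any line that starts with some whitespace followed by a character that is neither whitespace nor a comment character (*, #, /); without [:space:] in that list, \s+ could give back one space and a comment line indented by two or more spaces would still match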
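To check which changed lines would make -G keep a file, here is a Python approximation of the pattern. Python's re is not POSIX ERE, but it treats these constructs the same way, except that a POSIX bracket list also counts \ as a literal member. It also shows a catch: because \s+ can give back one whitespace character, (^\s+[^\*#/]) matches a comment line indented by two or more spaces. Excluding whitespace from the last list (spelled \s in Python, [:space:] in a POSIX pattern) closes that hole. The names ORIGINAL, TIGHTENED, and selects are hypothetical.

```python
import re

# Pattern as first posted, transcribed to Python syntax.
ORIGINAL = re.compile(r"(^[^\*# /])|(^#\w)|(^\s+[^\*#/])")
# Same pattern with whitespace also excluded from the last bracket list.
TIGHTENED = re.compile(r"(^[^\*# /])|(^#\w)|(^\s+[^\*#/\s])")


def selects(pattern: re.Pattern, line: str) -> bool:
    """True if this changed line alone would make -G keep the file."""
    return pattern.search(line) is not None


samples = [
    " * Version 1.2.3",       # comment body, one leading space
    "     * Version 1.2.3",   # comment body, deeper indentation
    "/* header */",           # comment opener
    "#include <stdio.h>",     # preprocessor line, kept by (^#\w)
    "# plain comment",        # shell/Python-style comment
    "int x = 1;",             # code
    "    return x;",          # indented code
    "    / c;",               # division continued on a new line
]
for line in samples:
    print(selects(ORIGINAL, line), selects(TIGHTENED, line), repr(line))
```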

Basically, an SVN hook currently modifies every file as it goes in and out, rewriting the multi-line comment blocks in each one. Now I can diff my changes against SVN without the informational text that SVN drops into the comments.

Technically, this still lets Python and Bash comments such as #TODO show up in the diff (they match (^#\w)), and a C++ division operator that starts a new line could be ignored:

a = b
    / c;

Also, the Git documentation on -G seemed pretty lacking, so the information here should help:

git diff -G<regex>

-G<regex>

Look for differences whose patch text contains added/removed lines that match <regex>.

To illustrate the difference between -S<regex> --pickaxe-regex and -G<regex>, consider a commit with the following diff in the same file:

+    return !regexec(regexp, two->ptr, 1, &regmatch, 0);
...
-    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);

While git log -G"regexec\(regexp" will show this commit, git log -S"regexec\(regexp" --pickaxe-regex will not (because the number of occurrences of that string did not change).

See the pickaxe entry in gitdiffcore(7) for more information.
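The difference between the two can be modelled in a few lines of Python: -S with --pickaxe-regex compares how many times the pattern occurs before and after, while -G asks whether any added or removed line matches. This is a simplified sketch, not Git's implementation; each "file" here is just the single changed line from the example above, and the function names are hypothetical.

```python
import re


def pickaxe_s(pattern: str, old_blob: str, new_blob: str) -> bool:
    """-S<regex> --pickaxe-regex: did the number of matches in the file change?"""
    rx = re.compile(pattern)
    return len(rx.findall(old_blob)) != len(rx.findall(new_blob))


def pickaxe_g(pattern: str, removed: list[str], added: list[str]) -> bool:
    """-G<regex>: does any removed or added line match?"""
    rx = re.compile(pattern)
    return any(rx.search(line) for line in removed + added)


old_line = "    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);"
new_line = "    return !regexec(regexp, two->ptr, 1, &regmatch, 0);"
pattern = r"regexec\(regexp"

print(pickaxe_s(pattern, old_line, new_line))      # False: one match before, one after
print(pickaxe_g(pattern, [old_line], [new_line]))  # True: a changed line matches
```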

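(Note, tested on Git v2.7.0)

  • -G compiles the pattern as a POSIX extended regular expression (REG_EXTENDED, see the update below), not a basic one.
  • In my testing I could not get ?, *, {, } to behave as expected, although POSIX extended syntax does define them. (! has no special meaning in POSIX regular expressions.)
  • Grouping with () and OR-ing groups with | works.
  • Shorthand character classes such as \s, \w, \W, etc. are supported (these are GNU extensions).
  • Look-ahead and look-behind are not supported.
  • The beginning and end of line anchors ^ and $ work.
  • The feature has been available since Git 1.7.4.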

Excluded Files v Excluded Diffs

Note that the -G option filters the files that will be diffed.

But once a file is included, the diff shows all of its changed lines, including the ones the regex was meant to exclude.
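A minimal Python model of that two-step behaviour, assuming (as Git's diffcore-pickaxe.c appears to do) that the pattern is matched against each changed line's text after the +/- marker. The file names and the select_like_g helper are hypothetical.

```python
import re


def select_like_g(pattern: str, changed: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep a file if any changed line matches; keep ALL of that file's lines."""
    rx = re.compile(pattern)
    return {
        path: lines
        for path, lines in changed.items()
        # lines look like "+text" / "-text"; match only the text after the marker
        if any(rx.search(line[1:]) for line in lines)
    }


changed = {
    "a.c": ["- * Version 1.0", "+ * Version 1.1"],
    "b.c": ["- * Version 1.0", "+ * Version 1.1", "-int x = 1;", "+int x = 2;"],
}
result = select_like_g(r"^[^\s*/]", changed)
print(sorted(result))      # ['b.c']
print(len(result["b.c"]))  # 4: the version-comment lines are still shown
```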

Examples

Only show files with at least one added or removed line that mentions foo.

git diff -G'foo'

Only show files with at least one added or removed line that does not start with a #

git diff -G'^[^#]'

Show files that have differences mentioning FIXME or TODO

git diff -G'(FIXME)|(TODO)'

See also git log -G, git grep, git log -S, --pickaxe-regex, and --pickaxe-all

UPDATE: Which regular expression engine does the -G option use?

https://github.com/git/git/search?utf8=%E2%9C%93&q=regcomp&type=

https://github.com/git/git/blob/master/diffcore-pickaxe.c

if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
    int cflags = REG_EXTENDED | REG_NEWLINE;
    if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE))
        cflags |= REG_ICASE;
    regcomp_or_die(&regex, needle, cflags);
    regexp = &regex;

// and in the regcomp_or_die function
regcomp(regex, needle, cflags);

http://man7.org/linux/man-pages/man3/regexec.3.html

   REG_EXTENDED
          Use POSIX Extended Regular Expression syntax when interpreting
          regex.  If not set, POSIX Basic Regular Expression syntax is
          used.

// ...

   REG_NEWLINE
          Match-any-character operators don't match a newline.

          A nonmatching list ([^...])  not containing a newline does not
          match a newline.

          Match-beginning-of-line operator (^) matches the empty string
          immediately after a newline, regardless of whether eflags, the
          execution flags of regexec(), contains REG_NOTBOL.

          Match-end-of-line operator ($) matches the empty string
          immediately before a newline, regardless of whether eflags
          contains REG_NOTEOL.
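Python's re.MULTILINE is a rough analogue of REG_NEWLINE for ^ and $, and Python's . already refuses to match a newline. One difference worth knowing when trying out a -G pattern in Python first: a Python negated list such as [^;] still matches \n, so it has to be excluded explicitly. This only matters when a pattern runs against multi-line text; the sample text below is made up.

```python
import re

blob = "int a;\n * Version 1.0\nint b;\n"

# Without MULTILINE, ^ only matches at the very start of the string.
print(re.findall(r"^ \*.*", blob))                  # []
# With MULTILINE, ^ also matches right after each newline, as with REG_NEWLINE.
print(re.findall(r"^ \*.*", blob, re.MULTILINE))    # [' * Version 1.0']
# Unlike REG_NEWLINE, a Python negated list still matches "\n"...
print(re.findall(r"^[^;]+", blob, re.MULTILINE))    # ['int a', ' * Version 1.0\nint b']
# ...so exclude it explicitly to get the REG_NEWLINE behaviour.
print(re.findall(r"^[^;\n]+", blob, re.MULTILINE))  # ['int a', ' * Version 1.0', 'int b']
```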
Peter Mortensen
phyatt
  • It looks like it is similar to "Simple Regular Expression". https://en.wikibooks.org/wiki/Regular_Expressions/Simple_Regular_Expressions – phyatt Apr 24 '17 at 12:52
  • That couldn't be completely right since it accepts some non-simple syntax such as `+` (I just tested). – Emadpres Apr 24 '17 at 13:18
  • See update near the end of my answer. I haven't successfully tested the "POSIX extended regular expressions". My empirical testing showed it not working quite the same. – phyatt Apr 24 '17 at 13:51
  • @phyatt - this does not seem to work: `git diff -G'^[^#]'`. It still shows lines starting with `#`. – Martin Vegter Aug 24 '19 at 05:54
  • @MartinVegter The syntax will still show up if the file has at least one other difference. If a file only has comment differences, the file will be excluded in the results. – phyatt Aug 24 '19 at 12:17
git diff -G <regex>

And specify a regular expression that does not match your version-number line. Files whose only change is that line are then left out of the result.

Peter Mortensen
riezebosch

I found it easiest to use git difftool to launch an external diff tool:

git difftool -y -x "diff -I '<regex>'"
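GNU diff's -I option ignores changes whose inserted and deleted lines all match the regex. Here is a rough Python model of that rule using difflib, treating each non-equal opcode as one change. Real diff output is hunk-based and can still print ignorable lines that sit next to a real change, so treat this as an approximation; unignored_changes is a hypothetical name.

```python
import difflib
import re


def unignored_changes(old: list[str], new: list[str], pattern: str):
    """Rough model of `diff -I RE`: drop a change when every deleted and
    inserted line in it matches RE; return the changes that remain."""
    rx = re.compile(pattern)
    kept = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        touched = old[i1:i2] + new[j1:j2]
        if not all(rx.search(line) for line in touched):
            kept.append((tag, old[i1:i2], new[j1:j2]))
    return kept


old = ["/*", " * Version 1.0", " */", "int x = 1;"]
new = ["/*", " * Version 1.1", " */", "int x = 2;"]
print(unignored_changes(old, new, r"^ \* Version"))
# [('replace', ['int x = 1;'], ['int x = 2;'])]
```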
Peter Mortensen
richvdh

I found a solution. I can use this command:

git diff --numstat --minimal <commit> <commit> | sed '/^[1-]\s\+[1-]\s\+.*/d'

This drops every file whose --numstat entry is exactly one line added and one line deleted (as well as binary files, which --numstat reports as "- -"), so files whose only change was the version number in the comment disappear from the list.
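The same filter in Python, for anyone who wants to post-process --numstat output without sed. The sample input is made up and interesting_files is a hypothetical name; the regex is the sed expression transcribed to Python syntax.

```python
import re

# The sed expression: "1 1 path" or "- - path", separated by any whitespace.
DROP = re.compile(r"^[1-]\s+[1-]\s+.*")


def interesting_files(numstat_output: str) -> list[str]:
    """Paths from `git diff --numstat` output that survive the sed filter."""
    kept = []
    for line in numstat_output.splitlines():
        if not line or DROP.match(line):
            continue  # blank line, one-added/one-deleted change, or binary file
        _added, _deleted, path = line.split("\t", 2)
        kept.append(path)
    return kept


sample = "1\t1\tsrc/a.c\n12\t3\tsrc/b.c\n-\t-\tlogo.png\n1\t0\tsrc/c.c\n"
print(interesting_files(sample))  # ['src/b.c', 'src/c.c']
```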

Peter Mortensen
Benubird

Using grep on the git diff output, the comment-line changes alone can be counted (A):

git diff -w | grep -c -E "(^[+-]\s*(\/)?\*)|(^[+-]\s*\/\/)"

Using the git diff --stat output, all line changes can be counted (B), as the sum of insertions and deletions in the summary line:

git diff -w --stat

To get the count of non-comment source line (NCSL) changes, subtract (A) from (B).

Explanation:

In the 'git diff -w' output (in which whitespace changes are ignored):

  • Look for lines that start with either '+' or '-', which mark changed lines.
  • Allow optional whitespace after that: '\s*'.
  • Then look for a comment pattern: '/*', just '*', or '//'.
  • Because grep is given the '-c' option, it prints only the count. Remove '-c' to see the matching comment lines themselves.
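The same (A)/(B) bookkeeping can be done in one pass over unified diff text in Python. This is a sketch: it counts + and - lines itself instead of reading --stat, skips lines beginning with '+++ ' or '--- ' as file headers (so a removed line that itself starts with '-- ' is missed), and the sample diff is made up. comment_and_total_changes is a hypothetical name.

```python
import re

# The grep -E pattern above in Python syntax ("\/" is just "/").
COMMENT_CHANGE = re.compile(r"(^[+-]\s*(/)?\*)|(^[+-]\s*//)")


def comment_and_total_changes(diff_text: str) -> tuple[int, int]:
    """Return (A, B): changed comment lines and all changed lines."""
    comment = total = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++ ", "--- ")):
            continue  # file headers, which --stat does not count either
        if line.startswith(("+", "-")):
            total += 1
            if COMMENT_CHANGE.search(line):
                comment += 1
    return comment, total


diff = "\n".join([
    "diff --git a/x.c b/x.c",
    "--- a/x.c",
    "+++ b/x.c",
    "@@ -1,4 +1,4 @@",
    " /*",
    "- * Version 1.0",
    "+ * Version 1.1",
    "  */",
    "-int x = 1;",
    "+int x = 2;",
])
a, b = comment_and_total_changes(diff)
print(a, b, b - a)  # 2 4 2  -> (A), (B), NCSL
```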

NOTE: The comment-line count can be slightly off because of the following assumptions, so treat the result as a ballpark figure.

  • 1.) Source files are assumed to be C. Makefiles and shell scripts use a different convention, '#', for comment lines, so if they are part of the diff, their comment lines won't be counted.

  • 2.) Git's convention for a changed line: when a line is modified, Git treats it as the old line being deleted and a new line being inserted, so a single modified line shows up as two changed lines.

     In the example below, the new definition of 'FOO' looks like a two-line change.
    
     $  git diff --stat -w abc.h
     ...
     -#define FOO 7
     +#define FOO 105
     ...
     1 files changed, 1 insertions(+), 1 deletions(-)
     $
    
  • 3.) Valid comment lines that don't match the pattern, or valid source lines that do, will skew the count.

In the example below, the "+  blah blah" line, which doesn't start with '*', won't be detected as a comment line.

           + /*
           +  blah blah
           + *
           + */

In the example below, the "+ *ptr" line will be counted as a comment line because it starts with *, even though it is valid source code.

            + printf("\n %p",
            +         *ptr);
Peter Mortensen

Perhaps a Bash script like this:

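#!/bin/bash
# For each file changed between the given commits, count the lines of its
# textconv'd diff left after the sed filter; print files with a non-zero count.
git diff --name-only "$@" | while IFS= read -r FPATH ; do
    LINES_COUNT=$(git diff --textconv "$@" -- "$FPATH" | sed '/^[1-]\s\+[1-]\s\+.*/d' | wc -l)
    if [ "$LINES_COUNT" -gt 0 ] ; then
        printf '%s\t%s\n' "$LINES_COUNT" "$FPATH"
    fi
done | sort -n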
Peter Mortensen
saeedgnu

For most languages, to do this correctly you have to parse the original source file (or its AST) and exclude comments that way.

One reason is that the start of multi-line comments might not be covered by the diff. Another reason is that language-parsing isn't trivial, and there are often things that can trip up a naive parser.

I was going to do that for Python, but string hacking was good enough for my needs.

For Python, you can ignore comments and attempt to ignore docstrings using a custom filter, such as this:

https://gist.github.com/earonesty/f76dec337ee64c5ae23c2be1557535a4

That code can be trivially modified to produce filenames, rather than counts.

But it can, of course, mistakenly count parts of docstrings as "code" (which they aren't, for purposes such as coverage).
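Not the linked gist, but a sketch of the parse-based approach using Python's standard tokenize module: comparing token streams with COMMENT and NL tokens removed means a # inside a string literal is not mistaken for a comment. Docstrings are still compared as code, and the function names are hypothetical.

```python
import io
import tokenize


def code_tokens(source: str) -> list[tuple[int, str]]:
    """Token stream with comments and non-logical newlines removed."""
    skip = {tokenize.COMMENT, tokenize.NL}
    return [
        (tok.type, tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in skip
    ]


def differs_outside_comments(old: str, new: str) -> bool:
    """True if the two sources differ in anything other than comments."""
    return code_tokens(old) != code_tokens(new)


old = "# version 1.0\nx = 1  # set x\n"
new = "# version 1.1\nx = 1\n"
print(differs_outside_comments(old, new))        # False
print(differs_outside_comments(old, "x = 2\n"))  # True
```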

Erik Aronesty