
I am trying to list the commits made after a given date in a local clone of a git repository, and then extract the file changes that each commit introduces.

The equivalent git command would be:

git log -p --reverse --after="2016-10-01"

Here is the script I use:

require "rugged"
require "date"

git_dir = "../ruby-gnome2/"

repo = Rugged::Repository.new(git_dir)
walker = Rugged::Walker.new(repo)
walker.sorting(Rugged::SORT_DATE | Rugged::SORT_REVERSE)
walker.push(repo.head.target)

walker.each do |commit|
  c_time = Time.at(commit.time)
  next unless c_time >= Date.new(2016, 10, 1).to_time

  puts c_time
  puts commit.diff.size
  puts commit.diff.stat.inspect
end

The problem is that far too many files are reported as modified. Here is the end of the script's output:

2016-10-22 17:33:37 +0200
2463
[2463, 0, 271332]

This means that 2463 files were modified, deleted, or replaced, whereas git log -p --reverse --after="2016-10-22" shows that only 2 files were modified.

How can I get the same result as with the git command? In other words, how can I find the files that this commit actually modifies?

cedlemo

2 Answers


As I didn't get any answer from the rugged team, I wrote a Ruby GObject-Introspection loader for libgit2-glib, available at https://github.com/ruby-gnome2/ggit.

Now I can get diffs and logs that match the output of the git command line interface:

require "ggit"

PATH = File.expand_path(File.dirname(__FILE__))

repo_path = "#{PATH}/ruby-gnome2/.git"

file = Gio::File.path(repo_path)

begin
  repo = Ggit::Repository.open(file)
  revwalker = Ggit::RevisionWalker.new(repo)
  revwalker.sort_mode = [:time, :topological, :reverse]
  head = repo.head
  revwalker.push(head.target)
rescue => error
  STDERR.puts error.message
  exit 1
end

def signature_to_string(signature)
  name = signature.name
  email = signature.email
  time = signature.time.format("%c")

  "#{name} <#{email}> #{time}"
end

while oid = revwalker.next do
  commit = repo.lookup(oid, Ggit::Commit.gtype)

  author = signature_to_string(commit.author)
  date = commit.committer.time
  # Compare year, month and day as one value: keep only commits after 2016-11-05.
  next unless ([date.year, date.month, date.day_of_month] <=> [2016, 11, 5]) > 0
  committer = signature_to_string(commit.committer)

  subject = commit.subject
  message = commit.message

  puts "SHA: #{oid}"
  puts "Author:  #{author}"
  puts "Committer: #{committer}"
  puts "Subject: #{subject}"
  puts "Message: #{message}"
  puts "----------------------------------------"

  commit_parents = commit.parents
  if commit_parents.size > 0
    parent_commit = commit_parents.get(0)
    commit_tree = commit.tree
    parent_tree = parent_commit.tree

    diff = Ggit::Diff.new(repo, :old_tree => parent_tree,
                          :new_tree => commit_tree, :options => nil)

    diff.print( Ggit::DiffFormatType::PATCH ).each do |_delta, _hunk, line|
      puts "\t | #{line.text}"
      0
    end

  end

end
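
A note on the date filter: when keeping only the commits after a given day, the year, month and day have to be compared as a single value, not field by field, otherwise a date such as 2016-12-01 is rejected because its day is small. Here is a minimal standalone sketch of the difference, using plain integers instead of the signature time object (the method names and the cutoff constant are only for illustration):

```ruby
# Cutoff expressed as [year, month, day]; illustrative value.
CUTOFF = [2016, 11, 5].freeze

# Field-by-field check: each component is tested on its own.
def after_fieldwise?(year, month, day)
  year >= CUTOFF[0] && month >= CUTOFF[1] && day > CUTOFF[2]
end

# Lexicographic check: arrays compare element by element, left to right,
# so the year decides first, then the month, then the day.
def after_cutoff?(year, month, day)
  ([year, month, day] <=> CUTOFF) > 0
end

[[2016, 11, 6], [2016, 12, 1], [2017, 1, 10], [2016, 11, 5]].each do |y, m, d|
  puts format("%04d-%02d-%02d fieldwise=%s lexicographic=%s",
              y, m, d, after_fieldwise?(y, m, d), after_cutoff?(y, m, d))
end
```

The two checks agree on 2016-11-06 and 2016-11-05, but only the lexicographic one keeps 2016-12-01 and 2017-01-10.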
cedlemo

When I clone ruby-gnome2/ruby-gnome2, I see that it contains 2400+ files, so a count of 2463 suggests that every file is being reported as modified.

This differs from the normal behavior of Rugged::Commit#diff, which by default diffs the current commit (returned by the Walker) against its first parent.

Check whether you have a setting such as git config core.autocrlf set to true (which might change line endings in your local repo).
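
To take the default comparison out of the picture, it may also be worth diffing each commit explicitly against its first parent and listing the paths that come back. The following is only a sketch under assumptions: `changed_paths` and `report_changes` are helper names made up for this example, root commits are simply skipped, and the Rugged calls used (`Commit#parents`, `Commit#diff(other)`, `Diff#deltas`, `delta.new_file[:path]`) should be checked against the Rugged version you have installed.

```ruby
# Paths touched by `commit` relative to its first parent.
# `commit` is expected to behave like a Rugged::Commit.
def changed_paths(commit)
  parent = commit.parents.first
  return [] if parent.nil? # root commit: skipped in this sketch

  # Diff from the parent (old side) to the commit (new side).
  diff = parent.diff(commit)
  diff.deltas.map { |delta| delta.new_file[:path] }
end

# Walk the history oldest-first and print the files changed by each
# commit made at or after `since` (a Time).
def report_changes(repo_path, since)
  require "rugged" # loaded here so changed_paths has no hard dependency

  repo = Rugged::Repository.new(repo_path)
  walker = Rugged::Walker.new(repo)
  walker.sorting(Rugged::SORT_DATE | Rugged::SORT_REVERSE)
  walker.push(repo.head.target)

  walker.each do |commit|
    next unless commit.time >= since

    paths = changed_paths(commit)
    puts "#{commit.time} #{commit.oid} (#{paths.size} files)"
    paths.each { |path| puts "  #{path}" }
  end
end
```

Calling something like report_changes("../ruby-gnome2/", Time.new(2016, 10, 1)) should then print one block per commit; if the file counts now agree with git log --name-status, the difference lies in what the argument-less diff call compares against.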

VonC
  • Unfortunately, there are no differences, look here https://github.com/ruby-gnome2/ruby-gnome2/commits/master, this is the repo I use for my tests. You can easily see that there are not 2463 files modified/deleted/replaced after the date of "2016-10-22". – cedlemo Nov 05 '16 at 13:12
  • @cedlemo: yes, a `git log --name-status --oneline --after="2016-10-22"` shows only 3 files. That means your script has an issue. Is it because it counts the commits *before* that date, not after? After all, a `git log --name-status --oneline --before="2016-10-22" --all --branches|grep -e "^M"|sort|uniq|wc` gives 3362 files. And the `unless c_time >= Date.new(2016,10,01).to_time` is suspicious... – VonC Nov 05 '16 at 14:37
  • The `unless c_time >= Date.new(2016, 10, 01)` does not really matter, because the output I showed comes from the last iteration of the loop over the commits. It says that the last commit has 2463 files modified, which is false. The script may have an issue, but I don't see where. – cedlemo Nov 05 '16 at 15:35
  • @cedlemo The script might be fine, but the repo itself might not. See my revised answer. – VonC Nov 06 '16 at 09:55
  • I tried your command `git config core.autocrlf true` but it changes nothing. I will open an issue on the rugged GitHub because I have seen that someone already complained about `Rugged::Commit#diff`. I will keep you informed. – cedlemo Nov 06 '16 at 15:34
  • @cedlemo Actually, to avoid automatic changes, try instead: `git config --global core.autocrlf false`, and clone the repo again, then apply your script to the newly cloned repo. – VonC Nov 06 '16 at 15:39
  • I ran your command `git config --global core.autocrlf false`, then cloned the repo again. The script, modified to use the new local clone, shows exactly the same output as before. I tried the same with `git config --global core.autocrlf input`, which is the setting for Linux (https://help.github.com/articles/dealing-with-line-endings/), but without success either. – cedlemo Nov 06 '16 at 18:18
  • @cedlemo OK. I will monitor https://github.com/libgit2/rugged/issues then – VonC Nov 06 '16 at 18:22