Suppose I have a giant repo for an as-of-yet unpublished software product called "Hammerstein", written by the famous German software company "Apfel" of which I am an employee.

One day, "Apfel" spins out the Hammerstein division and sells it to the even more famous company "Oráculo" which renames "Hammerstein" to "Reineta" as a matter of national pride and decides to open source it.

Agreements mandate that all references to "Apfel" and "Hammerstein" be replaced by "Oráculo" and "Reineta", respectively, in the repository.

All filenames, all commit messages, everything must be replaced.

So, for example:

  1. src/core/ApfelCore/main.cpp must become src/core/OraculoCore/main.cpp.

  2. The commit message that says "Add support for Apfel Groupware Server" must become "Add support for Oraculo Groupware Server"

  3. The strings ApfelServerInstance* local_apfel, #define HAMMERSTEIN and Url("http://apfel.de") must become OraculoServerInstance* local_oraculo, #define REINETA, etc.

This applies to files that are not in HEAD anymore as well.

What is the simplest and most pain-free method to achieve it with minimal manual intervention (so that it can be applied in batch to a potentially large number of repositories/assets)?

  1. BFG can replace the strings, but it seems to only have a --delete-file option, not a --rename-file, and even then it does not take patterns as an argument
  2. This approach seems to work only for HEAD and not for the whole history; I have had no luck using it with --tree-filter
Tobia Tesan

2 Answers

Full disclosure: I'm the author of the BFG Repo-Cleaner

As you say in the question, the BFG supports replacing file content with the --replace-text flag - but this flag does not extend to file names and commit messages. So, what alterations to the codebase would it take to make the BFG's --replace-text operation extend to those too?

This comes down to hooking in some new Cleaner[V] implementations, where V is the type of thing you want to clean (a commit message, a directory listing), and the Cleaner just has the job of producing a new, clean V from an old, dirty V. To perform the actual text change, you can re-use the same text-replacing function used for file content changes.

File Names

Use a Cleaner[Seq[Tree.Entry]] - 'tree' is what Git calls folders ('file tree') - so you would just update the FileName on each Tree.Entry.

Commit Messages

Use a Cleaner[CommitNode] - again, you're just replacing text on the message field - see the CommitMessageObjectIdsUpdater for a very close example for what you're trying to do. While you're there, you could do something with the author and committer email addresses if you wanted to (eg purge ...@apfel.com, I guess).

Speed

As mentioned by @VonC in his answer, filter-branch can do both these replacements (file name & commit message) but while the --msg-filter flag should do the commit message updates reasonably quickly, I believe filter-branch will be fairly excruciatingly slow for renaming files within a large code base like yours. The BFG, optimised for exactly this kind of operation, will be several hundred times faster.

The BFG accepts donations at https://www.bountysource.com/teams/bfg-repo-cleaner - so if you'd like to support development of this feature, or if you just found the BFG useful in solving your problem, that's where you can make a difference.

Roberto Tyley
  • There are no words to describe how incredibly awesome this answer is. There *are*, in fact, words to describe *why* it is so awesome, but those who can appreciate its awesomeness in full don't need them anyway. I wish I could upvote twice. – Tobia Tesan Sep 21 '15 at 07:28
  • I won't probably need this feature anytime soon since @VonC's method with some additional scripting and `awk`ward `awk`ing did it for me - *but*, if I tried to implement it for the sheer, um, fun, would you consider evaluating a PR for it? – Tobia Tesan Sep 21 '15 at 07:32
  • 'some additional scripting and awkward awking did it for me' - ah, so the filename renaming was not very slow? I thought I saw in your question a mention that this was a very large codebase, but re-reading your question I see you don't specifically mention that. Out of curiosity, how big are these repos? (number of commits, file count in a snapshot of your filetree) - and how long does the filter-branch index-filter run take? – Roberto Tyley Sep 21 '15 at 07:46
  • Nice! The idea of BFG doing commit message and file name replacement is exciting. – VonC Sep 21 '15 at 07:52
  • 'if I tried to implement it for the sheer, um, fun, would you consider evaluating a PR for it?' - do it for the fun! Exposing this stuff in a nice way in the command line arguments (that makes sense for the entire audience of the BFG) would take a bit more discussion, but if you're willing to play around with Scala, I think you'll find hacking in a rough-and-ready solution quite entertaining. – Roberto Tyley Sep 21 '15 at 07:52
  • @RobertoTyley: *tolerably* slow. We are talking a few thousand commits per repo and maybe ~50KLOC total. My problem was more along the lines of "I have a bunch of biggish repos that other teams have worked on, and I want to replace all instances of the forbidden words in an automagic fashion, without having to know or care too much about the repo contents, all while staying, em, safe from possible repercussions :-)" – Tobia Tesan Sep 21 '15 at 08:21

in a single, simple step?

Not single, and not so simple, but possible:


To update commit messages, use the --msg-filter option of git filter-branch:

git filter-branch -f --msg-filter 'sed "s/Apfel/Oraculo/g"' -- --all

Note the --all, which applies the filter to all commits on every branch.
You might have to repeat that command several times to cover the different case variants (Apfel, apfel, APFEL).
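
The case variants can also be handled in a single pass by giving sed several expressions. A minimal sketch, assuming the capitalised, lower-case and upper-case spellings are the only ones that matter; the function name rename_brands is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: rewrite every case variant of both names in one pass.
# The word list is an assumption taken from the question; extend it as needed.
rename_brands() {
  sed -e 's/Apfel/Oraculo/g' \
      -e 's/apfel/oraculo/g' \
      -e 's/APFEL/ORACULO/g' \
      -e 's/Hammerstein/Reineta/g' \
      -e 's/hammerstein/reineta/g' \
      -e 's/HAMMERSTEIN/REINETA/g'
}

# Quick check on a sample commit message; no repository is needed for this.
echo 'Add support for Apfel Groupware Server' | rename_brands
```

To use this with filter-branch, the same list of -e expressions can be pasted inline into the --msg-filter argument, since the filter is evaluated in filter-branch's own shell and will not see a function defined in yours.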


For moving files (without using --tree-filter), you can refer to this answer (and this article), and adapt it in order to build the new path:

git filter-branch --index-filter 'git ls-files -s | \
  sed "s,/ApfelCore/,/OraculoCore/," | \
  GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info && \
  mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' \
-- --all
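
The sed stage above only covers the /ApfelCore/ directory. A sketch of a more general stage follows, assuming the usual `git ls-files -s` layout of "mode object stage", a tab, then the path. It rewrites only the path field, so the mode and object columns are left alone. The name rename_index_paths and the word list are illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical helper: rename paths in `git ls-files -s` output.
# Input lines look like "<mode> <object> <stage><TAB><path>".
rename_index_paths() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         # Only field 2 (the path) is touched.
         gsub(/Apfel/, "Oraculo", $2)
         gsub(/apfel/, "oraculo", $2)
         gsub(/Hammerstein/, "Reineta", $2)
         gsub(/hammerstein/, "reineta", $2)
         print
       }'
}

# Try it on one fabricated index line (the object id is a placeholder).
printf '100644 0123456789abcdef0123456789abcdef01234567 0\tsrc/core/ApfelCore/main.cpp\n' |
  rename_index_paths
```

This would stand in for the sed stage of the pipeline above. Because filter-branch evaluates the filter in its own shell, the awk program probably has to be inlined or saved as a script rather than called as a function.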

As mentioned, BFG can replace all strings; see "How to substitute text from files in git history?"

java -jar bfg.jar --replace-text replacements.txt my-repo.git

The replacements.txt file should contain all the substitutions you want to do, in a format like this (one entry per line)

Apfel==>Oraculo
apfel==>oraculo
HAMMERSTEIN==>REINETA
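
When many repositories are processed in batch, the entries for each case variant can be generated instead of typed by hand. A rough sketch, assuming plain ASCII names and the same old==>new line format shown above; the function name emit_variants is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print old==>new lines for three case variants of a pair.
# Assumes ASCII names, so tr's case classes behave predictably.
emit_variants() {
  old=$1
  new=$2
  printf '%s==>%s\n' "$old" "$new"
  printf '%s==>%s\n' \
    "$(printf '%s' "$old" | tr '[:upper:]' '[:lower:]')" \
    "$(printf '%s' "$new" | tr '[:upper:]' '[:lower:]')"
  printf '%s==>%s\n' \
    "$(printf '%s' "$old" | tr '[:lower:]' '[:upper:]')" \
    "$(printf '%s' "$new" | tr '[:lower:]' '[:upper:]')"
}

# Print the entries for both names from the question.
emit_variants Apfel Oraculo
emit_variants Hammerstein Reineta
```

Redirecting the output to replacements.txt gives a file that can be passed to --replace-text as above.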
VonC
  • Thank you very much, this is a very nice starting point that got me out of the pits of despair (I was attempting to use `find ... | xargs replace ...` with `--tree-filter` without much luck), but one would probably want something like `sed -e "s,Apfel,Oraculo," -e "s,Hammerstein,Reineta," -e "s,apfel,oraculo,"` in order to rename *all* relevant files. – Tobia Tesan Sep 20 '15 at 11:31