22

I have an existing repository where the line endings are all messed up. I'd like to rewrite the entire repository and fix line endings once and for all. There are text files and binary files; let's assume that git's heuristics for detecting binary files will work just fine.

What's the easiest way to repopulate the entire repository with files with normalized line endings?

user907059
  • 1
    I thought all the options were already presented in http://stackoverflow.com/questions/1011985/line-endings-messed-up-in-git-how-to-track-changes-from-another-branch-after-a/1060828#1060828? – VonC Aug 23 '11 at 05:43
  • tree-filter is prohibitively slow for me. Even cherry-picking changes one by one is faster. – user907059 Aug 23 '11 at 05:58
  • 1
    slow? But it is a one-time operation you wouldn't repeat everyday. Launch it one evening, get back the next morning. Wouldn't that be possible in your case? – VonC Aug 23 '11 at 06:00
  • Turns out it was slow because it was Cygwin. It's just fine in Linux. – user907059 Sep 17 '11 at 04:11
  • With Git 2.16 (Q1 2018), you will have `git add --renormalize .`: See [my answer below](https://stackoverflow.com/a/47580886/6309) – VonC Nov 30 '17 at 19:07

3 Answers

34

Since Git 2.16 (Q1 2018), there is another way (other than emptying the index): "git add --renormalize .", a new and safer way to record the fact that you are correcting the end-of-line convention.

See commit 9472935 (16 Nov 2017) by Torsten Bögershausen (tboegi).
(Merged by Junio C Hamano -- gitster -- in commit af6e0fe, 27 Nov 2017)

add: introduce "--renormalize"

Make it safer to normalize the line endings in a repository.
Files that had been committed with CRLF will be committed with LF.

The old way to normalize a repo was like this:

# Make sure that there are no untracked files
$ echo "* text=auto" >.gitattributes
$ git read-tree --empty
$ git add .
$ git commit -m "Introduce end-of-line normalization"

The user must make sure that there are no untracked files, otherwise they would have been added and tracked from now on.

The new "add --renormalize" does not add untracked files:

$ echo "* text=auto" >.gitattributes
$ git add --renormalize .
$ git commit -m "Introduce end-of-line normalization"

Note that "git add --renormalize <pathspec>" is the short form for "git add -u --renormalize <pathspec>".
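The renormalize flow above can be tried in a throwaway repository before touching a real one. What follows is only a minimal sketch, assuming Git 2.16 or later; the temporary directory, the file name crlf.txt and the demo identity are made up for illustration. It uses "git ls-files --eol" to show the line endings recorded in the index before and after. The sketch stages .gitattributes explicitly, on the assumption that --renormalize (which implies -u) leaves a still-untracked .gitattributes alone.

```shell
#!/bin/sh
# Sketch: renormalize a CRLF file in a scratch repository (hypothetical names).
set -e

demo=$(mktemp -d)            # throwaway directory, not your real repository
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
git config core.autocrlf false   # keep CRLF as-is for the first commit

# Commit a file whose content uses CRLF line endings.
printf 'one\r\ntwo\r\n' > crlf.txt
git add crlf.txt
git commit -q -m "Add file with CRLF endings"

# First column of "git ls-files --eol" describes the index copy (i/...).
before=$(git ls-files --eol crlf.txt | awk '{print $1}')
echo "index before: $before"

# Turn on normalization and re-stage the tracked files.
echo "* text=auto" > .gitattributes
git add --renormalize .
git add .gitattributes       # staged explicitly; see the note above
git commit -q -m "Introduce end-of-line normalization"

after=$(git ls-files --eol crlf.txt | awk '{print $1}')
echo "index after: $after"
```

The working-tree copy is not rewritten by this; only the version stored in the index and the new commit changes.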


Note: Git 2.21 (Feb. 2019) fixes a bug related to this: "git add --ignore-errors" did not work as advertised and instead worked as an unintended synonym for "git add --renormalize", which has been fixed.

See commit 9e5da3d (17 Jan 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 1c41824, 05 Feb 2019)

add: use separate ADD_CACHE_RENORMALIZE flag

Commit 9472935 (add: introduce "--renormalize", 2017-11-16, Git 2.16) taught git add to pass HASH_RENORMALIZE to add_to_index(), which then passes the flag along to index_path().
However, the flags taken by add_to_index() and the ones taken by index_path() are distinct namespaces.
We cannot take HASH_* flags in add_to_index(), because they overlap with the ADD_CACHE_* flags we already take (in this case, HASH_RENORMALIZE conflicts with ADD_CACHE_IGNORE_ERRORS).

We can solve this by adding a new ADD_CACHE_RENORMALIZE flag, and using it to set HASH_RENORMALIZE within add_to_index().
In order to make it clear that these two flags come from distinct sets, let's also change the name "newflags" in the function to "hash_flags".

Also: See commit e2c2a37 (07 Feb 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 9293bf6, 07 Feb 2019)

add_to_index(): convert forgotten HASH_RENORMALIZE check

Commit 9e5da3d (add: use separate ADD_CACHE_RENORMALIZE flag, 2019-01-17) switched out using HASH_RENORMALIZE in our flags field for a new ADD_CACHE_RENORMALIZE flag.
However, it forgot to convert one of the checks for HASH_RENORMALIZE into the new flag, which totally broke "git add --renormalize".

VonC
  • Ubuntu 17.10 still has git 2.14. This answer is ahead of its time :) – Pierre.Sassoulas Feb 17 '18 at 13:59
  • 2
    @Pierre.Sassoulas Yet, you can upgrade Git at any time: http://lifeonubuntu.com/upgrading-ubuntu-to-use-the-latest-git-version/ – VonC Feb 17 '18 at 15:25
  • Thanks, I came here because I read the doc and my version did not cooperate. Problem fixed. – Pierre.Sassoulas Feb 17 '18 at 15:50
  • Thanks. BTW, I prefer to leave files as they are without any automatic conversion (e.g. to avoid converting sh scripts to crlf on Windows or cmd scripts to lf on Linux) by specifying `* -text` in my gitattributes and then deleting and readding the files back with final correct end-of-lines that shouldn't be ever touched by git. – JustAMartin Jun 28 '19 at 11:47
  • @JustAMartin Agreed, but shouldn't `git add --renormalize` avoid the delete/add again step? – VonC Jun 28 '19 at 12:09
  • Maybe `renormalize` should work on new git versions also for "denornalization" (updating the git stage with file EOLs as they are in current working copy), but I haven't yet had a chance to try it - will keep in mind for the next time. – JustAMartin Jun 28 '19 at 12:37
3

If you just want to renormalize your current commit after having set core.autocrlf or text=auto, so you can have all the line ending normalization in one commit, run these commands:

git rm --cached -rf .
git add .

To also normalize the files in your working dir, run:

git checkout .
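To see which tracked files are affected before or after running the commands above, one option is to ask git which index entries still contain CRLF. This is a hedged sketch, not part of the original answer: the function name list_crlf_in_index is hypothetical, and it relies on the "i/<eol>" column that "git ls-files --eol" prints in reasonably recent Git versions.

```shell
#!/bin/sh
# Sketch: print tracked paths whose index copy has CRLF or mixed line endings.
# list_crlf_in_index is a made-up helper name, not a git command.
list_crlf_in_index() {
    # Each output line looks like: "i/<eol> w/<eol> attr/<attrs><TAB><path>".
    # Split on the tab so paths containing spaces stay intact.
    git ls-files --eol |
        awk -F'\t' '$1 ~ /^i\/(crlf|mixed)[[:space:]]/ { print $2 }'
}

# Usage (from inside a work tree):
#   list_crlf_in_index
```

An empty result after staging the normalization suggests that no CRLF content remains in the index.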
Chronial
1

This approach does not need git: normalize the files first, then commit the resulting code base.

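# Requires bash (read -d '') and GNU grep (-P); skips git's own .git directory
find . -type f ! -path './.git/*' -print0 | while IFS= read -r -d '' f; do
    if grep -qP '\x00' "$f"; then
        # file is binary
        continue
    fi

    perl -pe 'BEGIN{ undef $/ } s/\x0d\x0a/\x0a/g; s/\x0d/\x0a/g' -i "$f"
done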

The grep check assumes that any file containing a null character is binary.

perl edits each file in place. It first converts Windows-style line endings (CRLF) to Unix-style (LF), then converts any remaining bare CR (classic Mac style) to LF.

Shizzmo