
I need to refactor a file by replacing message codes with updated values. The original file lives on an Ubuntu server, which I can mount and access from Windows. I cloned the repository with git on the Ubuntu server, moved the file to Windows, and there ran a small Java program that replaces the values and writes out the result. I then opened the refactored file in Windows and copy-pasted its contents into the file on the Ubuntu server (a copy-replace or move-replace of the file makes git diff show all contents as changed).

The following is the Java code I used for the refactoring.

        ruleInBR = new BufferedReader(new FileReader(ruleIn));
        ruleOutBW = new BufferedWriter(new FileWriter(ruleOut));
        csvOutBW = new BufferedWriter(new FileWriter(csvOut));

        String readRule = "";
        int lineNo = 1;

        while((readRule = ruleInBR.readLine()) != null)
        {
            if(details.get(lineNo) != null)
            {
                AlterValuePair<String> avPair = details.get(lineNo);
                String renamedRule = readRule.replace(avPair.getOldValue(),avPair.getNewValue());
                String trimRenamedRule = renamedRule.replace("\r","");
                csvOutBW.write(lineNo + ", " + avPair.getOldValue() + ", " + avPair.getNewValue() +"\n");
                ruleOutBW.write(trimRenamedRule + "\n");
                count++;
            }
            else {
                String trimReadRule = readRule.replace("\r","");
                ruleOutBW.write(trimReadRule +"\n");

            }
            lineNo++;
        }

But in git diff I see changes involving ‘^M’ (that is, ‘\r’) that I did not make. As far as I know, this happens because I opened and worked on the file in editors that leave those line endings. The refactored file will cause problems when compiling on Ubuntu because of the unexpected character, so I tried the following approaches, which I had learned previously or found on Stack Overflow.

I tried the following options (the first three in vim, the last two from the shell):

  1. set ff=unix /set fileformat=unix
  2. set ff=dos /set fileformat=dos
  3. %s/\r\n/\n/ or %s/\r// or %s/\r//g
  4. dos2unix fileName
  5. perl -pi -e 's/\r//' or perl -pi -e 's/\r\n/\n/'

But in all these cases the whole file was rewritten, and git diff showed every line as changed, including the ones I had not touched. Is there any way to solve this problem?

I have gone through the following Stack Overflow questions and other pages:

  1. gVim showing carriage return (^M) even when file mode is explicitly DOS
  2. Convert DOS line endings to Linux line endings in vim
  3. ^M at the end of every line in vim
  4. Remove a line in text file with java.BufferedReader
  5. https://its.ucsc.edu/unix-timeshare/tutorials/clean-ctrl-m.html
  6. https://www.garron.me/en/bits/get-rid-m-characters-vim.html

But none of them solved my problem.

UPDATE

Finally, I followed the instructions from another Stack Overflow question about leaving whitespace changes out at commit time. That solved my issue to some extent, but it has a flaw: parts of the same file remain uncommitted (namely the whitespace-only changes that were left out).

I don't know how to handle this, as I have to make changes on several branches, and any of them may run into this problem. Is there a simpler way than doing this at the git commit level, where I have to ignore whitespace, commit, and then stash the uncommitted changes every time?

By the way, this is that Stack Overflow question: Add only non-whitespace changes

halfer
Amutheezan
  • Please clarify exactly what you did. If I understand correctly: 1) checked out files on Unix, 2) copied to Windows, 3) ran through Java file to change some values, 4) copied the transformed file back to Unix, 5) committed the file. Is this a correct description of your timeline? Or did you not commit, and just looked at the diff from the last commit? – Amadan Oct 19 '18 at 07:13
  • I committed as well; please check, I have updated the description. – Amutheezan Oct 19 '18 at 07:15
  • readLine is not going to read the CR char, why do you want to do this in java? – Scary Wombat Oct 19 '18 at 07:18

1 Answer


So, your timeline, as confirmed in comments, with what was happening:

1) checked out files on Unix. The file has Unix line endings (LF).

2) copied to Windows. The file still has Unix line endings.

3) ran through Java file to change some values. As you read the file, you try to strip CR from it, even though it doesn't contain CR in the first place (only LF); and even if it did contain CR, the replace would have nothing to do, because readLine returns each line without its terminator, as per the BufferedReader.readLine documentation. You write lines to the new file with \n. Note that BufferedWriter.write emits exactly the characters it is given, so "\n" is written as a bare LF even on Windows; only newLine() or println use the platform separator (CR LF). The CR characters therefore most likely came from the next thing you describe, opening the file in a Windows editor and copy-pasting its contents. Either way, the file you ended up with contains Windows (CR LF) endings on all its lines, both on changed lines and on those that you just intended to copy without change.

4) copied the transformed file back to Unix. The line endings are Windows (CR LF).

5) committed the file. Since you committed the file on Linux, I assume git was not set up to strip them during commit. Thus, the file got committed with each line changed: some lines substantially, but some lines trivially (with just the change of the line terminator).
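The readLine behaviour mentioned in step 3 can be checked with a short sketch. The class name `ReadLineDemo`, the helper `readLines`, and the sample input are made up for illustration; only `BufferedReader.readLine` comes from the code in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Reads all lines with the same readLine loop the question uses.
    static List<String> readLines(String content) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical input: one line ending in CR LF, one ending in LF.
        for (String line : readLines("rule=1\r\nrule=2\n")) {
            // readLine has already dropped the terminator, so there is no CR left
            // and replace("\r", "") returns an equal string.
            boolean hasCr = line.contains("\r");
            boolean replaceIsNoOp = line.equals(line.replace("\r", ""));
            System.out.println(line + " hasCr=" + hasCr + " replaceIsNoOp=" + replaceIsNoOp);
        }
    }
}
```

Both lines come back without any CR, so the `replace("\r","")` calls in the question never have anything to remove.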

Now you are in a situation where if you try to get the Unix line terminators back, you are effectively changing the whole file - because every line needs to be changed, even just a little.

Your options:

If you have already pushed the changes, the obvious way would be to git revert this commit (which will also look like changing the entire file, but at least it's kind of clear it's a revert), then either rerun the Java program on the Unix machine, or do dos2unix file after copying back to Unix machine but before committing.
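If dos2unix is not at hand, the same clean-up can be approximated in Java before committing. This is only a rough sketch, not a replacement for the real tool: the class `LineEndingNormalizer` and its methods are hypothetical names, and it assumes the file uses an ASCII-compatible encoding such as UTF-8 (it would corrupt UTF-16). It works on raw bytes so that no platform-dependent line handling is involved.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LineEndingNormalizer {
    // Returns a copy of the input with every CR LF pair reduced to a single LF.
    // A CR that is not followed by LF is kept as-is.
    static byte[] toUnix(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        for (int i = 0; i < input.length; i++) {
            boolean crBeforeLf = input[i] == '\r'
                    && i + 1 < input.length
                    && input[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(input[i]);
            }
        }
        return out.toByteArray();
    }

    // Rewrites the given file in place with LF-only line endings.
    static void normalizeFile(Path file) throws IOException {
        Files.write(file, toUnix(Files.readAllBytes(file)));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LineEndingNormalizer <file>");
            return;
        }
        normalizeFile(Paths.get(args[0]));
    }
}
```

Run it on the copied-back file before `git add`; afterwards git diff should list only the lines whose content really changed, provided the committed version had LF endings to begin with.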

If you have not pushed the changes, you can get away with git reset --hard HEAD^ instead of reverting.

Amadan
  • Please read the entirety of the answer, not focus on keywords. Of course it alters the whole file, because the whole file is Windows-dirty - and committed at that. The fact that you committed it is the problem. The only scenario where you don't get the whole file changed in the history is the `reset` option, but it's a bit nuclear - dangerous if you have collaborators that have already pulled your changes. There is no other way to not have every line changed in your history, except rewriting history - because you put it into the history with that commit. – Amadan Oct 19 '18 at 07:35
  • The key point with `dos2unix` is that it would have worked if you did it _before you committed_. It would have stripped CR LF and put LF back in, which would restore the lines without substantial edits back to the state they were in before Windows Java touched them. If you manage to undo the commit, then `dos2unix` and commit, that commit would only have the lines with real changes as changed lines. – Amadan Oct 19 '18 at 07:37
  • I have gone through your answer... BTW, I tried every approach from scratch for about 2 to 3 days before finally asking on Stack Overflow what I am actually messing up... – Amutheezan Oct 19 '18 at 07:47
  • As I said, you messed up when you committed the dirty file. Your choice now: rewrite history to redo it again correctly, or live with all lines already being changed (and then changed again to remove the Windows stain). – Amadan Oct 19 '18 at 07:49
  • I guess this option works better than the strategy above: https://stackoverflow.com/questions/3515597/add-only-non-whitespace-changes – Amutheezan Oct 19 '18 at 08:20