Hi, I want to remove extra blank lines from my source text file (that is, if there are 2 or more consecutive blank lines, keep only 1). I used this pattern:

^(\s*(\n|\r|\r\n)){2,}

It cannot handle an empty line at the end of the file, like this:

1. BlablablaCRLF
2. CRLF
3. 

Line 3 above is the end of the file, and VS StyleCop complains that there are multiple blank lines here. It looks like a newline at the end of the file, but there is actually nothing there: I turned on "Show All Characters" in Notepad++ expecting to see a CRLF at the end of the file, but there was none. My pattern cannot identify this. How can I handle this case? Thanks!

codewarrior
  • Your regex says: match `\s` (if any) followed by any of `\n`, `\r` or `\r\n`, and do so at least twice – rock321987 Apr 17 '16 at 07:30
  • I think that if you use stylecop through resharper there is an automatic fix for that particular violation, so you can fix your entire solution without needing to roll your own regex. (might be easier to write a tiny console app to do it anyway). – satnhak Apr 18 '16 at 01:26

1 Answer

Basic Answer

If this is what you want to match:

  1. Multiple consecutive empty lines, where multiple means more than one.
  2. All empty lines at the end of a file except the one implicitly generated by terminating the file with \n (which can be considered good practice).
  3. All redundant whitespace after the terminating \n.

Then this pattern might help you:

(^\s*(\r|\n)){2,}|^\s+(\r|\n)?\Z

Further Explanation

The first part, (^\s*(\r|\n)){2,}, takes care of case 1. The second part, ^\s+(\r|\n)?\Z, matches redundant empty lines at the end of a file or redundant whitespace following the terminating \n (cases 2 and 3).
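As a rough illustration of how the two parts could be applied one after the other, here is a minimal Python sketch. It assumes Python's re module with re.MULTILINE (so that ^ matches at the start of every line). The function name collapse_blank_lines and its newline parameter are made up for this example, and other engines (Notepad++, .NET) may treat \Z slightly differently.

```python
import re

# The two parts of the pattern, compiled separately.
# re.MULTILINE makes ^ match at the start of every line.
REPEATED_BLANKS = re.compile(r"(^\s*(\r|\n)){2,}", re.MULTILINE)
TRAILING_BLANKS = re.compile(r"^\s+(\r|\n)?\Z", re.MULTILINE)


def collapse_blank_lines(text, newline="\n"):
    """Hypothetical helper: keep at most one blank line, and none at the end."""
    # First drop blank lines and stray whitespace after the final terminator.
    text = TRAILING_BLANKS.sub("", text)
    # Then replace every run of two or more blank lines with one blank line.
    # A function is used as the replacement so `newline` is inserted literally.
    return REPEATED_BLANKS.sub(lambda match: newline, text)
```

With this sketch, collapse_blank_lines("a\n\n\nb\n\n\n") gives "a\n\nb\n". For a file with Windows line endings, newline="\r\n" would have to be passed so that the inserted blank line uses the same convention.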

If your file looks like this (with Unix file endings) ...

1. FirstLine\n
2. 
3. ThirdLine\n
4. FourthLine\n
5.
6.
7. SeventhLine\n

... then it only matches lines 5 and 6, but nothing at the end. Notepad++ will nevertheless show an 8th line at the end because of the terminating \n. However, if there were multiple \ns at the end of the file, or additional tabs or spaces after the terminating \n of the 7th line, these would match.
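The behaviour described for this example file can be checked with a small Python sketch. It assumes Python's re module with re.MULTILINE, which may not agree with Notepad++ in every detail. The helper find_matches is a made-up name that reports the line on which each match starts together with the matched text.

```python
import re

PATTERN = re.compile(r"(^\s*(\r|\n)){2,}|^\s+(\r|\n)?\Z", re.MULTILINE)

# The example file from above, with Unix line endings.
SAMPLE = "FirstLine\n\nThirdLine\nFourthLine\n\n\nSeventhLine\n"


def find_matches(text):
    """Hypothetical helper: list (1-based start line, matched text) pairs."""
    matches = []
    for match in PATTERN.finditer(text):
        # Count the newlines before the match to get its starting line.
        start_line = text.count("\n", 0, match.start()) + 1
        matches.append((start_line, match.group(0)))
    return matches


for start_line, matched in find_matches(SAMPLE):
    print(start_line, repr(matched))
```

Appending an extra "\n" or some tabs and spaces to SAMPLE should produce a second match that starts on line 8, which is the trailing case described above.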

If you also want to match the line generated by the \n termination (and as a result remove the \n termination when replacing), you could use ^\s*\Z instead as the second part of the regular expression.

Additional explanation of \s*(\r|\n): this handles every line-ending convention (\n, \r\n or \r), because \s itself also matches \n and \r. For \r\n, the \s* consumes the \r and the group matches the \n.
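A quick way to convince yourself of this is the following Python sketch. The name is_blank_line is invented for the example, and it only tests a single line including its terminator.

```python
import re

# One whitespace-only line together with its line terminator.
BLANK_LINE = re.compile(r"\s*(\r|\n)")


def is_blank_line(line):
    """Hypothetical helper: True if `line` is only whitespace plus a terminator."""
    return BLANK_LINE.fullmatch(line) is not None


# All three line-ending conventions are accepted, with or without
# leading tabs or spaces; a line with visible content is rejected.
for candidate in ("\n", "\r\n", "\r", " \t\r\n", "abc\n"):
    print(repr(candidate), is_blank_line(candidate))
```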

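\Z matches the end of the whole file/input (whereas $, in multi-line mode, also matches at the end of every line). Note that engines differ in detail: in some of them \Z also matches just before a final newline.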
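The difference between the two anchors can be seen by listing where each one matches. This sketch assumes Python's re module, where \Z matches only at the very end of the input. In .NET, for instance, \Z also matches before a final \n, so the positions would differ there. The helper name anchor_positions is made up.

```python
import re


def anchor_positions(pattern, text):
    """Hypothetical helper: offsets at which a zero-width pattern matches."""
    return [m.start() for m in re.finditer(pattern, text, re.MULTILINE)]


text = "one\ntwo\n"

# $ matches before every newline and at the end of the input.
print(anchor_positions(r"$", text))
# \Z matches only at the very end of the input.
print(anchor_positions(r"\Z", text))
```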

I'm sure there is a shorter version of this regular expression, but my first priority was to make it work and keep it understandable.

mxscho
  • Please take a look at my edit for a solution for processing single lines (probably solving the "at least 2 matches" issue). :) – mxscho Apr 17 '16 at 07:37
  • Thanks, but what should I do if I want to keep one blank line? If \n appears at least 2 times, replace them with "\n"; if \n appears at the end of the file, delete that \n (replace it with an empty string). How can I express this in regex? Thanks! @mxscho – codewarrior Apr 17 '16 at 08:39
  • That obviously changes the question. At least I wasn't aware of what you actually wanted to achieve at first. I took some time and reworked my answer completely, so hopefully it will help you much more now. – mxscho Apr 17 '16 at 22:13