3

I can't figure out the right syntax. I want a Perl regular expression that finds runs of two or more consecutive line breaks and condenses each run into exactly 2 line breaks.

Here is what I'm using today, which doesn't seem to work:

$string =~ s/\n\n+/\n\n/g;

Please let me know what I'm doing wrong and the correct Perl regex I should be using.

Thanks in advance for your help!

Russell C.
  • The regular expression you wrote is correct (modulo possible line ending issues), but the catch is how you call it. Are you slurping in the whole file at once? Or are you going line by line? If the latter then your problem is that you don't ever have all of the lines in a string at once. – btilly Feb 05 '11 at 17:58
  • @btilly - I'm giving it the whole file. I've even checked to make sure there aren't other characters in between the line breaks by doing a search and replace and I've confirmed it isn't working. Any other possibilities you can think of? – Russell C. Feb 05 '11 at 18:15
  • What platform are you on? – justintime Feb 05 '11 at 18:19
  • If on Windows and using binary mode, you will get `\r` before every `\n`. You need to account for them if they are there. Also there might be some whitespace on empty lines. – bvr Feb 05 '11 at 18:30
  • @justintime: If you use `\R` (shortcut for `(?>\r\n|[\r\n])`) the platform is irrelevant. – the wolf Feb 06 '11 at 03:57
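
The two pitfalls raised in these comments can be reproduced with a minimal sketch. The sample strings are made up for illustration, and the CRLF-aware pattern is just one possible way to "account for" `\r` and stray whitespace, not necessarily what any commenter had in mind:

```perl
use strict;
use warnings;

# Hypothetical sample: "a" and "b" separated by three blank lines, CRLF endings.
my $crlf = "a\r\n\r\n\r\n\r\nb\r\n";

# Pitfall 1: with CRLF data a \r sits between the \n characters,
# so \n\n+ never sees two adjacent newlines and nothing changes.
(my $unchanged = $crlf) =~ s/\n\n+/\n\n/g;

# One way to account for an optional \r and for spaces/tabs on "empty" lines.
# The first line break is captured and reused, so CRLF stays CRLF.
(my $fixed = $crlf) =~ s/(\r?\n)(?:[ \t]*\r?\n)+/$1$1/g;

# Pitfall 2: line-by-line processing. Each chunk holds at most one \n,
# so the pattern can never match inside a single line.
my $matches = 0;
for my $line (split /^/, "a\n\n\n\nb\n") {
    $matches++ if $line =~ /\n\n+/;
}

print "plain \\n\\n+ changed the CRLF text: ", ($unchanged eq $crlf ? "no" : "yes"), "\n";
print "CRLF-aware pattern squeezed it: ", ($fixed eq "a\r\n\r\nb\r\n" ? "yes" : "no"), "\n";
print "line-by-line matches: $matches\n";
```

If the first line prints "no" for your real data, inspecting the string for `\r` or whitespace-only lines is a reasonable next step.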

4 Answers

7

If you're using Perl 5.10 or later, try this:

$string =~ s/(\R)(?:\h*\R)+/$1$1/g;

\R is the generic line-separator escape sequence, and \h matches any horizontal whitespace character, such as a space or TAB (both are documented in perlrebackslash). So this will convert any sequence of one or more blank lines, including lines that contain only spaces or tabs, to one empty line.

Most applications these days are liberal in what they'll recognize as a line separator; they'll even accept a mix of two or more styles of separator in the same document. On the other hand, some apps actively convert all line separators to one preferred style. But sometimes you do have to stick to one particular style; that's why I captured the first \R match and used it as the replacement, instead of arbitrarily using \n.
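
As a rough illustration of that behaviour, here is a small sketch that runs the substitution on an LF sample and a CRLF sample. The helper names `squeeze_blank_lines` and `visible` and the sample strings are hypothetical, introduced only for this example:

```perl
use strict;
use warnings;
use 5.010;    # \R and \h need Perl 5.10 or later; this also enables say

# Hypothetical helper wrapping the substitution from the answer.
sub squeeze_blank_lines {
    my ($text) = @_;
    $text =~ s/(\R)(?:\h*\R)+/$1$1/g;
    return $text;
}

# Hypothetical helper: make \r and \n visible when printing.
sub visible {
    my ($s) = @_;
    $s =~ s/\r/\\r/g;
    $s =~ s/\n/\\n/g;
    return $s;
}

# LF input with a whitespace-only line in the middle of the blank run.
my $unix = squeeze_blank_lines("a\n\n \t\n\nb\n");

# CRLF input: the captured \R is "\r\n", so the replacement stays CRLF.
my $dos = squeeze_blank_lines("a\r\n\r\n\r\n\r\nb\r\n");

say visible($unix);    # a\n\nb\n
say visible($dos);     # a\r\n\r\nb\r\n
```

Because the replacement reuses the captured separator, each input keeps its own line-ending style.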

Be aware that these special escape sequences aren't widely supported in other regex flavors. They work in recent versions of PHP, and \R seems to work in Ruby 2.0, though I can't find any doc that mentions it. Ruby 1.9.2 and 2.0 support a \h escape sequence, but it matches a hexadecimal digit ([0-9a-fA-F]), not horizontal whitespace. In most other flavors, \R and \h will either throw an exception or match a literal R and h respectively.

Alan Moore
  • This answer has been added to the [Stack Overflow Regular Expression FAQ](http://stackoverflow.com/a/22944075/2736496), under "Escape Sequences". – aliteralmind Apr 10 '14 at 02:28
2

This does it:

#!/usr/bin/env perl
use strict;
use warnings;
my $string;
{
   local $/=undef;
   $string =<DATA>;
} 
print "Before:\n$string\n============";

$string=~s/\n{2,}/\n\n/g;
print "After:\n$string\n\nBye Bye!";

__DATA__
Line 1
Line 2






Line 9
Line 10

Line 12



Line 16


Line 19

Output:

Before:
Line 1
Line 2






Line 9
Line 10

Line 12



Line 16


Line 19
============After:
Line 1
Line 2

Line 9
Line 10

Line 12

Line 16

Line 19

Perl 5.10 and later also support the \R escape sequence (a generic line break) for platform independence. Your regex would then be s/\R{2,}/\n\n/g;
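
A short sketch of what that substitution does, using made-up sample strings. Note two things it assumes you are happy with: every squeezed run is rewritten as `\n\n` regardless of the original line-ending style, and a line containing only spaces or tabs still counts as content:

```perl
use strict;
use warnings;
use 5.010;    # \R needs Perl 5.10 or later

# Hypothetical sample mixing CRLF and LF runs of blank lines.
my $text = "a\r\n\r\n\r\nb\n\n\n\nc";
(my $squeezed = $text) =~ s/\R{2,}/\n\n/g;

# A whitespace-only line breaks up the run, so nothing matches here.
my $spaced = "a\n \nb";
(my $spaced_after = $spaced) =~ s/\R{2,}/\n\n/g;

print $squeezed eq "a\n\nb\n\nc" ? "runs squeezed to LF pairs\n" : "unexpected result\n";
print $spaced_after eq $spaced   ? "whitespace-only line left alone\n" : "unexpected result\n";
```

Single line breaks elsewhere in the text are not touched, so a CRLF file may end up with mixed line endings unless you normalize it separately.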

the wolf
0

Show a full example. What is $string?

$ perl -E'my $s = qq{a\n\n\nb}; say "[$s]"; $s =~ s/\n\n+/\n\n/g; say "[$s]"'
[a


b]
[a

b]
oeuftete
0

@btilly hit the nail on the head. I did a quick test case:

in:

a

b




c

with this code:

my $line = join '', <>;
$line =~ s{\n\n+}{\n\n}g;
print $line;

and it returned the expected result:

a

b

c

You can get the same result by changing the record separator (and avoiding the regex):

{
    # change the record separator from "\n" to "" (paragraph mode), which
    # treats a run of consecutive blank lines as a single one (perldoc perlvar);
    # local confines the change to the global $/ to this block
    local $/ = "";
    print <>;
}
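
To try the paragraph-mode approach without a file on disk, here is a self-contained sketch that reads from an in-memory filehandle. The sample text and variable names are assumptions made for the example:

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the input file.
my $input = "a\n\nb\n\n\n\n\nc\n";

# Open a read-only filehandle on the string itself.
open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";

my $output;
{
    local $/ = "";               # paragraph mode, restored when the block ends
    $output = join '', <$fh>;    # each record ends with at most two newlines
}
close $fh;

print $output;
```

As far as I know, paragraph mode also skips blank lines at the very start of the input and only treats truly empty lines as separators, so lines holding just spaces or tabs are not squeezed this way.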