17

I tested \v (vertical whitespace) for matching \r, \n and their combinations, but found that \v matches neither \r nor \n. Below is the code I am using:

$string = "
Test
";

if (preg_match("#\v+#", $string )) {
  echo "Matched";
} else {
  echo "Not Matched";
}

To be clear, my question is: is there an alternative way to match \r\n?

brian d foy
Jason OOO

7 Answers

39

PCRE and newlines

PCRE has a superfluity of newline related escape sequences and alternatives.

A handy escape sequence to use here is \R. By default \R matches any Unicode newline sequence, but it can be configured to match a narrower set.

Without the u modifier, \R matches any newline sequence in the single-byte range (note that \x85, NEL, lies outside 7-bit ASCII):

preg_match('~\R~', $string);

This is equivalent to the following group:

(?>\r\n|\n|\r|\f|\x0b|\x85)

To match any Unicode newline sequence, including the line separator (U+2028) and paragraph separator (U+2029), which lie outside the single-byte range, turn on the u (unicode) modifier.

preg_match('~\R~u', $string);

The u (unicode) modifier turns on additional PCRE functionality and causes pattern and subject strings to be treated as UTF-8.

This is equivalent to the following group:

(?>\r\n|\n|\r|\f|\x0b|\x85|\x{2028}|\x{2029})

It is possible to restrict \R to match CR, LF, or CRLF only:

preg_match('~(*BSR_ANYCRLF)\R~', $string);

This is equivalent to the following group:

(?>\r\n|\n|\r)
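
As a rough illustration of the three variants above, here is a minimal sketch. The sample strings and variable names are made up for this example, and the results noted in the comments assume a PCRE build whose default for \R is "any Unicode newline", which is the usual case but not guaranteed.

```php
<?php
// Hypothetical sample subjects (not from the question).
$crlf = "a\r\nb";
$ls   = "a\xE2\x80\xA8b"; // U+2028 LINE SEPARATOR, written as its UTF-8 bytes
$vt   = "a\x0bb";         // vertical tab

// Plain \R: the CRLF pair is consumed as one unit, so there is one match.
$crlfCount = preg_match_all('~\R~', $crlf);

// Without u, U+2028 is just three unrelated bytes, so \R finds nothing.
$lsPlain = preg_match('~\R~', $ls);

// With u, the subject is read as UTF-8 and U+2028 is recognised.
$lsUtf8 = preg_match('~\R~u', $ls);

// By default \R also accepts a vertical tab; (*BSR_ANYCRLF) rules it out.
$vtDefault = preg_match('~\R~', $vt);
$vtAnycrlf = preg_match('~(*BSR_ANYCRLF)\R~', $vt);

var_dump($crlfCount, $lsPlain, $lsUtf8, $vtDefault, $vtAnycrlf);
```

Because the CRLF alternative comes first inside an atomic group, \R never splits a \r\n pair into two separate matches.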

Additional

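PCRE also supports five conventions for what counts as a line break. They are selected by placing one of the following at the very start of the pattern, and they affect ^, $ and the dot rather than \R: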

(*CR)        carriage return
(*LF)        linefeed
(*CRLF)      carriage return, followed by linefeed
(*ANYCRLF)   any of the three above
(*ANY)       all Unicode newline sequences

Note: \R does not have special meaning inside of a character class. Like other unrecognized escape sequences, it is treated as the literal character "R" by default.
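
To show what the newline conventions listed above change in practice, here is a small sketch comparing (*LF) with (*ANYCRLF) in multi-line mode. The sample text and variable names are invented for this example.

```php
<?php
// Hypothetical input mixing all three line-ending styles.
$text = "one\r\ntwo\rthree\nfour";

// (*LF): only \n ends a line, so ^ and $ ignore a bare \r.
preg_match_all('~(*LF)^\w+$~m', $text, $lfOnly);

// (*ANYCRLF): CR, LF and CRLF all count as line endings.
preg_match_all('~(*ANYCRLF)^\w+$~m', $text, $anyCrlf);

print_r($lfOnly[0]);  // only "four" is a whole line under (*LF)
print_r($anyCrlf[0]); // all four words are whole lines under (*ANYCRLF)
```

The convention verb has to be the first thing in the pattern, before any other pattern text.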

hwnd
  • wow! I never used it, that's what I looked for :) see this example: http://phpfiddle.org/main/code/phd-ebj – Jason OOO Sep 25 '13 at 05:38
  • 3
    This answer has been added to the [Stack Overflow Regular Expression FAQ](http://stackoverflow.com/a/22944075/2736496), under "Escape Sequences". – aliteralmind Apr 10 '14 at 01:03
  • 1
    +1 for `\R`. For academic purposes only, if you're not in `u` mode, you can invent this other way of matching `\r` or `\n` without using them: `(?![ \t\cK\f])\s` Why? because `\s` matches `[ \t\cK\f\r\n]`, so this is a form of class subtraction. :) – zx81 Jul 14 '14 at 00:04
  • Be careful. I had problems using the capture group '~\R~' with russian words. When this regex is applied to the word "необходимости" it becomes "необ� одимости". – Pedro Sousa Feb 14 '19 at 08:34
  • @PedroSousa why did you omit the `u` pattern modifier? You need to tell the regex engine when you want to read multibyte characters in the input string. – mickmackusa Apr 29 '21 at 03:51
7

This doesn't answer the request for alternatives, because \v works perfectly well.

\v matches any character considered vertical whitespace; this includes the platform's carriage return and line feed characters (newline) plus several other characters, all listed in the table below.

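The problem is the quoting. In a double-quoted PHP string, \v is itself an escape sequence for the vertical tab character (0x0B), so PCRE receives a literal vertical tab instead of the \v escape. You only need to change "#\v+#" to either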

  • "#\\v+#" escape the backslash

or

  • '#\v+#' use single quotes

In both cases, you will get a match for any combination of \r and \n.
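
A minimal sketch of the difference, reusing the question's pattern on a subject that contains only \n line breaks. The variable names are made up for this example.

```php
<?php
$string = "\nTest\n";

// Double quotes: PHP replaces \v with a vertical tab (0x0B) before PCRE
// sees the pattern, so this looks for literal vertical tabs and finds none.
$doubleQuoted = preg_match("#\v+#", $string);

// Escaped backslash: PCRE receives the two characters \ and v.
$escaped = preg_match("#\\v+#", $string);

// Single quotes: \v is not a PHP escape here, so PCRE receives it unchanged.
$singleQuoted = preg_match('#\v+#', $string);

var_dump($doubleQuoted, $escaped, $singleQuoted);
var_dump(strlen("\v"), strlen('\v')); // one byte versus two bytes
```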

Update:

Just to make the scope of \v clear in comparison to \R, from perlrebackslash:

  • \R
    \R matches a generic newline; that is, anything considered a linebreak sequence by Unicode. This includes all characters matched by \v (vertical whitespace), ...
Olaf Dietsche
6

If there is some strange requirement that prevents you from using a literal [\r\n] in your pattern, you can always use hexadecimal escape sequences instead:

preg_match('#[\xD\xA]+#', $string);

This pattern is equivalent to [\r\n]+.
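
A quick way to check the equivalence is to run both patterns over the same input. This is only a sketch; the sample subject is invented.

```php
<?php
// Hypothetical subject with two separate runs of line breaks.
$subject = "a\r\n\r\nb\nc";

// Hexadecimal escapes: \xD is carriage return, \xA is line feed.
$hexCount = preg_match_all('#[\xD\xA]+#', $subject, $hexMatches);

// The same class written with the usual escapes.
$plainCount = preg_match_all('#[\r\n]+#', $subject, $plainMatches);

var_dump($hexCount, $plainCount, $hexMatches === $plainMatches);
```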

p.s.w.g
1

To match every line of a given string, simply use the ^ and $ anchors and instruct your regex engine to operate in multi-line mode. Then ^ and $ will match the start and end of each line instead of the start and end of the whole string.

http://php.net/manual/en/reference.pcre.pattern.modifiers.php

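In PHP, that is the m modifier after the pattern. /^(.*?)$/m will simply match each line in the given string. Note that PCRE typically treats only \n as the line terminator for ^ and $ by default, so with \r\n input the \r ends up inside the captured group.

Btw: for line splitting, you could also use explode() with the PHP_EOL constant (keep in mind that PHP_EOL reflects the platform PHP runs on, not necessarily the line endings in your data):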

$lines = explode(PHP_EOL, $string);
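
If the input may mix line-ending styles, one alternative is to split on \R instead of PHP_EOL. This is a sketch with an invented sample string, not part of the original suggestion.

```php
<?php
// Hypothetical input mixing CRLF, LF and CR line endings.
$string = "first\r\nsecond\nthird\rfourth";

// \R treats \r\n as one separator, so no empty or \r-tainted lines appear.
$lines = preg_split('~\R~', $string);

// For comparison: splitting on "\n" alone leaves stray \r characters behind.
$naive = explode("\n", $string);

print_r($lines);
print_r($naive);
```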
dognose
0

The problem is that you need the multiline option, or the dotall option if you are using the dot. It goes after the closing delimiter.

http://www.php.net/manual/en/regexp.reference.internal-options.php

$string = "
Test
";
if (preg_match("#\v+#m", $string)) {
  echo "Matched";
} else {
  echo "Not Matched";
}
beiller
  • This does not make \v match \r\n – Jason OOO Sep 24 '13 at 18:09
  • 1
    Multiline mode is irrelevant. Many regex users jump to the conclusion that you have to specify multiline mode whenever the target string contains line separators. All it does is tweak the behavior of the anchors (`^` and `$`), so they'll match at line boundaries (i.e. before and after line separators). The OP's regex doesn't contain any anchors. – Alan Moore Sep 24 '13 at 22:31
0

To match a newline in PHP, use the PHP constant PHP_EOL. This is cross-platform.

if (preg_match('/\v+' . PHP_EOL . '/', $text, $matches)) {
   print_r($matches);
}
Byron Whitlock
0

This regex also matches newline \n and carriage return \r characters.

(?![ \t\f])\s


To match one or more newline or carriage return characters, you could use the below regex.

(?:(?![ \t\f])\s)+

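
To see which characters the lookahead-plus-\s construct accepts, here is a small sketch that tests it against single characters. The loop and names are made up for this example. The vertical tab is left out on purpose, because whether \s matches it depends on the PCRE version in use.

```php
<?php
$pattern = '~(?![ \t\f])\s~';

// Test each whitespace character on its own; keys are the hex byte values.
$results = [];
foreach (["\r", "\n", " ", "\t", "\f"] as $char) {
    $results[bin2hex($char)] = preg_match($pattern, $char);
}

// CR (0d) and LF (0a) pass; space (20), tab (09) and form feed (0c) are
// rejected by the negative lookahead before \s is even tried.
print_r($results);
```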

Avinash Raj