
The following constructs are not well documented, but they work from specific versions of PHP onwards. What are these constructs, which versions support them, and which other implementations support them?

  • \H
  • \V
  • \N

This thread is part of The Stack Overflow Regex Reference.

Community
  • 1
  • 1
Unihedron
  • 10,251
  • 13
  • 53
  • 66

1 Answer


\H matches any character that is not horizontal whitespace. Horizontal whitespace includes the tab character and all "space separator" Unicode characters, so \H is the same as:

[^\h] or
[^\t\p{Zs}]
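
As a rough illustration of the above, here is a minimal sketch in Java 8 or later, chosen only because java.util.regex.Pattern is one of the implementations listed in this answer as supporting \H. The class name, method name and sample string are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any library.
class NonHorizontalDemo {
    // \H+ : one or more characters that are NOT horizontal whitespace
    private static final Pattern NON_HORIZONTAL = Pattern.compile("\\H+");

    // Collects every maximal run of non-horizontal-whitespace characters.
    static List<String> runs(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = NON_HORIZONTAL.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Tab, plain space and no-break space (U+00A0) are all horizontal
        // whitespace, so each of them ends a run.
        System.out.println(runs("foo\tbar baz\u00A0qux"));
    }
}
```

Note that a line feed is not horizontal whitespace, so \H happily matches it: `runs("a\nb")` yields a single run containing the newline.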

\V is the negated class of \v. It is named "non vertical whitespace character" and matches any character that is not vertical whitespace, that is, any character other than those treated as line breaks in the Unicode standard and matched by \v. As introduced in Perl 5.10, it is the same as:

[^\v] or
[^\n\cK\f\r\x85\x{2028}\x{2029}]
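
The same idea can be sketched for \V, again in Java 8 or later and again with invented class, method and sample names. It only shows that each of several different line-break characters ends a \V+ run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any library.
class NonVerticalDemo {
    // \V+ : one or more characters that are NOT vertical whitespace
    private static final Pattern NON_VERTICAL = Pattern.compile("\\V+");

    // Collects every maximal run of non-vertical-whitespace characters.
    static List<String> runs(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = NON_VERTICAL.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // \n, \r, U+2028 (line separator) and \f are all vertical whitespace,
        // so none of them can appear inside a run.
        System.out.println(runs("one\ntwo\r\nthree\u2028four\ffive"));
    }
}
```

Spaces and tabs are horizontal, not vertical, so they stay inside a run: `runs("a b\tc")` returns the whole string as one element.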

\N matches any character that isn't the line feed character \n. Simple!

[^\n]

What's the difference between \V+ and \N+? Thanks to Avinash Raj for asking.

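As the Perl 5.10 documentation specifies, \V is the same as [^\n\cK\f\r\x85\x{2028}\x{2029}], so it never matches \n, \r or \f, nor the control character \cK (Ctrl+K, the vertical tab), 0x85, 0x2028 or 0x2029. \N, by contrast, excludes only \n, so \N+ runs straight through a \r or \f where \V+ would stop.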
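
The difference can be made concrete with a small sketch. It is written in Java, which supports \V but, per the list in this answer, not \N, so the equivalent class [^\n] stands in for \N here. That substitution, and all names in the code, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any library.
class VersusDemo {
    // Collects every match of the given regex in the input.
    static List<String> runs(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "a\rb\nc";
        // [^\n]+ (stand-in for \N+) stops only at the line feed,
        // so the carriage return stays inside the first match.
        System.out.println(runs("[^\\n]+", text).size());
        // \V+ stops at the carriage return as well as the line feed.
        System.out.println(runs("\\V+", text).size());
    }
}
```

With this input the stand-in for \N+ finds two matches (the first one still contains the \r), while \V+ finds three.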

These character classes are handy and very effective when you want to match everything within a horizontal run of text (\V+) or simply consume an entire paragraph (\N+), among various other use cases.


The following implementations support \H, \V and \N:

  • Perl 5.10
  • PCRE 7.2
  • PHP: programmers may find a discrepancy over which PHP versions support these constructs. Because the constructs came from Perl 5 by way of PCRE, what matters is the PCRE version PHP was built with rather than the PHP version itself; you can check it using phpinfo(). By default, PHP 5.2.2 supports them.
  • Java 8: java.util.regex.Pattern gained support for \H and \V as part of implementing \h and \v, which Java 7 did not have. However, \N is not yet supported. Tested with JDK8u25.
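
Since support varies by implementation and version, one cautious way to check a given Java runtime is simply to try compiling each escape. The sketch below does that; the class and method names are invented for the example. Going by the list above, a Java 8 runtime would be expected to accept \H+ and \V+ and reject \N+, but the program reports whatever the runtime it is executed on actually does.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical probe class; not part of any library.
class EscapeProbe {
    // Returns true if this runtime's Pattern accepts the regex syntactically.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"\\H+", "\\V+", "\\N+"};
        for (String regex : candidates) {
            System.out.println(regex + " accepted: " + compiles(regex));
        }
    }
}
```

A successful compile only shows that the syntax is accepted. It does not prove the escape carries the same meaning as in Perl or PCRE, so a behavioural check on sample input is still worthwhile.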
  • @AvinashRaj Perl 5.10 and PCRE 7.2 were published with this "new feature" back in June 2007, and PHP adopted PCRE 7.2 as its default back in Nov 2007. Java 8 was released with this as a "new feature" just back in 2013! – Unihedron Nov 17 '14 at 13:21
  • I believe in Perl 5.10 `\N` construct is named Unicode character. So the error is `\N –  Nov 17 '14 at 16:52
  • @sln Yes, actually `\N` when used on its own or with the asterisk-like quantifiers, it's a negation for `\n`: [Demo](http://regex101.com/r/eW1tU1/1). But `\N{}` named construct isn't supported in Java, and therefore `\N` by itself isn't implemented as well, if I catch your drift? – Unihedron Nov 17 '14 at 16:55
  • I said Perl 5.10. The actual error it throws is `Missing braces on \N{} in regex; marked by –  Nov 17 '14 at 17:03
  • @sln Thanks, verified and I'll update the reference and this thread to reflect that soon. Can't believe I left out Perl for the testing. Thanks for pointing this out! Looks like this behaviour is pcre-dependant. – Unihedron Nov 17 '14 at 17:22
  • @sln I'm confused - It worked when I try to use Perl regex `\N+`. It seems that [it was documented within Perl documentation](http://perldoc.perl.org/perlre.html#Character-Classes-and-other-Special-Escapes) that `\N` is _"Any character but `\n`. Not affected by /s modifier'_ when it's not the character class `\N{}`. That's a different construct, but I'll add it into the answer nonetheless, thanks! I'm not sure how the error came up though. – Unihedron Nov 18 '14 at 11:56
  • The documentation link to perlre you provide is for the _latest_ regex in Perl 5.20.1. I don't know when they changed to the `\N`(?:{..})?` but it wasn't for 5.10. Between 5.10 - 5.20 is a transitory state before Version 6.0 and not in any way stable. They will never get it right because the goal of Ver 6.0 is almost unreachable. Version 5.10 was the last stable in my opinion, as they try to eeek out the very last drops of _construct_ available. It's really a sad state. –  Nov 18 '14 at 16:32