\H
matches any character that is not horizontal whitespace. Horizontal whitespace covers the tab character and every Unicode "space separator" (Zs) character. This is the same as:
[^\h] or
[^\t\p{Zs}]
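As a rough illustration of \H in practice, the sketch below collects runs of non-horizontal-whitespace characters with java.util.regex.Pattern (Java 8 or later). The class name NonHorizontalDemo and the helper tokens are made up for this example. Note that a line feed is not horizontal whitespace, so \H happily matches it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only Pattern and Matcher come from the JDK.
public class NonHorizontalDemo {
    // \H+ : one or more characters that are NOT horizontal whitespace.
    static final Pattern NON_HORIZONTAL = Pattern.compile("\\H+");

    // Returns every maximal run of non-horizontal-whitespace characters.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = NON_HORIZONTAL.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Tab, no-break space (U+00A0) and plain space all act as separators.
        System.out.println(tokens("a\tb\u00A0c d")); // prints [a, b, c, d]
        // The line feed stays inside the first token because \H matches it.
        System.out.println(tokens("x\ny z").size()); // prints 2
    }
}
```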
\V
is the negated class of \v. It is named the "non-vertical whitespace character" class and matches any character that is not one of the vertical whitespace characters which the Unicode standard treats as line breaks, that is, any character \v would not match. As introduced in Perl 5.10, it is the same as:
[^\v] or
[^\n\cK\f\r\x85\x{2028}\x{2029}]
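One way to convince yourself of this equivalence is to compare the shorthand against the spelled-out class on a handful of characters. The sketch below does that in Java 8 or later, whose Pattern class accepts both \V and the \cK and \x{...} escapes. The names NonVerticalDemo, shorthand and explicit are invented for the example, and the sample list is only a spot check, not a proof.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing \V with its explicit expansion.
public class NonVerticalDemo {
    static final Pattern SHORTHAND = Pattern.compile("\\V");
    static final Pattern EXPLICIT =
            Pattern.compile("[^\\n\\cK\\f\\r\\x85\\x{2028}\\x{2029}]");

    // True when the single character is matched by \V.
    static boolean shorthand(char c) {
        return SHORTHAND.matcher(String.valueOf(c)).matches();
    }

    // True when the single character is matched by the explicit class.
    static boolean explicit(char c) {
        return EXPLICIT.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // Seven vertical whitespace characters followed by three others.
        char[] samples = {'\n', 0x0B, '\f', '\r', 0x85, 0x2028, 0x2029, 'a', ' ', '\t'};
        for (char c : samples) {
            System.out.printf("U+%04X  \\V=%b  explicit=%b%n",
                    (int) c, shorthand(c), explicit(c));
        }
    }
}
```

Both columns should read false for the first seven rows and true for the last three.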
\N
matches any character that is not the line feed character \n. Simple!
[^\n]
What's the difference between \V+ and \N+? Thanks to Avinash Raj for asking.
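As the Perl 5.10 documentation specifies, \V is the same as [^\n\cK\f\r\x85\x{2028}\x{2029}] and so won't match any of \n, \r or \f, nor the control character \cK (Ctrl+K, the vertical tab), 0x85, 0x2028 or 0x2029. \N, by contrast, excludes only \n, so \N+ keeps going across all of those other characters.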
These character classes are handy and very effective when you want to match everything on a single line of text (\V+) or simply consume an entire paragraph up to the next line feed (\N+), among various other use cases.
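The difference is easiest to see on input that contains a carriage return but no line feed before it. The following is a minimal sketch in Java; since java.util.regex has no \N, the equivalent class [^\n] from above stands in for it. LineRunDemo and firstMatch are hypothetical names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class contrasting \V+ with [^\n]+ (the stand-in for \N+).
public class LineRunDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "one\rtwo\nthree";

        // \V+ stops at the carriage return, a vertical whitespace character.
        String v = firstMatch("\\V+", text);
        // [^\n]+ only stops at the line feed, so it swallows the \r as well.
        String n = firstMatch("[^\\n]+", text);

        System.out.println(v.length()); // prints 3 ("one")
        System.out.println(n.length()); // prints 7 ("one\rtwo")
    }
}
```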
The following implementations support \H, \V and \N:
- Perl 5.10
- PCRE 7.2
- PHP programmers may find conflicting information about which versions support these constructs. Since they come from Perl 5, what matters is the PCRE version PHP was built against rather than the PHP version itself; you can check it using phpinfo(). By default, PHP 5.2.2 does.
- Java 8
java.util.regex.Pattern gained support for the \H and \V constructs as part of implementing \h and \v, which Java 7 did not have; \N, however, is not yet supported. Tested with JDK8u25.
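If you are unsure what a given JDK accepts, you can probe it directly: Pattern.compile throws PatternSyntaxException for an escape it does not understand. This is only a sketch; RegexProbe and compiles are names made up here, and a successful compile says nothing about what the construct actually means on that runtime.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper for checking which constructs the running JDK accepts.
public class RegexProbe {
    // True when the regex compiles on this runtime, false on a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String construct : new String[] {"\\h", "\\H", "\\v", "\\V", "\\N"}) {
            System.out.println(construct + " -> " + compiles(construct));
        }
    }
}
```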