
I am trying to find Java equivalents for the Perl regular expression constructs that Java does not support. These are, as listed in the Java documentation:

\h    A horizontal whitespace
\H    A non-horizontal whitespace
\v    A vertical whitespace
\V    A non-vertical whitespace
\R    Any Unicode linebreak sequence \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
\X    Match Unicode extended grapheme cluster

However, I have no idea what any of these mean, which makes it rather challenging to recreate them.

(I think) I know what whitespace is, but I did not know there were multiple kinds of whitespace. What is the difference between horizontal, non-horizontal, vertical, and non-vertical whitespace?

Evorlor
  • Unicode is **huge**. It contains not only 26 different space characters, but also such things as a special Arabic star (included at the insistence of Arabs) that cannot be confused with the Star of David, and a post-combining dot-hardener character. I am not making any of this up. You absolutely have to look up one of the introductions to Unicode hanging around on the net if you want to understand what these definitions are about. – Kilian Foth Mar 26 '15 at 15:38
  • I would love to know what type of whitespace is both non-horizontal and non-vertical, given that there are no diagonal writing methods that I know of. –  Mar 26 '15 at 15:50
  • The Perl regex documentation [perlre](http://perldoc.perl.org/perlre.html#Regular-Expressions) refers to the [perlrecharclass](http://perldoc.perl.org/perlrecharclass.html#Backslash-sequences) page that lists the meaning of these character classes in a table. Note that `\X` and `\R` are not real charclasses since they match multiple code points – see [Unicode Technical Standard #18: Regexes](http://www.unicode.org/reports/tr18/) for details. Perl implements `\h` and `\v` through the non-standard Unicode properties `HorizSpace` and `VertSpace`. – amon Mar 26 '15 at 16:25
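For illustration, here is a minimal sketch of how `\h`, `\H`, `\v`, `\V` and `\R` might be spelled out as explicit Java patterns. The character lists for `\h` and `\v` follow the perlrecharclass table as I understand it, and `\R` uses the expansion quoted in the question; the class name `WhitespaceClasses` and its constants are made up for this example. It also suggests that `\H` and `\V` are not kinds of whitespace at all: they are complements, so an ordinary letter matches both. `\X` is left out because it cannot be written as a simple character class.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: explicit spellings of the Perl shorthand classes.
final class WhitespaceClasses {
    // Horizontal whitespace: tab, space, no-break space, and the other
    // Unicode space characters that move the cursor sideways.
    static final String H =
        "[ \\t\\u00A0\\u1680\\u180E\\u2000-\\u200A\\u202F\\u205F\\u3000]";
    // Anything that is not horizontal whitespace (letters included).
    static final String NOT_H =
        "[^ \\t\\u00A0\\u1680\\u180E\\u2000-\\u200A\\u202F\\u205F\\u3000]";

    // Vertical whitespace: line feed, vertical tab, form feed, carriage
    // return, next line, line separator, paragraph separator.
    static final String V = "[\\n\\u000B\\f\\r\\u0085\\u2028\\u2029]";
    // Anything that is not vertical whitespace.
    static final String NOT_V = "[^\\n\\u000B\\f\\r\\u0085\\u2028\\u2029]";

    // Linebreak sequence: CR LF counts as one unit, otherwise a single
    // vertical whitespace character. Not a true character class.
    static final String R =
        "(?:\\u000D\\u000A|[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029])";

    // True if the whole input matches the given pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println("tab is horizontal:    " + matches(H, "\t"));
        System.out.println("newline is vertical:  " + matches(V, "\n"));
        System.out.println("'a' is non-horizontal: " + matches(NOT_H, "a"));
        System.out.println("'a' is non-vertical:   " + matches(NOT_V, "a"));
        System.out.println("CR LF is one linebreak: " + matches(R, "\r\n"));
    }
}
```

If the `Pattern` documentation for your Java version already lists `\h`, `\v` and `\R`, the engine may accept them directly, in which case these explicit spellings are only useful for understanding what they match.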
