8

Code:

Arrays.asList("AAAA DDDD, DDDD".split("[\\P{L}]+")).forEach(System.out::println);

Output:

AAAA
DDDD
DDDD

Please note that it's P{L} instead of p{L} (which means letters). I googled it but found nothing. Could anyone give me a hint about what it does?

Sayakiss
  • See [this question](http://stackoverflow.com/questions/5969440/what-is-the-l-unicode-category) for a link to the site where this topic is described thoroughly. – Wiktor Stribiżew Mar 30 '16 at 15:00

2 Answers

13

You can find the explanation in Pattern Javadoc:

Unicode scripts, blocks, categories and binary properties are written with the \p and \P constructs as in Perl. \p{prop} matches if the input has the property prop, while \P{prop} does not match if the input has that property.

So it's the opposite of \p.
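To make the contrast concrete, here is a minimal sketch (the class name `NonLetterSplit` and the helper `words` are made up for illustration, not part of any API). It reuses the split from the question and then replaces each letter, and each non-letter, in a short sample string:

```java
import java.util.Arrays;
import java.util.List;

public class NonLetterSplit {
    // Hypothetical helper: splits the input on runs of one or more non-letter characters.
    static List<String> words(String input) {
        return Arrays.asList(input.split("\\P{L}+"));
    }

    public static void main(String[] args) {
        // Same input as in the question; the space and ", " are non-letters, so they act as delimiters.
        System.out.println(words("AAAA DDDD, DDDD")); // [AAAA, DDDD, DDDD]

        // \p{L} matches letters, so only 'a' and 'b' are replaced.
        System.out.println("a1b".replaceAll("\\p{L}", "#")); // #1#

        // \P{L} matches everything that is not a letter, so only '1' is replaced.
        System.out.println("a1b".replaceAll("\\P{L}", "#")); // a#b
    }
}
```

The character class `[\\P{L}]` used in the question behaves the same as a bare `\\P{L}`; the brackets are not needed there.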

Tunaki
4

Simple: it's the opposite of \\p{L}.

Essentially all "non-letters".

I couldn't find an exact reference in the API, but you can infer it from the behavior of, say, \\s vs \\S (which is documented there).

Edit (credit to Tunaki for having eyes)

This is actually suggested by the following statement in the documentation:

Unicode blocks and categories are written with the \p and \P constructs as in Perl.
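The lowercase/uppercase analogy can be checked empirically. The following is only a sketch (the class `ComplementCheck` and the method `isComplement` are hypothetical names): for every character of a sample string, it verifies that exactly one of the two patterns matches.

```java
import java.util.regex.Pattern;

public class ComplementCheck {
    // Hypothetical helper: returns true if, for every code point in the sample,
    // exactly one of the two single-character patterns matches it.
    static boolean isComplement(String lower, String upper, String sample) {
        Pattern p = Pattern.compile(lower);
        Pattern n = Pattern.compile(upper);
        return sample.codePoints().allMatch(cp -> {
            String s = new String(Character.toChars(cp));
            return p.matcher(s).matches() != n.matcher(s).matches();
        });
    }

    public static void main(String[] args) {
        // Arbitrary sample: letters, a non-ASCII letter, a digit, whitespace, punctuation.
        String sample = "aZ é9 ,\t_";
        System.out.println(isComplement("\\s", "\\S", sample));
        System.out.println(isComplement("\\d", "\\D", sample));
        System.out.println(isComplement("\\p{L}", "\\P{L}", sample));
    }
}
```

A check like this only covers the characters in the sample, so it illustrates the documented behavior rather than proving it.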

Community
Mena
  • Is there any doc or reference about that? – Sayakiss Mar 30 '16 at 14:53
  • @Sayakiss Tons of this all over the Internet, actually. http://www.regular-expressions.info/unicode.html: *You can match a single character belonging to the "letter" category with `\p{L}`. You can match a single character not belonging to that category with `\P{L}`.* – Wiktor Stribiżew Mar 30 '16 at 14:54