What does regex pattern "[\\P{L}]+" mean in Java?

Question

Code:

Arrays.asList("AAAA DDDD, DDDD".split("[\\P{L}]+")).forEach(System.out::println);

Output:

AAAA
DDDD
DDDD

Please notice it's P{L} instead of p{L} (which means letters). I googled it but found nothing. Could anyone give me a hint about what it means?

See [this question](http://stackoverflow.com/questions/5969440/what-is-the-l-unicode-category) for a link to the site where this topic is described thoroughly. — Wiktor Stribiżew, Mar 30 '16 at 15:00

score 13 · Accepted Answer · answered Mar 30 '16 at 14:55

You can find the explanation in Pattern Javadoc:

Unicode scripts, blocks, categories and binary properties are written with the \p and \P constructs as in Perl. \p{prop} matches if the input has the property prop, while \P{prop} does not match if the input has that property.

So \P is the negation of \p: \P{L} matches any character that is not a letter.
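As a rough illustration of the Javadoc statement, the sketch below checks single characters against \p{L} and \P{L} and then repeats the split from the question. The class name `NonLetterSplitDemo` and the helper `splitOnNonLetters` are made up for this example; the pattern drops the square brackets from the question, on the assumption that a character class containing only \P{L} behaves the same as \P{L} alone.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the name and helper are illustrative only.
class NonLetterSplitDemo {

    // Splits the input on runs of one or more non-letter characters.
    // "\\P{L}+" is used here in place of the question's "[\\P{L}]+".
    static List<String> splitOnNonLetters(String input) {
        return Arrays.asList(input.split("\\P{L}+"));
    }

    public static void main(String[] args) {
        // \p{L} matches a single letter, \P{L} a single non-letter.
        System.out.println("A".matches("\\p{L}")); // true
        System.out.println("A".matches("\\P{L}")); // false
        System.out.println(",".matches("\\P{L}")); // true

        // The space and the ", " between the words are runs of non-letters.
        System.out.println(splitOnNonLetters("AAAA DDDD, DDDD")); // [AAAA, DDDD, DDDD]
    }
}
```

The split result matches the output shown in the question, which is what you would expect if the separators are exactly the stretches of non-letter characters.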

score 4 · Answer 2 · edited May 23 '17 at 11:50

Simple: it's the opposite of \\p{L}.

Essentially all "non-letters".

I couldn't find an exact reference in the API, but you can infer it from the behavior of, say, \\s vs \\S (which is documented there).

Edit (credit to Tunaki for having eyes)

This is actually suggested by the following statement in the documentation:

Unicode blocks and categories are written with the \p and \P constructs as in Perl.
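The lowercase/uppercase convention mentioned in this answer can be checked with a small sketch. The class `NegationConventionDemo` and its two helpers are hypothetical names introduced for illustration; the sample characters are arbitrary.

```java
// Hypothetical demo class; names are illustrative only.
class NegationConventionDemo {

    // True when exactly one of the two patterns matches the one-character input,
    // i.e. the two patterns behave as complements for that character.
    static boolean areComplementary(String lower, String upper, String ch) {
        return ch.matches(lower) != ch.matches(upper);
    }

    // Replaces each run of non-letters with a dash, using an explicit
    // negated character class instead of \P{L}.
    static String replaceNonLetters(String input) {
        return input.replaceAll("[^\\p{L}]+", "-");
    }

    public static void main(String[] args) {
        String[] samples = {"a", "\u00e9", "7", " ", ","};
        for (String s : samples) {
            // Each line prints true three times: \s/\S, \d/\D and \p{L}/\P{L}
            // all split the sample characters into two disjoint groups.
            System.out.println(areComplementary("\\s", "\\S", s) + " "
                    + areComplementary("\\d", "\\D", s) + " "
                    + areComplementary("\\p{L}", "\\P{L}", s));
        }
        System.out.println(replaceNonLetters("AAAA DDDD, DDDD")); // AAAA-DDDD-DDDD
    }
}
```

Writing the negation as `[^\\p{L}]` is an alternative spelling of `\\P{L}`; which one reads better is largely a matter of taste.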

answered Mar 30 '16 at 14:53 by Mena · edited May 23 '17 at 11:50 by Community

Is there any doc or reference about that? – Sayakiss Mar 30 '16 at 14:53
@Sayakiss Tons of this all over the Internet, actually. http://www.regular-expressions.info/unicode.html: *You can match a single character belonging to the "letter" category with `\p{L}`. You can match a single character not belonging to that category with `\P{L}`.* – Wiktor Stribiżew Mar 30 '16 at 14:54
