
I need a regex to match ASCII non-alphanumeric characters. The regex should not match non-ASCII characters. I am using the following:

   "[\\u0000-\\u002f\\u003a-\\u0040\\u005b-\\u0060\\u007b-\\u007f]"

Can I simplify this regex?

Michael

3 Answers


Yes, you can use a character class intersection. For example:

[\\p{ASCII}&&\\P{Alnum}]

This means: the intersection of all ASCII characters and all non-alphanumeric characters.
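One way to convince yourself that the intersection really is a drop-in replacement is to test it against the question's explicit ranges on every `char` value. The sketch below does that, also checking the nested negated-class form `[\\p{ASCII}&&[^\\p{Alnum}]]` used elsewhere on this page. The class and method names are hypothetical, made up for illustration. It relies on Java's default behaviour, where `\p{Alnum}` means `[a-zA-Z0-9]` unless `UNICODE_CHARACTER_CLASS` is set.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: compares the three character classes on every char value.
class AsciiNonAlnumCheck {
    // The question's explicit ranges
    static final Pattern RANGES =
        Pattern.compile("[\\u0000-\\u002f\\u003a-\\u0040\\u005b-\\u0060\\u007b-\\u007f]");
    // Intersection with the negated property \P{Alnum}
    static final Pattern INTERSECT_P =
        Pattern.compile("[\\p{ASCII}&&\\P{Alnum}]");
    // Intersection with a nested negated class
    static final Pattern INTERSECT_NEG =
        Pattern.compile("[\\p{ASCII}&&[^\\p{Alnum}]]");

    /** Returns the first char value on which the three patterns disagree, or -1 if none. */
    static int firstDisagreement() {
        for (int c = 0; c <= 0xFFFF; c++) {
            String s = String.valueOf((char) c);
            boolean a = RANGES.matcher(s).matches();
            boolean b = INTERSECT_P.matcher(s).matches();
            boolean d = INTERSECT_NEG.matcher(s).matches();
            if (a != b || a != d) {
                return c;
            }
        }
        return -1;
    }

    /** Counts how many char values in U+0000..U+FFFF the given pattern matches. */
    static int countMatches(Pattern p) {
        int n = 0;
        for (int c = 0; c <= 0xFFFF; c++) {
            if (p.matcher(String.valueOf((char) c)).matches()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("first disagreement: " + firstDisagreement()); // -1
        System.out.println("matched chars: " + countMatches(INTERSECT_P)); // 66
    }
}
```

The 66 is simply the 128 ASCII code points minus the 62 characters in `[a-zA-Z0-9]`.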

Casimir et Hippolyte
  • Thanks, but I need to match ASCII _and_ non-alphanumeric characters. So `"[\\p{ASCII}&&[^\\p{Alnum}]]"` will probably work. – Michael Aug 17 '14 at 12:59
  • @Michael `\P` is the opposite of `\p`. – Unihedron Aug 17 '14 at 13:01
  • @Michael However, since you'll usually favour readability, you will probably find yourself using negated classes inside sets more often in the future, even though this regex is shorter: `[a&&[^\s]]` over `[a&&\S]`. – Unihedron Aug 17 '14 at 13:04
  • Thanks again. You are probably right about using negated classes. – Michael Aug 17 '14 at 13:07

You can use this regex in Java:

^(?=[^0-9a-zA-Z]+$)\p{ASCII}+$

Or:

^(?!\p{Alnum}+$)\p{ASCII}+$
anubhava
  • Yes, if the OP uses them with the `matches` method, then the anchors aren't needed. But if they are used with other methods like `find`, the anchors are probably needed. – anubhava Aug 17 '14 at 12:47
  • The OP was talking about matching a single character, which would be `(?!\p{Alnum})\p{ASCII}`. To enforce that condition on the whole string, you would do this: `^(?:(?!\p{Alnum})\p{ASCII})+$`. Your regexes match a string that's all ASCII but not all alphanumeric (that is, it must contain at least one non-alphanumeric character). – Alan Moore Aug 17 '14 at 15:11
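The two comments above can be checked side by side. The sketch below runs both of the answer's patterns and the per-character version from the last comment against a few sample strings. It also shows how an unanchored single-character pattern behaves with `find()` versus `matches()`. The class name and sample strings are hypothetical, chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the whole-string patterns discussed above.
class WholeStringCheck {
    // Answer's first regex: the lookahead forbids [0-9a-zA-Z], the body requires all ASCII
    static final Pattern LOOKAHEAD = Pattern.compile("^(?=[^0-9a-zA-Z]+$)\\p{ASCII}+$");
    // Answer's second regex: only rejects strings that are *entirely* alphanumeric
    static final Pattern NOT_ALL_ALNUM = Pattern.compile("^(?!\\p{Alnum}+$)\\p{ASCII}+$");
    // Per-character version: every character must be ASCII and not alphanumeric
    static final Pattern PER_CHAR = Pattern.compile("^(?:(?!\\p{Alnum})\\p{ASCII})+$");
    // Single-character form, no anchors
    static final Pattern ONE_CHAR = Pattern.compile("(?!\\p{Alnum})\\p{ASCII}");

    /** True if the pattern matches the entire string (the anchors are redundant here). */
    static boolean full(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "\u00e9!" is "é!", which contains a non-ASCII letter
        for (String s : new String[] {"!@#", "a!", "abc", "\u00e9!"}) {
            System.out.printf("%-5s lookahead=%b notAllAlnum=%b perChar=%b%n",
                s, full(LOOKAHEAD, s), full(NOT_ALL_ALNUM, s), full(PER_CHAR, s));
        }
        // find() searches anywhere in the input, so the unanchored pattern
        // succeeds on "é!" by matching "!", while matches() requires the whole string.
        System.out.println(ONE_CHAR.matcher("\u00e9!").find());    // true
        System.out.println(ONE_CHAR.matcher("\u00e9!").matches()); // false
    }
}
```

For `"a!"`, only `NOT_ALL_ALNUM` returns `true`, which is the behaviour the comment describes. The lookahead version and the per-character version agree on all four inputs.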

You can use a set intersection:

"[\\p{ASCII}&&[^\\p{Alnum}]]"

Read: Reference - What does this regex mean?
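As a usage sketch, the same intersection can be used to strip or collect ASCII punctuation, spaces, and control characters while leaving non-ASCII letters alone. The class and method names (`AsciiPunctuationDemo`, `strip`, `findAll`) are hypothetical, made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class using the intersection from this answer.
class AsciiPunctuationDemo {
    // ASCII characters that are not [A-Za-z0-9]; non-ASCII letters such as é are never matched
    static final Pattern ASCII_NON_ALNUM = Pattern.compile("[\\p{ASCII}&&[^\\p{Alnum}]]");

    /** Removes every ASCII non-alphanumeric character. */
    static String strip(String s) {
        return ASCII_NON_ALNUM.matcher(s).replaceAll("");
    }

    /** Collects each ASCII non-alphanumeric character found, in order of appearance. */
    static List<String> findAll(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = ASCII_NON_ALNUM.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "H\u00e9, w\u00f6rld! 42"; // "Hé, wörld! 42"
        System.out.println(strip(input).equals("H\u00e9w\u00f6rld42")); // true
        System.out.println(findAll(input).size()); // 4: ',', ' ', '!', ' '
    }
}
```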

Unihedron
  • Great! Thank you. Looks like exactly what I need. Too bad it is not a _standard_ regex. – Michael Aug 17 '14 at 13:01
  • @Michael There is no such thing as a _standard_ regular expression, since programmers overuse them for everything: writing a one-liner regex is often more viable than writing an entire finite state machine to parse! – Unihedron Aug 17 '14 at 13:02