I have a Java String, and I need to replace all characters that are NOT

  • alphanumeric characters
  • or one of the following (acceptable) characters:
    • asterisk *,
    • hyphen -,
    • period .,
    • and underscore _

I tried with [^\w*\-_]. What would be a regex I can use to find these characters?

Lani1234
  • Can't you just use \W? – reggaeguitar Apr 22 '15 at 18:46
  • Have you tried anything like negated character class http://www.regular-expressions.info/charclass.html? How did it not work? – Pshemo Apr 22 '15 at 18:46
  • @Pshemo Thank you for that link. This seems to work from the few tests I've done so far: [^\w\*\-\_] – Lani1234 Apr 22 '15 at 18:55
  • "I've done so far: [^\w*\-_]" and what is the problem with this regex besides the lack of . inside it? – Pshemo Apr 22 '15 at 19:00
  • @Pshemo Sorry, I meant that because of your link, I was able to figure out how to negate, so it seems to be working. And you're right, I needed to add the period in. I also took out the _ because it was redundant and is covered in the \w. So now it looks like this: [^\w\*\-\.] – Lani1234 Apr 22 '15 at 19:07
  • Good for you. You can post it as an answer and accept it, or even delete your question. BTW you don't need to escape `.` inside `[...]`. – Pshemo Apr 22 '15 at 19:10
  • Thank you, I didn't realize that. Even cleaner without escaping it. – Lani1234 Apr 22 '15 at 19:12

1 Answer

Thank you to @Pshemo, the solution I needed is: [^\w\*\-.]
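For anyone landing here later, this is a minimal sketch of how that character class could be used from Java. The class name `SanitizeExample`, the helper `sanitize`, and the sample input are made up for illustration and are not from the original question. Inside a character class only the hyphen really needs care, so the sketch drops the backslash before `*`; it should match the same characters as the pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SanitizeExample {
    // Matches any single character that is NOT a word character
    // (\w is [a-zA-Z0-9_] by default in Java), an asterisk, a hyphen, or a period.
    private static final Pattern DISALLOWED = Pattern.compile("[^\\w*\\-.]");

    // Hypothetical helper: replaces every disallowed character with `replacement`.
    // quoteReplacement keeps characters like $ or \ in the replacement literal.
    static String sanitize(String input, String replacement) {
        return DISALLOWED.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Spaces, parentheses, and '!' are replaced; '*', '-', '.', and '_' are kept.
        System.out.println(sanitize("file name (v1.0)*-final!.txt", "_"));
        // prints: file_name__v1.0_*-final_.txt
    }
}
```

One thing to watch: by default `\w` in Java covers only ASCII letters, digits, and underscore, so accented letters such as é would also be replaced unless the pattern is compiled with `Pattern.UNICODE_CHARACTER_CLASS`.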

The article at http://www.regular-expressions.info/charclass.html explains, under Negated Character Classes:

"Typing a caret after the opening square bracket negates the character class. The result is that the character class matches any character that is not in the character class. Unlike the dot, negated character classes also match (invisible) line break characters. If you don't want a negated character class to match line breaks, you need to include the line break characters in the class. [^0-9\r\n] matches any character that is not a digit or a line break.

It is important to remember that a negated character class still must match a character. q[^u] does not mean: "a q not followed by a u". It means: "a q followed by a character that is not a u". It does not match the q in the string Iraq. It does match the q and the space after the q in Iraq is a country. Indeed: the space becomes part of the overall match, because it is the "character that is not a u" that is matched by the negated character class in the above regexp."
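The two points in that quote can be checked with a small Java sketch. The class name `NegatedClassDemo` and the helper `firstMatch` are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Hypothetical helper: returns the first match of regex in text, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // q[^u] needs a character after the q, so the q at the end of "Iraq" does not match.
        System.out.println(firstMatch("q[^u]", "Iraq"));              // prints: null

        // Here the space after the q is consumed as the "character that is not a u".
        System.out.println("[" + firstMatch("q[^u]", "Iraq is a country") + "]"); // prints: [q ]

        // A negated class matches the line break, so "\n" is replaced.
        System.out.println("a\nb".replaceAll("[^a-z]", "-"));         // prints: a-b

        // The dot does not match "\n" by default, so the line break survives.
        System.out.println("a\nb".replaceAll(".", "-").equals("-\n-")); // prints: true
    }
}
```

For the pattern in this answer, that means line breaks are also treated as disallowed characters and get replaced, unless `\r\n` is added to the class.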

Lani1234