
I have a question about removing unwanted characters or, put better, keeping only certain ones. I came across the following note about a String literal, but I don't understand how it helps me achieve my goal or how to use it.

The String literal "[^\p{Alpha}-']" may be used to match any character that is NOT alphabetic, a dash, or apostrophe; you may find this useful when using replaceAll()

I understand what replaceAll() does, but I don't understand the little codes like [a-zA-Z] that you can use in it, or where to look to find more of them. I pretty much want to do what the quote says: keep only the letters and some punctuation.

DurpBurger
  • For that you need to learn regular expressions. Try googling it. – Ankur Gupta Apr 04 '16 at 01:13
  • A "String literal" is anything inside quote marks. This particular string literal is used as a _regular expression_, or _regex_ for short. Google "java tutorial regex" and you can find out what all the codes mean. Or visit http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html. – ajb Apr 04 '16 at 01:14

2 Answers


What you are describing is done with regular expressions, or regex for short. It's a tool available in many programming languages (including Java) that lets you process strings in a single line of code where the equivalent hand-written code would be longer and more tedious.

I suggest this link for a more in-depth tutorial.

Maljam

replaceAll() takes a regex as its first argument.

There's too much to explain in a single post, but I will explain a little.

Here's a regex: [^A-Za-z.?!]

  • [] signifies a character class. It matches exactly one character out of those it contains (as modified by meta-characters).
  • ^ is a meta-character meaning NOT when it is the first character in a character class.
  • A-Z signifies a range. Any character whose ASCII/Unicode value lies between the two endpoints, inclusive, is matched.
  • ., ?, and ! are treated as literals inside a character class (in other contexts they can be meta-characters).

So the regex, if quoted and passed to replaceAll(), will replace every character that is not an ASCII letter, ., ?, or !.
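
As a minimal sketch of that regex in use (the class and method names here are hypothetical, picked only for illustration), the first method below passes it to replaceAll() with an empty replacement so that everything else is deleted. The second method does the same with the pattern quoted in the question. It assumes the hyphen can be moved to the end of the character class, which keeps it from being read as a range, and it doubles the backslash because the pattern is written inside a Java string.

```java
// Hypothetical class and method names, chosen only for this illustration.
class KeepLettersSketch {

    // Deletes every character that is not an ASCII letter, '.', '?', or '!'.
    static String keepLettersAndPunctuation(String input) {
        return input.replaceAll("[^A-Za-z.?!]", "");
    }

    // The pattern from the question, with two adjustments:
    // the backslash is doubled because "\p" is not a valid escape in a
    // Java string literal, and the hyphen is placed last in the class
    // (an assumption of this sketch) so it is clearly a literal.
    static String keepLettersDashApostrophe(String input) {
        return input.replaceAll("[^\\p{Alpha}'-]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepLettersAndPunctuation("Hello, World! 123?")); // HelloWorld!?
        System.out.println(keepLettersDashApostrophe("don't stop-now, 42!")); // don'tstop-now
    }
}
```

Note that spaces are removed as well, since they are not listed in either character class. Adding a space inside the brackets would keep words separated.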


The second parameter of replaceAll() also treats some characters specially. For example, $1 does not mean the literal text $1; it refers to whatever the first capture group matched.

You'll need to learn about more advanced regex things (capture groups) before you use $1.
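
For a rough idea of what that looks like, here is a small sketch (hypothetical class and method names, and an input format made up for the example). Parentheses in the regex form capture group 1, and $1 in the replacement inserts whatever that group matched. If a literal $1 is wanted instead, Matcher.quoteReplacement() is one way to switch off the special meaning.

```java
import java.util.regex.Matcher;

// Hypothetical class and method names, chosen only for this illustration.
class ReplacementSketch {

    // Wraps each run of digits in square brackets.
    // (\\d+) is capture group 1; $1 stands for the digits it matched.
    static String bracketNumbers(String input) {
        return input.replaceAll("(\\d+)", "[$1]");
    }

    // Replaces each run of digits with the literal text "$1".
    // quoteReplacement() escapes the '$' so it is not read as a group reference.
    static String literalDollarOne(String input) {
        return input.replaceAll("\\d+", Matcher.quoteReplacement("$1"));
    }

    public static void main(String[] args) {
        System.out.println(bracketNumbers("room 42, floor 7")); // room [42], floor [7]
        System.out.println(literalDollarOne("costs 5"));        // costs $1
    }
}
```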

Laurel
  • So do I place the regex in quotation marks when I put it into replaceAll()? – DurpBurger Apr 04 '16 at 01:24
  • @DurpBurger Yes, and you'd need to escape any characters that need to be escaped in strings. It's a pain when the regex needs \ for its own escape (meaning you need to escape the escape), so I'll sometimes paste it into an online tester that auto-escapes everything. – Laurel Apr 04 '16 at 01:28