How do I convert characters such as Æ and á into regular English characters with Java? What I have is something like this: Local TV from Paraná. How do I convert it to "Parana"?
Frank
This question is a duplicate of http://stackoverflow.com/questions/1008802/converting-symbols-accent-letters-to-english-alphabet Please refer to that question for an answer. – brianpeiris Dec 26 '09 at 18:13
Æ corresponds to the char with int value 198. – Thorbjørn Ravn Andersen Dec 26 '09 at 21:29
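The value 198 can be checked directly, and the same check shows why Æ behaves differently from á: Æ is U+00C6 (decimal 198), and Unicode defines no canonical or compatibility decomposition for it, so a Normalizer cannot split it into a base letter plus marks. This is a small sketch; the class and method names are made up for illustration, and non-ASCII characters are written as \u escapes so the file compiles regardless of source encoding.

```java
import java.text.Normalizer;

class AeCheck {
    // Returns the Unicode code point of the first character of s.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    // True if neither canonical (NFD) nor compatibility (NFKD) decomposition
    // changes s, i.e. a normalizer-based accent stripper cannot simplify it.
    static boolean hasNoDecomposition(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).equals(s)
                && Normalizer.normalize(s, Normalizer.Form.NFKD).equals(s);
    }

    public static void main(String[] args) {
        String ae = "\u00C6";     // U+00C6 LATIN CAPITAL LETTER AE
        String aAcute = "\u00E1"; // U+00E1 LATIN SMALL LETTER A WITH ACUTE
        System.out.println(firstCodePoint(ae));         // 198
        System.out.println(hasNoDecomposition(ae));     // true
        System.out.println(hasNoDecomposition(aAcute)); // false: becomes 'a' + U+0301
    }
}
```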
2 Answers
Look at ICU4J or the JDK 1.6 java.text.Normalizer:
import java.text.Normalizer;

// Decompose accented letters (NFD), then delete the combining marks.
public String removeAccents(String text) {
    return Normalizer.normalize(text, Normalizer.Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
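A usage sketch of this method on the string from the question, wrapped in a hypothetical class so it runs on its own. NFD turns á into 'a' followed by U+0301 COMBINING ACUTE ACCENT, which lies in the Combining Diacritical Marks block and is removed by the regex. Æ has no decomposition, so this method returns it unchanged.

```java
import java.text.Normalizer;

class RemoveAccentsDemo {
    // Same technique as the answer: decompose with NFD, then delete the
    // combining marks (block U+0300..U+036F) that NFD split off.
    static String removeAccents(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // U+00E1 (a with acute) becomes 'a' + U+0301, and U+0301 is stripped.
        System.out.println(removeAccents("Local TV from Paran\u00E1")); // Local TV from Parana
        // U+00C6 (AE) has no decomposition, so it comes back unchanged.
        System.out.println(removeAccents("\u00C6").equals("\u00C6"));    // true
    }
}
```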
GAMA
bmargulies
You probably meant "Normalizer.normalize(text, Normalizer.Form.NFD)" instead of "Normalizer.decompose(text, false, 0)" – Steve Emmerson Dec 26 '09 at 18:58
I think I accidentally put in the old sun. class scheme instead. Thanks for catching it. – bmargulies Dec 26 '09 at 19:42
Normalizer.Form.NFKD may be better than Normalizer.Form.NFD for his purposes, depending on how he wants to treat ligatures: e.g., NFKD will transform `"ﬁ"` into `"fi"`. – Laurence Gonsalves Dec 26 '09 at 21:34
http://stackoverflow.com/a/3322174/535203 says `replaceAll("[^\\p{ASCII}]", "");` – Anthony O. Jan 08 '13 at 16:37
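The two variations raised in these comments can be tried side by side. This is a sketch with hypothetical names (NormalizerVariants, stripAccentsNfkd, toAsciiOnly). NFKD additionally applies compatibility decompositions, so the U+FB01 "ﬁ" ligature becomes plain "fi". The ASCII filter removes everything left outside ASCII after decomposition, which means a letter with no decomposition, such as Æ, is deleted rather than converted to "AE".

```java
import java.text.Normalizer;

class NormalizerVariants {
    // NFKD also folds compatibility characters (e.g. U+FB01 LATIN SMALL
    // LIGATURE FI -> "fi") before the combining marks are stripped.
    static String stripAccentsNfkd(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // After NFD, drop every char outside [\x00-\x7F]. Characters that do not
    // decompose, such as U+00C6 (AE), are removed instead of transliterated.
    static String toAsciiOnly(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccentsNfkd("\uFB01le"));              // file
        System.out.println(toAsciiOnly("Local TV from Paran\u00E1")); // Local TV from Parana
        System.out.println(toAsciiOnly("\u00C6gir"));                  // gir
    }
}
```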
As far as I know, there's no way to do this automatically; you'd have to substitute manually using String.replaceAll.
String str = "Paraná";
str = str.replaceAll("á", "a");  // "Paraná" -> "Parana"
str = str.replaceAll("Æ", "AE"); // Æ is conventionally written "AE", not "a"
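Manual substitution can also be combined with the Normalizer approach: an explicit table covers letters that Unicode does not decompose (Æ, Ø, ß and their lowercase forms), and NFD plus combining-mark removal handles everything that does decompose. This is a minimal sketch; the class name and the table are hypothetical and deliberately incomplete, and the replacement chosen for each letter (e.g. "AE" for Æ, "ss" for ß) is an assumption you should adjust for your own data.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

class Transliterate {
    // Hypothetical table for letters that have no Unicode decomposition.
    private static final Map<String, String> MANUAL = new LinkedHashMap<>();
    static {
        MANUAL.put("\u00C6", "AE"); // U+00C6 LATIN CAPITAL LETTER AE
        MANUAL.put("\u00E6", "ae"); // U+00E6 LATIN SMALL LETTER AE
        MANUAL.put("\u00D8", "O");  // U+00D8 LATIN CAPITAL LETTER O WITH STROKE
        MANUAL.put("\u00F8", "o");  // U+00F8 LATIN SMALL LETTER O WITH STROKE
        MANUAL.put("\u00DF", "ss"); // U+00DF LATIN SMALL LETTER SHARP S
    }

    static String toPlainLatin(String text) {
        String s = text;
        // 1. Explicit replacements; String.replace treats both arguments literally.
        for (Map.Entry<String, String> e : MANUAL.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        // 2. Generic accent stripping for everything that does decompose.
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        System.out.println(toPlainLatin("Local TV from Paran\u00E1")); // Local TV from Parana
        System.out.println(toPlainLatin("\u00C6gir"));                  // AEgir
    }
}
```

String.replace is used here instead of replaceAll because the table keys are plain strings, not regular expressions, so no escaping is needed.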
Kaleb Brasee