
Possible Duplicate:
ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ --> n or Remove diacritical marks from unicode chars

How to remove diacritics from strings?

For example, transform á->a, č->c, etc., in a way that works for all languages.

I'm doing full-text search and need to ignore any diacritics in the searched text.

Thanks

Pointer Null

1 Answer


On API level 9+ you can use the java.text.Normalizer class, e.g.:

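import java.text.Normalizer;
// Decompose to NFD, then strip the combining marks left behind
String normalized = Normalizer.normalize("âbĉdêéè", Normalizer.Form.NFD)
    .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");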

This would return "abcdeee".

(The answer Keyser linked looks better; it cleans up more characters.)
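For the full-text search case in the question, the same idea can be wrapped in a reusable helper. The following is only a sketch: the class name `DiacriticsDemo` and both method names are made up for illustration, and it swaps in `\p{M}` (any Unicode combining mark) for the `InCombiningDiacriticalMarks` block used above, which is an assumption about what you want stripped. Sample strings are written with Unicode escapes so the file compiles regardless of source encoding. Note that letters with no canonical decomposition, such as ł or ø, are left unchanged by this approach.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

public class DiacriticsDemo {
    // Matches runs of Unicode combining marks (category M); compiled once for reuse.
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    // Decomposes accented characters (NFD), then drops the combining marks.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    // Hypothetical search helper: compares the folded, lower-cased forms.
    static boolean containsIgnoringDiacritics(String text, String query) {
        String foldedText = stripDiacritics(text).toLowerCase(Locale.ROOT);
        String foldedQuery = stripDiacritics(query).toLowerCase(Locale.ROOT);
        return foldedText.contains(foldedQuery);
    }

    public static void main(String[] args) {
        // The answer's sample string, with accents on a, c and e.
        System.out.println(stripDiacritics("\u00e2b\u0109d\u00ea\u00e9\u00e8"));
        // A Czech word with carons on C and s, searched with a plain ASCII query.
        System.out.println(containsIgnoringDiacritics("\u010ce\u0161tina", "cestina"));
    }
}
```

Folding both the indexed text and the query through the same function keeps the comparison symmetric, so a user can type the query with or without accents.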

Jens