
I have thousands of names in a MySQL database that contain accented (non-ASCII) characters. I want to convert them to the plain English alphabet. Here is an example:

Indāpur Jejūri -> Indapur Jejuri

How can I do this? I know Java, Groovy, and a handful of other scripting languages, but I haven't had much luck so far. Any suggestions?

AlexCon
  • Presumably you'd get a 64K-entry translation table and translate. – Hot Licks Mar 24 '14 at 01:44
  • Python has [unidecode](https://pypi.python.org/pypi/Unidecode), which probably has some sort of Java equivalent. – Blender Mar 24 '14 at 01:44
  • PHP answer here: http://stackoverflow.com/questions/158241/php-replace-umlauts-with-closest-7-bit-ascii-equivalent-in-an-utf-8-string – Warren Dew Mar 24 '14 at 02:31

1 Answer


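I found the answer after going through many posts on Stack Overflow: Converting Symbols, Accent Letters to English Alphabet. The idea is to decompose each accented character into a base letter plus combining marks (Unicode NFD normalization) and then strip the marks.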

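import java.text.Normalizer;
import java.util.regex.Pattern;

public class DeAccent {
    // Matches the combining marks that NFD splits off from base letters.
    // Compiled once instead of on every call.
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    public static String deAccent(String str) {
        // NFD turns "ā" into "a" followed by U+0304 (combining macron).
        String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
        return DIACRITICS.matcher(nfdNormalizedString).replaceAll("");
    }
}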
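One caveat worth checking before running this over thousands of rows: NFD only helps for characters that actually decompose into a base letter plus a combining mark. Letters such as "ø" or "Ł" have no such decomposition and pass through unchanged. The following is a minimal sketch, not part of the original answer, that applies the same technique and flags any result that is still not plain ASCII. The class name `AsciiFolding` and the helpers `fold` and `isAscii` are hypothetical names chosen for illustration, and the sample strings are written with Unicode escapes so the file compiles regardless of source encoding.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class AsciiFolding {
    // Same pattern as in the answer: combining marks left behind by NFD.
    private static final Pattern MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // Decompose, then drop the combining marks.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    // True when every char in the string is 7-bit ASCII.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        String[] samples = {
            "Ind\u0101pur Jej\u016Bri", // a and u with macron: both decompose
            "\u0141\u00F3d\u017A",      // L with stroke does not decompose
            "Troms\u00F8"               // o with stroke does not decompose
        };
        for (String s : samples) {
            String folded = fold(s);
            String note = isAscii(folded) ? "" : "  (still non-ASCII, needs a manual mapping)";
            System.out.println(s + " -> " + folded + note);
        }
    }
}
```

The first sample folds to "Indapur Jejuri" as in the question. The other two keep their stroked letters, so rows flagged by `isAscii` would need an explicit replacement table or a transliteration library such as the ones mentioned in the comments.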
AlexCon