In Java, is it possible to convert non-English characters into their closest English (ASCII) equivalents?

For example, I want:

Zdeborová --> Zdeborova    
Krząkała  --> Krzakala   
Srámek    --> Sramek

and so on.

When I try the methods below:

        String t1 = Normalizer.normalize("Krząkała", Normalizer.Form.NFD);
        String t2 = t1.replaceAll("[^\\p{ASCII}]", "");
        String t3 = t2.replaceAll("\\p{M}", "");

or

        String t4 = org.apache.commons.lang3.StringUtils.stripAccents("Krząkała");

they both give Krz?ka?a as a result.

I can do this in Oracle SQL simply by running:

        select regexp_replace(replace(convert(trim(upper('Krząkała')), 'us7ascii'), '_', ' '), '[^A-Z ]', '') std
        from dual;

and get KRZAKALA.

I would expect it to be just as simple in Java. Is it?

mlee_jordan
  • possible duplicate of [Converting Symbols, Accent Letters to English Alphabet](http://stackoverflow.com/questions/1008802/converting-symbols-accent-letters-to-english-alphabet) – Maroun Nov 12 '14 at 14:20
  • @MarounMaroun Just be aware that the highly upvoted and accepted answer does not actually answer the original question. – jarnbjo Nov 12 '14 at 14:42
  • @user3198674 It is not clear what you're asking about. Do you want to strip diacritic marks (as suggested by your examples) or are you looking for the pronunciation of foreign words (as stated in your question). These are two rather different problems. – jarnbjo Nov 12 '14 at 14:43
  • @jarnbjo Thanks for noticing. I stated it wrongly. What I actually want is to strip diacritic marks and get English characters. I have edited the question. – mlee_jordan Nov 12 '14 at 14:50
  • Thanks @MarounMaroun for the link. The second answer in the linked question works for some words, but not for Krząkała, for example: I got Krz?ka?a. – mlee_jordan Nov 12 '14 at 16:01
  • @user3198674: The output with question marks indicates that you have some sort of character encoding problem, e.g. saving the source file with one character encoding and letting the compiler use a different encoding. The expected output for "Krząkała" would be "Krzakaa". The problem is that the stroke on the ł is not a diacritic mark and since the example code removes all non-ASCII characters, the ł disappears completely. – jarnbjo Nov 12 '14 at 17:36
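
To illustrate the two points in the last comment, here is a minimal sketch rather than a definitive solution. The class name `AsciiFold` and the helper `toAscii` are hypothetical and do not come from the question. The string literals use Unicode escapes so the result does not depend on the encoding the source file is saved or compiled with, and the ł is mapped by hand because it has no canonical decomposition. Only ł/Ł are handled here; other letters of that kind (for example ø or đ) would need their own entries.

```java
import java.text.Normalizer;

final class AsciiFold {

    // Hypothetical helper: strips diacritics and maps the Polish l-with-stroke by hand.
    public static String toAscii(String input) {
        // U+0142 (ł) and U+0141 (Ł) do not decompose under NFD, so replace them explicitly.
        String mapped = input.replace('\u0142', 'l').replace('\u0141', 'L');
        // NFD splits e.g. U+0105 (ą) into 'a' followed by U+0328 (combining ogonek).
        String decomposed = Normalizer.normalize(mapped, Normalizer.Form.NFD);
        // Remove the combining marks that NFD separated from their base letters.
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(toAscii("Zdeborov\u00e1"));     // Zdeborova
        System.out.println(toAscii("Krz\u0105ka\u0142a")); // Krzakala
    }
}
```

If the literals are written with the accented characters directly, the source file encoding and the compiler's encoding have to agree (for instance by passing `-encoding UTF-8` to `javac`); a mismatch there is one likely source of the `?` characters.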
