
How to replace accented characters with plain alphabet characters?

Before you mark this question as a duplicate:
I tried various solutions, but none of them worked for me.

See the following code:

import org.apache.commons.lang3.StringUtils;

import java.text.Normalizer;
import java.util.regex.Pattern;

public class AccentsTest
{
    public static void main(String[] arguments)
    {
        String textWithAccents = "Et ça sera sa moitié.";

        System.out.println(textWithAccents);
        System.out.println(stripAccents(textWithAccents));
        System.out.println(deAccent(textWithAccents));
        System.out.println(normalize(textWithAccents));
        System.out.println(stripAccents2(textWithAccents));
    }

    // http://stackoverflow.com/a/15191069/3764804
    public static String stripAccents(String s)
    {
        return StringUtils.stripAccents(s);
    }

    // http://stackoverflow.com/a/1215117/3764804
    public static String deAccent(String str)
    {
        String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
        Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
        return pattern.matcher(nfdNormalizedString).replaceAll("");
    }

    // http://stackoverflow.com/a/8523728/3764804
    public static String normalize(String string)
    {
        string = Normalizer.normalize(string, Normalizer.Form.NFD);
        string = string.replaceAll("[^\\p{ASCII}]", "");

        return string;
    }

    // http://stackoverflow.com/a/15190787/3764804
    public static String stripAccents2(String s)
    {
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
        return s;
    }
}

It outputs:

Et ?a sera sa moiti?.
Et ?a sera sa moiti?.
Et ?a sera sa moiti?.
Et a sera sa moiti.
Et ?a sera sa moiti?.

However, I want it to output the text in plain alphabet characters, which would be:

Et ca sera sa moitie.

How can it be done? Is something wrong with my IDE? I'm using IntelliJ.

BullyWiiPlaza

1 Answer


It was an encoding issue. The .java source file was saved as windows-1252; after changing its encoding to UTF-8, all of the code examples work properly and output the expected text.
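As a minimal sketch of how to rule out the source file encoding while debugging this kind of problem, the accented characters can be written as Unicode escapes so the literal is pure ASCII. The class name AccentsEncodingDemo is hypothetical; the stripping logic is the same NFD-plus-regex approach used in deAccent from the question.

```java
import java.text.Normalizer;

public class AccentsEncodingDemo
{
    // Same technique as deAccent in the question: decompose to NFD,
    // then remove the combining diacritical marks.
    public static String deAccent(String s)
    {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] arguments)
    {
        // Unicode escapes stand in for the accented letters (c with cedilla
        // and e with acute), so the literal does not depend on the encoding
        // the source file is saved or compiled with.
        String textWithAccents = "Et \u00e7a sera sa moiti\u00e9.";

        System.out.println(deAccent(textWithAccents));
    }
}
```

If this prints "Et ca sera sa moitie." while the version with literal accented characters does not, the file encoding is the likely culprit. Besides changing the file encoding in the IDE, the encoding can also be stated explicitly when compiling from the command line, for example with javac -encoding UTF-8.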

BullyWiiPlaza