I need to convert Unicode files to ASCII. If a letter doesn't exist in ASCII, it should be converted to its closest ASCII representation.
I'm using the Unidecode tool for this (https://pypi.python.org/pypi/Unidecode). It works fine when I use it in the Python interpreter on the command line, i.e. by running python, importing the library, and then printing the converted word like this: print unidecode(u'äèß')
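For comparison, here is a standard-library-only approximation (a sketch in Python 3; the helper name naive_ascii is hypothetical, and this is not how Unidecode works). It shows why a transliteration table is needed: Unicode decomposition strips accents, but it simply drops letters such as ß that have no ASCII base character.

```python
import unicodedata


def naive_ascii(text: str) -> str:
    # Hypothetical helper, not part of Unidecode.
    # NFKD splits accented letters into base letter + combining mark,
    # e.g. "ä" -> "a" + U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with "ignore" discards every non-ASCII code point:
    # the combining marks, but also letters like "ß" that do not decompose.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(naive_ascii("äèß"))  # ae  (the ß is lost entirely)
```

Unidecode instead maps ß to ss, which is why it returns aess for the same input.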
Unfortunately, when I try to use this tool directly on the command line, i.e. by running something like python -c "from unidecode import *; print unidecode(u'äèß')", it only prints gibberish (A$?A"A, to be exact), even though it should have printed aess, as it did in the interpreter. This is annoying and I don't know how to solve it. I thought it might be an encoding problem with my Terminal, e.g. it not being set to UTF-8. However, running locale in my Terminal printed the following output:
LANG="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_CTYPE="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_ALL="de_DE.UTF-8"
Or could it be that Python has problems with the stdin encoding on the command line? It gave me the correct output in the Python interpreter, but not when invoking python -c.
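To make the encoding suspicion concrete, here is a minimal sketch (Python 3, standard library only; the helper name misread_as_latin1 is hypothetical) that reproduces the kind of mojibake you get when the UTF-8 bytes of 'äèß' are read as Latin-1, one byte per character. My assumption, not verified here, is that Python 2 does something like this with a u'' literal passed via -c, whereas the interactive interpreter decodes the literal using the terminal's encoding.

```python
def misread_as_latin1(text: str) -> str:
    # Hypothetical helper: encode correctly as UTF-8, then decode the bytes
    # as Latin-1, which turns every byte into its own character.
    return text.encode("utf-8").decode("latin-1")


original = "äèß"
garbled = misread_as_latin1(original)
print(repr(garbled))                 # 'Ã¤Ã¨Ã\x9f'
print(len(original), len(garbled))   # 3 6
```

Each two-byte UTF-8 sequence (C3 A4, C3 A8, C3 9F) becomes two characters, the first of which is always Ã. If Unidecode then transliterates those six characters one by one, that would plausibly explain the repeated A in the gibberish above.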
Do you guys have an idea?