2

Is there a program that can change file encoding to UTF-8 programmatically? I have about 1000 files and I want to save them in UTF-8 format on Linux.

Thanks.

beerLantern

2 Answers

5

iconv will take care of that. Use it like this:

iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt

where ISO-8859-1 (Latin-1) is one of the most common 8-bit encodings, which may or may not be your input encoding. iconv writes the converted text to standard output, hence the redirection to out.txt.

If you don't know the input charset, you can detect it with the standard file command or the Python-based chardet. For instance:

iconv -f "$(file -bi in.txt | sed -e 's/.*[ ]charset=//')" -t UTF-8 in.txt > out.txt

You may want something more robust than this one-liner, for example skipping files whose encoding cannot be detected.
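A more defensive version of the one-liner could look roughly like the sketch below. The helper names is_convertible and to_utf8 are made up for illustration, and it assumes that file reports "binary" or "unknown-8bit" when it cannot name a charset, so check what your version of file prints before relying on it.

```shell
#!/bin/sh
# Sketch: convert one file to UTF-8, skipping it when detection fails.
# is_convertible and to_utf8 are hypothetical helper names.

# Return 0 if the charset label looks usable as an iconv source encoding.
is_convertible() {
    case "$1" in
        ''|binary|unknown-8bit) return 1 ;;  # detection failed or not text
        *) return 0 ;;
    esac
}

# to_utf8 INPUT OUTPUT [CHARSET]
# CHARSET defaults to whatever `file` reports for INPUT.
to_utf8() {
    charset=${3:-$(file -b --mime-encoding "$1")}
    if ! is_convertible "$charset"; then
        echo "skipping $1: charset '$charset' not usable" >&2
        return 1
    fi
    iconv -f "$charset" -t UTF-8 "$1" > "$2"
}
```

Usage would be something like to_utf8 in.txt out.txt to rely on detection, or to_utf8 in.txt out.txt ISO-8859-1 to override it when you already know the source encoding.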

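From here, to iterate over multiple files, you can do something like this, which writes each converted file next to the original with a .utf8 suffix (iconv cannot safely read and overwrite the same file through a redirection):

find . -iname '*.txt' -exec sh -c 'iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.utf8"' sh {} \;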

I didn't check this, so you might want to google iconv and find, read about them here on SO, or simply read their man pages.
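If you would rather replace the files in place, one possible approach is to convert into a temporary file and only move it over the original when iconv succeeds. This is a sketch, not a tested production script: convert_tree is a hypothetical name, and it assumes every matching file really is in the charset you pass in.

```shell
#!/bin/sh
# convert_tree DIR FROM_CHARSET
# Hypothetical helper: rewrite every *.txt under DIR as UTF-8, in place.
convert_tree() {
    find "$1" -type f -iname '*.txt' -exec sh -c '
        from=$1; shift
        for f in "$@"; do
            tmp="$f.tmp.$$"
            # Replace the original only if iconv succeeded.
            if iconv -f "$from" -t UTF-8 "$f" > "$tmp"; then
                mv "$tmp" "$f"
            else
                rm -f "$tmp"
                echo "failed: $f" >&2
            fi
        done
    ' sh "$2" {} +
}
```

For example, convert_tree . ISO-8859-1 would process the current directory. Run it on a copy first: running it twice over the same tree re-encodes files that are already UTF-8 and garbles any non-ASCII characters.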

Antoine
  • I wanted it to work with an unknown charset. I found this: http://stackoverflow.com/questions/9824902/iconv-any-encoding-to-utf-8 – beerLantern Aug 01 '14 at 10:15
  • @beerLantern: Edited my answer to cover detection. Be sure to check the results on your files, though: charset detection can be unreliable for small files and/or less common charsets. – Antoine Aug 01 '14 at 10:24
2

iconv is the tool for the job.

iconv -f original_charset -t utf-8 originalfile > newfile 
mti2935