I have a UTF-8 encoded txt file whose content is a mix of English and Chinese characters, including a Simplified Chinese string. I want to convert it to ANSI so that other programs can use it. But when I convert the encoding with StreamReader and StreamWriter, the Simplified Chinese part of the new txt file comes out garbled.

Strangely, when I open the file in the built-in Windows Notepad and save a new copy with the encoding changed from UTF-8 to ANSI, the new file displays correctly as a mixture of Simplified Chinese and English.

How can I fix my code?

The code used is as follows:

try
{

    string str;
    // Dispose the reader once the whole file has been read.
    using (StreamReader streamReader = new StreamReader(tPath, Encoding.GetEncoding("utf-8")))
    {
        str = streamReader.ReadToEnd();
    }
    FileStream fs = new FileStream(tPath2, FileMode.Create);
    using (StreamWriter sw = new StreamWriter(fs, Encoding.Default))
    {
        sw.Write(str);
        sw.Flush();
        sw.Close();
        sw.Dispose();
    }
}
catch (IOException ex)
{
    string msg = ex.Message.ToString();
    MessageBox.Show(msg);
}
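
One thing worth checking, offered as a possibility rather than a confirmed diagnosis: `Encoding.Default` is the system's ANSI code page on .NET Framework, but it is always UTF-8 on .NET Core and later, so it may not be the encoding the target program expects. A minimal sketch that names the target code page explicitly is shown here. The class name `EncodingConverter`, the method name, and the default of code page 936 (GBK, a superset of GB2312) are assumptions for illustration; Big5 would be code page 950. The `RegisterProvider` call is needed on .NET Core and later; on .NET Framework, where the legacy code pages are built in, that line can likely be dropped.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingConverter
{
    // Converts a UTF-8 text file to a legacy code page.
    // The default of 936 (GBK) is an assumption; pass 950 for Big5.
    public static void ConvertUtf8ToCodePage(string sourcePath, string targetPath, int codePage = 936)
    {
        // Legacy code pages are not registered by default on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding target = Encoding.GetEncoding(codePage);

        // ReadAllText strips a UTF-8 byte order mark if one is present.
        string text = File.ReadAllText(sourcePath, Encoding.UTF8);

        // Characters the target code page cannot represent are replaced with '?'.
        File.WriteAllText(targetPath, text, target);
    }
}
```

With the variables from the code above, the call would be `EncodingConverter.ConvertUtf8ToCodePage(tPath, tPath2);`. Whether 936 is the right code page depends on what the consuming program actually expects.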
  • What *exactly* do you mean by "ANSI"? There are several encodings that are often referred to as "ANSI" - but I don't think many of them include Chinese characters. – Jon Skeet Dec 16 '19 at 09:14
  • [ANSI](https://stackoverflow.com/questions/701882/what-is-ansi-format) is very ambiguous, but is commonly used to refer to code pages that can represent a maximum of 256 different characters. Not nearly enough for any variant of Chinese. – Robby Cornelissen Dec 16 '19 at 09:16
  • Welcome to Stack Overflow :) BTW, with a `using` statement you don't need to call `Close()` and `Dispose()` manually. At the end of the `using` block the writer is disposed (and therefore closed) automatically, which takes care of freeing unmanaged resources. – Yousha Aleayoub Dec 16 '19 at 09:33
  • @JonSkeet Well, this is a good question. When I use Notepad++ to open the file converted with Windows Notepad, it only shows "ANSI". So I guess that ANSI is GB2312 or Big5? I'm not sure. – FengyiLin Dec 16 '19 at 09:33
  • @Yousha Aleayoub Thanks for the reminder, bro. I actually typed those out of habit. – FengyiLin Dec 16 '19 at 09:38
  • I suggest the first thing you need to work out is which exact encoding you really want. (Are you sure you can't just stick to UTF-8? That's generally the best bet, to be honest.) – Jon Skeet Dec 16 '19 at 09:42
  • There are three types of characters: 1) 0x00 to 0x7F (the same in all encodings), 2) 0x80 to 0xFF (mapped to different characters depending on the encoding), 3) 0x100 and up (two-byte characters). Type 2 one-byte characters are usually mapped to two-byte Unicode characters to save memory. Once characters have been mapped with the wrong encoding, you have to convert them back to a byte array with GetBytes using the original encoding, and then decode those bytes with the correct encoding. – jdweng Dec 16 '19 at 09:58
  • @jdweng: That's a *very* confusing comment - talking about "one byte" characters being mapped to two bytes is particularly confusing, as if you're talking about bytes, that's already in an encoded form. If you just mean "characters between U+0080 and U+00FF inclusive" then it would be clearer to say so. It's also not clear to me what you mean by "mapped to two byte unicode characters". Finally, note that even characters within ASCII aren't always mapped the same, e.g. due to EBCDIC. – Jon Skeet Dec 16 '19 at 11:10
  • EBCDIC (an IBM character set from the 1960s and 1970s) is not a character set in the .NET library, and is usually just 7 bits, not 8. EBCDIC was really meant for punch cards, which became obsolete 40 years ago. – jdweng Dec 16 '19 at 11:17
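
The recovery step described in the comments (re-encode with the encoding that was wrongly applied, then decode with the right one) can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the round trip works only when the wrong decoding was lossless, as it is for ISO-8859-1, where every byte maps to exactly one character. If the wrong decoding already replaced bytes with `?` or U+FFFD, the original bytes cannot be recovered this way.

```csharp
using System;
using System.Text;

public static class MojibakeRepair
{
    // Re-encodes the garbled text with the encoding it was wrongly decoded with,
    // which restores the original bytes, then decodes those bytes with the
    // encoding that was actually used to write them.
    public static string Redecode(string garbled, Encoding wrongEncoding, Encoding actualEncoding)
    {
        byte[] originalBytes = wrongEncoding.GetBytes(garbled);
        return actualEncoding.GetString(originalBytes);
    }
}
```

For example, the GBK bytes of "中文" (D6 D0 CE C4) read as ISO-8859-1 show up as "ÖÐÎÄ"; passing that string to `Redecode` with `Encoding.GetEncoding("iso-8859-1")` as the wrong encoding and `Encoding.GetEncoding(936)` as the actual one gives back "中文". On .NET Core and later, `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` has to be called first so that code page 936 is available.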