I am trying to export a file to an old program that doesn't recognise Unicode (my whole database uses the utf8_unicode_ci collation).

When I export the file, I use Encoding.Default:

using (StreamWriter sw = new StreamWriter(parcours + "2", false, Encoding.Default))
{
    foreach (string st in output)
    {
        sw.WriteLine("{0}", st);
    }
}

Strangely, in some cases the file is read correctly and in other cases it is not, even though I use exactly the same function.

When I open the files in Notepad++, the one that works is shown as ANSI and the one that doesn't work is shown as Macintosh.

How can I always export as ANSI? I guess that using the Default value lets the encoding change by itself?

Note: it is said here that "ANSI" in Notepad just means that the file is not Unicode, so I don't know whether I can trust Notepad's information.

Edit: As suggested by CodeCaster, I used the Windows-1251 encoding and I am back where I started, but at least I know that the encoding is where the error is?

Honestly, I don't understand: in debug mode all the text in my List is correct. But in some cases the text is correctly encoded and in other cases it is not. Concretely, here is what I mean by "works":

ДВУТАВР20К2 is written as ДВУТАВР20К2 in the file (it works).

Двутавр12б1 is written as ƒ¬”“ј¬–12Ѕ1 in the file (it doesn't work).

As far as I know, a string itself has no encoding, so how can I explain that?

Siegfried.V
  • Have you tried reading the [docs](https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding.default?view=netframework-4.7.2), in particular the big red warning block, and specifying an explicit encoding? And which encoding and code page _does_ the program you're writing files for recognize? Which kind of characters do you want to write, I guess Cyrillic given your location? Then choose the appropriate code page. – CodeCaster Sep 26 '18 at 09:19
  • I'd strongly advise using `Encoding.UTF8` everywhere if you *possibly* can. Even "ANSI" is ambiguous: https://stackoverflow.com/questions/701882/what-is-ansi-format – Jon Skeet Sep 26 '18 at 09:21
  • In fact I tried Encoding.ASCII, Encoding.Unicode, Encoding.BigEndianUnicode and Encoding.GetEncoding(1252), and the default was the only one that gave me any results. I didn't try UTF8 because I read on Stack Overflow that Unicode and UTF were the same thing. The problem is that I don't know which encoding that old machine can read, and the "default" was the only one that did the job (I will now try UTF8, as advised by Jon Skeet). – Siegfried.V Sep 26 '18 at 09:24
  • @JonSkeet Yes, I also read about ANSI, but for now this is the only one that did the job. I will now try UTF7, UTF8 and UTF32, just in case. – Siegfried.V Sep 26 '18 at 09:26
  • @JonSkeet I confirm that UTF7, UTF8 and UTF32 don't work either. I had already read about ANSI, which is why I tried Encoding.GetEncoding(1252) (and it didn't work). Is there a way to know exactly which format a file uses, other than through Notepad++? – Siegfried.V Sep 26 '18 at 09:36
  • Please could you clarify what you mean by "working" and "made the job"? It's very unclear what you're trying to do with the file you've created, and in what way it "doesn't work". – Jon Skeet Sep 26 '18 at 09:36
  • @JonSkeet By "working" I mean that the software can read my file. I think I will contact their developers; if they can at least tell me which format I can send, that will be very helpful. – Siegfried.V Sep 26 '18 at 09:42
  • @Siegfried.V: Yes, that's absolutely what you need to know - not based on experimentation, but based on their actual code. It's entirely possible that *they're* using the equivalent of `Encoding.Default`, which would mean that you'd need to know which machine you'd *run* their code on in order to produce a file for that machine :( – Jon Skeet Sep 26 '18 at 09:47

2 Answers

> When I open the files in Notepad++, the one that works is shown as ANSI and the one that doesn't work is shown as Macintosh.

If you Google that, you'll find that Notepad++'s encoding/code page auto detection isn't flawless.

If you want to write Cyrillic characters (which I assume you want, given the location in your profile) using an ANSI code page (which you want because the program you're writing the file for doesn't understand Unicode), the code page you want is Code Page 1251 Windows Cyrillic (Slavic). To get an encoding that writes characters in code points from that code page, use Encoding.GetEncoding():

using (StreamWriter sw = new StreamWriter(..., Encoding.GetEncoding("windows-1251")))
{
}

This assumes that the program that reads the files also uses that code page. That's the problem with non-Unicode text files: the writer and the reader of the file have to agree on the encoding. So ultimately, you should find out which specific encoding the consuming application expects. I just assumed here that it is in fact Windows-1251.
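
To make the idea above concrete, here is a minimal sketch that writes lines with an explicit Windows-1251 encoding and then inspects the raw bytes instead of relying on an editor's guess. The class name, helper names and the temp-file path are hypothetical. The `Encoding.RegisterProvider` call is an assumption about your runtime: it is needed on .NET Core / .NET 5+, while on .NET Framework `Encoding.GetEncoding("windows-1251")` works without it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class Cp1251Export
{
    // Returns the Windows-1251 encoding.
    // Assumption: on .NET Core / .NET 5+ the code-page provider has to be
    // registered first; on .NET Framework this call can be dropped.
    public static Encoding GetCp1251()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("windows-1251");
    }

    // Writes every line with an explicit encoding instead of Encoding.Default.
    public static void WriteLines(string path, IEnumerable<string> lines)
    {
        using (var sw = new StreamWriter(path, false, GetCp1251()))
        {
            foreach (string line in lines)
            {
                sw.WriteLine(line);
            }
        }
    }

    public static void Main()
    {
        // Hypothetical output path, used only for this illustration.
        string path = Path.Combine(Path.GetTempPath(), "export-1251.txt");
        WriteLines(path, new[] { "ДВУТАВР20К2" });

        // Windows-1251 uses one byte per character and writes no BOM,
        // so the first 11 bytes are exactly the 11 characters of the text.
        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine(BitConverter.ToString(bytes, 0, 11));
        // C4-C2-D3-D2-C0-C2-D0-32-30-CA-32
    }
}
```

If the bytes match the Windows-1251 table, the file is encoded as intended, whatever label a text editor puts on it.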

CodeCaster
  • In fact this is the closest answer. I also found that on Stack Overflow (but forgot the link) and tried it, but it didn't work either. I think I will continue down that path. – Siegfried.V Sep 26 '18 at 09:40
  • OK, so I am back where I started, but at least I understood something: Windows-1251 is the right encoding. But in some cases it works and in others it doesn't (I will edit the question, as this is too long for a comment). – Siegfried.V Sep 26 '18 at 09:46
  • I found where the issue was: it was not due to the encoding, but to Notepad++. I don't know why, but Notepad++ is not always able to read the file. When I send the file to the external software it works correctly, even if Notepad++ couldn't read it. (Before, it didn't work for another reason: that software accepts only uppercase.) Thanks for your help. – Siegfried.V Sep 28 '18 at 04:57
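
The garbled sample from the question looks consistent with a correctly written Windows-1251 file that a viewer decoded with a Mac Cyrillic code page. The sketch below reproduces that effect under two assumptions that are not confirmed in the thread: that the text was upper-cased before export, and that the "Macintosh" guess corresponds to .NET's `x-mac-cyrillic` encoding (code page 10007). The class and method names are hypothetical, and the `Encoding.RegisterProvider` call is only needed on .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

public static class MisdetectionDemo
{
    // Encodes the text as Windows-1251, then decodes the very same bytes with
    // another encoding. This mimics a viewer that guesses the wrong code page.
    public static string ViewAs(string text, string viewerEncodingName)
    {
        // Assumption: required on .NET Core / .NET 5+, not on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] bytes = Encoding.GetEncoding("windows-1251").GetBytes(text);
        return Encoding.GetEncoding(viewerEncodingName).GetString(bytes);
    }

    public static void Main()
    {
        // Assumed upper-case form of the failing sample from the question.
        string text = "ДВУТАВР12Б1";

        // Same bytes, right code page: the text round-trips unchanged.
        Console.WriteLine(ViewAs(text, "windows-1251") == text);

        // Same bytes, wrong code page: compare with the garbled sample.
        Console.WriteLine(ViewAs(text, "x-mac-cyrillic") == "ƒ¬”“ј¬–12Ѕ1");
    }
}
```

In both cases the bytes on disk are identical; only the code page used to display them differs, which would explain why the consuming program could read a file that Notepad++ showed as garbage.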

From looking at the .NET Encoding code:

On .NET Framework, Encoding.Default returns the encoding of the operating system's active ANSI code page, so the result differs from one machine to another; on .NET Core it is always UTF-8. The page suggests that you use UTF-8 or UTF-16 when possible (most likely the first one). Try this post if you want to read more.
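
To see what Encoding.Default actually resolves to on a given machine, a small check like the following may help. It is only a sketch; the class and method names are hypothetical, and the first line of output depends on the runtime and the system settings.

```csharp
using System;
using System.Text;

public static class DefaultEncodingInfo
{
    // Formats an encoding as "<web name> (code page <number>)".
    public static string Describe(Encoding encoding)
    {
        return encoding.WebName + " (code page " + encoding.CodePage + ")";
    }

    public static void Main()
    {
        // Machine- and runtime-dependent, so no expected output is given here.
        Console.WriteLine(Describe(Encoding.Default));

        // Fixed reference point for comparison.
        Console.WriteLine(Describe(Encoding.UTF8)); // utf-8 (code page 65001)
    }
}
```

Running this on both the exporting machine and the machine that hosts the old program shows whether the two "defaults" agree.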

Ohad Bitton