I have run into a very strange thing.

I wonder if others have and why it's happening.

Having run a one-line program containing the line System.Console.WriteLine(System.Console.OutputEncoding.EncodingName); I see that the encoding is Western European (DOS).

Fine

Here is a list of some code pages: 1200 is Unicode, 65001 is UTF-8, Windows-1252 is Western European (Windows) and 850 is Western European (DOS). The list is from https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx

Say I write a C# program to change the encoding to UTF-8:

class sdf
{
  static void Main(string[] args)
  {
    System.Console.WriteLine(System.Console.OutputEncoding.EncodingName);
    System.Console.OutputEncoding = System.Text.Encoding.GetEncoding(65001);
    System.Console.WriteLine(System.Console.OutputEncoding.EncodingName);
  }
}

It works, it prints

Western European (DOS)
Unicode (UTF-8)

Now when I run csc again, csc crashes.

I checked my RAM for 14 hours, 8 passes, with memtest. I ran chkdsk on my hard drive. All fine. And this is definitely not a hardware problem, it is a coding issue. I know that because if I open a new cmd prompt and then run csc, it doesn't crash.

So running that C# program changes the shell such that the next run of csc crashes csc itself, in that big way.

If I compile the code below, run it, and then run csc (on its own, or as csc whatever.cs), csc crashes.

So close the cmd prompt and open a new one.

This time, experiment with commenting and uncommenting the second line of the program.

I find that if the second line (the line that changes the code page to 850, Western European DOS) is there, then csc won't crash the next time I run it.

Whereas if I comment out that second line, so the program exits with the code page/encoding changed to UTF-8, then the next time csc runs, it crashes.

// Comment out the last line of Main, and then this runs but makes csc crash next time.

class asdf
{
  static void Main()
  {

     System.Console.OutputEncoding = System.Text.Encoding.UTF8; // set output to UTF-8
     System.Console.OutputEncoding = System.Text.Encoding.GetEncoding(850); // set output back to code page 850
  }
}
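A minimal sketch of the same workaround, assuming you would rather restore whatever encoding the console started with than hard-code 850. The names EncodingScope and RunWithUtf8 are hypothetical, made up for illustration; only Console.OutputEncoding and Encoding.UTF8 are real .NET members. It is not a fix for csc, just a tidier way to make sure the program never exits with the output code page left at 65001.

```csharp
using System;
using System.Text;

static class EncodingScope
{
    // Hypothetical helper: runs 'body' with UTF-8 console output, then puts
    // the previous output encoding back, even if 'body' throws.
    public static void RunWithUtf8(Action body)
    {
        Encoding original = Console.OutputEncoding;
        try
        {
            Console.OutputEncoding = Encoding.UTF8;
            body();
        }
        finally
        {
            Console.OutputEncoding = original;
        }
    }

    static void Main()
    {
        RunWithUtf8(() => Console.WriteLine("ჵ"));
    }
}
```

Because the restore sits in a finally block, an exception inside the body should not leave the shell in the state that makes the next csc run crash.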

I am not the only person who has run into something like this, though no explanation was found there: https://social.msdn.microsoft.com/Forums/vstudio/en-US/0e5f477e-0c32-4e88-acf7-d53d43d5b566/c-command-line-compiler-cscexe-immediately-crashes-when-run-in-code-page-65001-utf8?forum=csharpgeneral

I can deal with it by making sure the last line sets the code page to 850. Though, as I'll explain, that's an inadequate solution.

I'd also like to know whether this is a problem with csc that others have too, and whether there are any other solutions.

added

uuu1.cs

// uuu1.cs
class asdf
{
  static void Main()
  {
    System.Console.InputEncoding  = System.Text.Encoding.UTF8;
    System.Console.OutputEncoding = System.Text.Encoding.UTF8;

    // not Unicode. UTF8 means redirection will then work

    System.Console.WriteLine("ჵ");

    // try redirecting too..

    // and try checking for csc crash or not
    //System.Console.OutputEncoding=System.Text.Encoding.GetEncoding(850);
    //System.Console.InputEncoding =System.Text.Encoding.GetEncoding(850);
    // problem is that when those are uncommented, it breaks the redirection
  }
}

Uncommenting the last lines, so that I have

System.Console.OutputEncoding=System.Text.Encoding.GetEncoding(850);

would stop the crash but is an inadequate solution. For example, if I want to redirect the output of a program to a file, then I need UTF8 all the way from beginning to end, otherwise it doesn't work.

This works with the code page 850 lines commented out:

c:\blah>uuu1>r.r<ENTER>  
c:\blah>type r.r <ENTER>  
c:\blah>ჵ  

If I uncomment the last lines, thus changing the code page to 850, then sure, csc won't crash on the next run, but the redirection doesn't work and r.r doesn't contain that character.

Added 2

Hans's answer made me notice another way of triggering this error:

C:\Users\harvey\somecs3>csc<ENTER>
Microsoft (R) Visual C# Compiler version 4.0.30319.18408
for Microsoft (R) .NET Framework 4.5
Copyright (C) Microsoft Corporation. All rights reserved.

warning CS2008: No source files specified
error CS1562: Outputs without source must have the /out option specified

C:\Users\harvey\somecs3>chcp  65001<ENTER>
Active code page: 65001

C:\Users\harvey\somecs3>csc<ENTER>  <-- CRASH

C:\Users\harvey\somecs3>
barlop
  • How do you compile and run this? – CodeCaster Jun 20 '15 at 17:32
  • @CodeCaster put the code in a file with extension `.cs` e.g. `aaa1.cs` then run `csc aaa1.cs` – barlop Jun 20 '15 at 17:48
  • @CodeCaster development command prompt or regular command prompt have the same issue. btw I set the font too. The steps for how to change the font are described at http://www.techrepublic.com/blog/windows-and-office/quick-tip-add-fonts-to-the-command-prompt/ – barlop Jun 20 '15 at 18:28
  • @CodeCaster the process to add fonts to the command prompt would be quite a long post in itself. the post would double in size so I linked to an article on doing that. And many many people that are familiar with this problem are going to have added unicode fonts to the command prompt anyway. The problem here is with csc.exe not with adding fonts to the command prompt. One could do this without adding fonts to the command prompt though, it's just not as clear. – barlop Jun 20 '15 at 18:38
  • bug report here https://connect.microsoft.com/VisualStudio/feedback/details/2632278 – barlop Apr 26 '16 at 14:36
  • Ah, that explains it! I was wondering why `csc` from .NET 3+ kept crashing. Once I saw this post, I remembered that `find` also crashes when the codepage is Unicode (`chcp 65001`), and sure enough, changing the codepage to 437 lets `csc` run without crashing. It's pretty bad that Microsoft's own programs can't handle Unicode. ¬_¬ – Synetech May 20 '21 at 15:15
  • @Synetech not only that, but the chcp command is pretty unclear and misleading, see the last paragraph in sstan's answer – barlop May 20 '21 at 23:23

2 Answers

Well, you found a bug in the way the C# compiler deals with having to output text to the console when it is switched to UTF-8. It has a self-diagnostic to ensure that the conversion from a UTF-16 encoded string to the console output code page worked correctly, and it slams the Big Red Button when it didn't. The stack trace looks like this:

csc.exe!OnCriticalInternalError()  + 0x4 bytes  
csc.exe!ConsoleOutput::WideToConsole()  + 0xdc51 bytes  
csc.exe!ConsoleOutput::print_internal()  + 0x2c bytes   
csc.exe!ConsoleOutput::print()  + 0x80 bytes    
csc.exe!ConsoleOutput::PrintString()  + 0xb5 bytes  
csc.exe!ConsoleOutput::PrintBanner()  + 0x50 bytes  
csc.exe!_main()  + 0x2d0eb bytes    

The actual code for WideToConsole() is not available, the closest match is this version from the SSCLI20 distribution:

/*
 * Like WideCharToMultiByte, but translates to the console code page. Returns length,
 * INCLUDING null terminator.
 */
int ConsoleOutput::WideCharToConsole(LPCWSTR wideStr, LPSTR lpBuffer, int nBufferMax)
{
    if (m_fUTF8Output) {
        if (nBufferMax == 0) {
            return UTF8LengthOfUnicode(wideStr, (int)wcslen(wideStr)) + 1; // +1 for nul terminator
        }
        else {
            int cchConverted = NULL_TERMINATED_MODE;
            return UnicodeToUTF8 (wideStr, &cchConverted, lpBuffer, nBufferMax);
        }

    }
    else {
        return WideCharToMultiByte(GetConsoleOutputCP(), 0, wideStr, -1, lpBuffer, nBufferMax, 0, 0);
    }
}

/*
 * Convert Unicode string to Console ANSI string allocated with VSAlloc
 */
HRESULT ConsoleOutput::WideToConsole(LPCWSTR wideStr, CAllocBuffer &buffer)
{
    int cch = WideCharToConsole(wideStr, NULL, 0);
    buffer.AllocCount(cch);
    if (0 == WideCharToConsole(wideStr, buffer.GetData(), cch)) {
        VSFAIL("How'd the string size change?");
        // We have to NULL terminate the output because WideCharToMultiByte didn't
        buffer.SetAt(0, '\0');
        return E_FAIL;
    }
    return S_OK;
}

The crash occurs somewhere around the VSFAIL() assert, judging from the machine code. I can see the return E_FAIL statement. It was however changed from the version I posted, the if() statement was modified and it looks like VSFAIL() was replaced by RETAILVERIFY(). Something broke when they made those changes, probably in UnicodeToUTF8() which is now named UTF16ToUTF8(). Re-emphasizing, the version I posted does not in fact crash, you can see for yourself by running C:\Windows\Microsoft.NET\Framework\v2.0.50727\csc.exe. Only the v4 version of csc.exe has this bug.

The actual bug is hard to dig out from the machine code, best to let Microsoft worry about that. You can file the bug at connect.microsoft.com. I don't see a report that resembles it, fairly remarkable btw. The workaround for this bug is to use CHCP to change the codepage back.

Hans Passant
  • Mine is `Microsoft (R) Visual C# Compiler version 4.0.30319.18408` – barlop Jun 20 '15 at 19:10
  • The only "submit a bug" button I can see is on the microsoft.net native page https://connect.microsoft.com/VisualStudio/MSNetNative And that just goes to an error page , a bug trying to submit the bug! http://i.imgur.com/Rxy8bW9.png maybe there's a submit a bug option elsewhere that works – barlop Jun 20 '15 at 19:46
  • Looks fine when I try it. You have to be logged-in. – Hans Passant Jun 20 '15 at 20:00
  • [by contrast] If I click on powershell in the directory then there's a button to submit a bug. If I click on visual studio I get a message saying "you have been invited to join a private NDA program for Visual Studio, you can view specific content by selecting from the dropdown titled "Programs" located at the top of this page." at that point I've no idea what to click next, but clicking programs doesn't seem to help much. – barlop Jun 20 '15 at 20:06
  • you're welcome to submit it or if you see the submit bug option let me know.. BTW another funny thing is while yeah chcp 850 works.. If doing chcp just prior to doing chcp 850 it says the codepage is 850. So why chcp 850 makes a difference I don't know, if it's already on 850 according to chcp. – barlop Jun 20 '15 at 20:16
  • I have submitted a bug report now. But one actually has to sign out, then go to https://connect.microsoft.com/VisualStudio, then click "submit a bug". It then prompts to sign in and gives a feedback form to submit the bug. Being signed in when going to that URL gave an error. http://webapps.stackexchange.com/questions/79495/i-hit-a-bug-when-trying-ro-report-a-bug-regarding-visual-studio-to-connect-micro/79496#79496 – barlop Jun 22 '15 at 17:44
  • @barlop, your code only changes the output codepage, i.e. `SetConsoleOutputCP`, but running `chcp.com` checks only the input codepage, i.e. `GetConsoleCP`. Running `chcp.com 850` modifies both the input and output codepages, i.e. it calls `SetConsoleCP` and `SetConsoleOutputCP`. BTW using codepage 65001 in the console has many bugs across different versions of Windows from XP through to Windows 10 -- some in conhost.exe (maybe condrv.sys in Windows 8+) and some in the C runtime or other library. If you need Unicode in the console, you should use the [W]ide API. – Eryk Sun Jun 25 '15 at 09:35
  • @eryksun very interesting points you make and re chcp only checking input.. I can't find much on the [W]ide API. How would I use the "[W]ide API" in C sharp to e.g. set the codepage to UTF8? – barlop Jun 27 '15 at 11:51
  • @barlop, set the input and output encoding to `System.Text.Encoding.Unicode`. With this setting `Console.ReadLine` calls Win32 `ReadConsoleW` and `Console.WriteLine` calls Win32 `WriteConsoleW`. Note that non-unicode encodings call `ReadFile`, which has special handling for `Ctrl+Z` at the start of a line, but `ReadConsole` doesn't do this. – Eryk Sun Jun 27 '15 at 16:48
  • @eryksun Thanks.. And where is that documented about System.Text.Encoding.Unicode causing Console.ReadLine to call Win32 ReadConsoleW? `System.Console.InputEncoding=System.Text.Encoding.UTF8` I notice that chcp still shows 65001 for that. And still has the crashing csc bug. With `.Unicode` the chcp isn't changed and csc doesn't crash.. But then if I do `myprog>a.a` it won't redirect foreign characters unless I do UTF8 which seems to be 65001. – barlop Jun 27 '15 at 18:35
  • @barlop, when you redirect to a pipe or file it's writing UTF-16 without a BOM (i.e. `"\uFEFF"`), so some programs may not detect the text encoding. If `System.Console.IsOutputRedirected` you can write a BOM in that case or switch to UTF-8 if you prefer. – Eryk Sun Jun 27 '15 at 20:51
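One way to act on the "switch to UTF-8 when redirected" suggestion in the comments above, as a hedged sketch only. It assumes .NET 4.5 or later (for Console.IsOutputRedirected) and does not touch Console.OutputEncoding at all: when output is redirected it wraps the raw standard output stream in a UTF-8 writer instead. The names RedirectAwareOutput and CreateUtf8Writer are hypothetical. Since the console code page is never changed, this should leave the shell in a state csc is happy with, but that is untested against the csc crash described in the question.

```csharp
using System;
using System.IO;
using System.Text;

static class RedirectAwareOutput
{
    // Hypothetical helper: wraps a raw byte stream in a writer that emits
    // UTF-8 without a byte order mark and flushes after every write.
    public static StreamWriter CreateUtf8Writer(Stream stream)
    {
        var writer = new StreamWriter(stream, new UTF8Encoding(false));
        writer.AutoFlush = true;
        return writer;
    }

    static void Main()
    {
        // Only take over stdout when it goes to a file or pipe; an
        // interactive console keeps its current code page and writer.
        if (Console.IsOutputRedirected)
        {
            Console.SetOut(CreateUtf8Writer(Console.OpenStandardOutput()));
        }

        Console.WriteLine("ჵ");
    }
}
```

Passing true instead of false to the UTF8Encoding constructor would make the writer emit a BOM, which may help programs that need one to detect the encoding.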

Various articles hint at the fact that the Windows console has many Unicode-related bugs, for example: https://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/

Here is one workaround that works for me. Instead of:

csc aaa1.cs

Try this (which will redirect the CSC output to a file):

csc /utf8output aaa1.cs > aaa1-compilation.log

Relevant documentation: https://msdn.microsoft.com/en-us/library/d5bxd1x2.aspx

In some international configurations, compiler output cannot correctly be displayed in the console. In these configurations, use /utf8output and redirect compiler output to a file.

added by barlop

Looking at chat, we have found that after doing csc uuu1.cs<ENTER> uuu1<ENTER>, every subsequent csc has to be run with /utf8output AND (for some unknown reason) with a redirect to prevent the crash, e.g. csc /utf8output uuu1.cs >asdfsdaf

Hans's workaround is better though: just run chcp 850 (or whatever code page you use) after the uuu1<ENTER>. Even if chcp says it's 850, you still have to do chcp 850. Then csc will run normally.

The reason you should run chcp 850 even when chcp already shows 850 is that chcp on its own only shows the input encoding, whereas chcp 850 changes both the input encoding and the output encoding, and it is the output encoding change we want. So chcp can show 850 even when your output encoding is 65001, and the issue only occurs when the output encoding is 65001.
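To see the two values side by side, a small diagnostic along these lines may help. It is a sketch only: the class name ShowCodePages and the method Describe are hypothetical, and it assumes (as the comments on Hans's answer describe) that Console.InputEncoding and Console.OutputEncoding reflect the console's input and output code pages respectively.

```csharp
using System;

static class ShowCodePages
{
    // Builds a two-line report of the console's input and output code pages.
    public static string Describe()
    {
        return "Input code page:  " + Console.InputEncoding.CodePage + Environment.NewLine +
               "Output code page: " + Console.OutputEncoding.CodePage;
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Running it right after uuu1, and again after chcp 850, should show whether the output code page is the one left at 65001.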

Max
sstan
  • doesn't work http://i.imgur.com/JPa9xlD.png and even doing it exactly as you have .. The aaa1-compilation.log file has no error it just has the output of the first run of the compiler. running csc after that, whatever is after csc, whether something or nothing is after csc, it doesn't work, the crash happens – barlop Jun 20 '15 at 18:36
  • Sorry, I'm a bit confused. The image you linked shows that you ran the `csc` command with the `utf8output` parameter, but I don't see you redirecting the output to a file. Can you clarify what you tried exactly? In my tests, the only way to get CSC to work with a utf-8 console, is by doing those 2 things in combination: utf8output param + output redirection to file. – sstan Jun 20 '15 at 18:42
  • http://i.imgur.com/WgfgqlS.png file but i'll get the crash even without doing the redirection. And the crash is the second running of csc, after running the program. As shown in this screenshot. The reason I want UTF8 is because redirection works with UTF8. The problem is that csc then crashes on the second run. – barlop Jun 20 '15 at 18:45
  • The redirection (uuu1>file) mentioned at one point in my question works(always did), because it works when I have the encoding set to UTF8. But the problem is that having UTF8 leads to csc then crashing on the next run. – barlop Jun 20 '15 at 18:50
  • Ok, and I agree with you. I am simply suggesting that you run CSC with the redirection for your 2nd run as well, as a workaround. Once the console is in UTF-8 mode, I can't make it work "normally" either. I am stuck calling `csc /utf8output test.cs > test.log` all the time from that point forward to avoid the crash. Does this help you in any way? Or is your point that you want to be able to simply do `csc test.cs` for your 2nd run without any extra params or redirections? – sstan Jun 20 '15 at 18:53
  • And just to make sure I am not confusing you. I am talking about redirecting the output of the `CSC` call. Not redirecting the output of your program. – sstan Jun 20 '15 at 18:57
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/81086/discussion-between-barlop-and-sstan). – barlop Jun 20 '15 at 18:57
  • people can refer to chat for further details.. but i'll just note here. This workaround works. Though as we can see, hans's workaround and digging has the edge. So +1 and +1 for Hans, and i'll accept Hans workaround/solution as best – barlop Jun 20 '15 at 19:31