25

Consider this program:

{$APPTYPE CONSOLE}

begin
  Writeln('АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ');
end.

The output on my console which uses the Consolas font is:

????????Z??????????????????????????????????????

The Windows console is quite capable of supporting Unicode as evidenced by this program:

{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

const
  Text = 'АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ';

var
  NumWritten: DWORD;

begin
  WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), PChar(Text), Length(Text), NumWritten, nil);
end.

for which the output is:

АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ

Can Writeln be persuaded to respect Unicode, or is it inherently crippled?

David Heffernan
  • [Possible duplicate](http://stackoverflow.com/q/265018/960757)? I think [TOndrej's answer](http://stackoverflow.com/a/268202/960757) covers your question. – TLama Oct 08 '14 at 10:56
  • @TLama I saw that question. I think this is different. I want to know if there is some way to make Writeln respect Unicode. Perhaps through an RTL function call that switches behaviour. – David Heffernan Oct 08 '14 at 10:59
  • Just hints: http://www.bobswart.nl/Weblog/Blog.aspx?RootId=5:3011 . Also: http://edn.embarcadero.com/article/39022 – iPath ツ Oct 08 '14 at 11:28
  • `SetConsoleOutputCP(CP_UTF8);` works, but is not the answer you are looking for? – LU RD Oct 08 '14 at 11:57
  • @LURD Is that enough? Does the RTL convert UTF-16 to UTF-8 or do you have to? In any case it feels wrong to have to frab code pages. I do wonder what Emba were thinking. – David Heffernan Oct 08 '14 at 12:01
  • If I use `SetConsoleOutputCP(CP_UTF8)`, then *most* of the string looks OK, but there are a few "box" characters near the end, so that is not perfect. I also tried `SetTextCodePage(Output, 1200)`, but that gives me lots of weird symbols, and a string like `'1200'` is displayed as `1 2 0 0`, so that can't be it either. – Rudy Velthuis Oct 08 '14 at 15:10
  • Also, for character set support outside the space covered by the default console fonts, see : http://stackoverflow.com/a/21753872/327083 – J... Apr 01 '15 at 18:05

3 Answers

27

Just set the console output code page to UTF-8 by calling SetConsoleOutputCP(CP_UTF8).

program Project1;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, Winapi.Windows;

const
  Text = 'АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ';

var
  NumWritten: DWORD;

begin
  ReadLn;  // Make sure Consolas font is selected
  try
    WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), PChar(Text), Length(Text), NumWritten, nil);    
    SetConsoleOutputCP(CP_UTF8);
    WriteLn;
    WriteLn('АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ');
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  ReadLn;
end.

Outputs:

АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ
АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ

Internally, WriteLn() converts UTF-16 Unicode strings to the selected output code page (CP_UTF8).
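Changing the console code page affects the whole console session, a concern also raised in the comments on this answer. The following is a minimal sketch, assuming Delphi XE2 or later, of how the change could be scoped with `GetConsoleOutputCP` and undone afterwards. The program name and the `SavedCP` variable are illustrative only, and the sketch does not cover abnormal termination such as Ctrl+C.

```delphi
program RestoreConsoleCP;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

var
  SavedCP: UINT;  // code page that was active when the program started

begin
  SavedCP := GetConsoleOutputCP;
  // Switch before any other console output is produced.
  SetConsoleOutputCP(CP_UTF8);
  try
    Writeln('АБВГДЕЖ');
  finally
    // Put the original code page back so the calling shell is unaffected.
    SetConsoleOutputCP(SavedCP);
  end;
end.
```

The `try..finally` only guards the normal exit path and exceptions; restoring the code page when the process is interrupted would need additional handling, for example through `SetConsoleCtrlHandler`.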


Update:

The above works in Delphi-XE2 and above. In Delphi-XE you need an explicit conversion to UTF-8 to make it work properly.

WriteLn(UTF8String('АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ'));

Addendum:

If anything is written to the console in another code page before SetConsoleOutputCP(CP_UTF8) is called, the OS will not output subsequent UTF-8 text correctly. This can be fixed by closing and reopening the stdout handle.

Another option is to declare a new text file variable for UTF-8 output.

var
  toutUTF8: TextFile;
...
SetConsoleOutputCP(CP_UTF8);
AssignFile(toutUTF8,'',cp_UTF8);  // Works in XE2 and above
Rewrite(toutUTF8);
WriteLn(toutUTF8,'АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ');
LU RD
  • @user246408, tested in XE5 and XE7. – LU RD Oct 08 '14 at 12:57
  • @user246408 I tested this in XE3. It's a decent workaround (+1), but I'd be concerned about changing the code page. At the very least I'd want to change it back when the process detached from the console. – David Heffernan Oct 08 '14 at 12:57
  • Does not work in Delphi XE; requires explicit cast to UTF8String. – kludg Oct 08 '14 at 13:05
  • @DavidHeffernan, I think you should be able to get the actual code page by using `GetConsoleOutputCP` before doing the change and change it back after. Check the (great) answer to [this question](http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using). – Guillem Vicens Oct 08 '14 at 13:52
  • @GuillemVicens Sure. I'd probably want to use `SetConsoleCtrlHandler` to make sure that the restoration happened even for abnormal termination. – David Heffernan Oct 08 '14 at 13:55
  • This does not look correct in my German console window, so something is wrong here. Could of course be my font, so I'll try with another. – Rudy Velthuis Oct 08 '14 at 15:12
  • Ok, changing the font from Lucida Console to Consolas made it look OK. – Rudy Velthuis Oct 08 '14 at 15:15
  • @DavidHeffernan, when using the MinWidth specifier, the RTL does not take code points into account. This will make it hard to format the output. `WriteLn('АБ':4);` gives no padding, since Length('АБ') is 4 in a UTF-8 string. – LU RD Oct 08 '14 at 22:35
  • I guess the necessary information would be available to `AlternateWriteUnicodeStringProc`. – David Heffernan Oct 09 '14 at 07:07
  • Testing in XE2 I found it fails if the first `WriteLn;` is called before, not after `SetConsoleOutputCP(CP_UTF8);` What is that all about? – Hugh Jones Oct 09 '14 at 17:26
  • @HughJones, I also saw this. My theory is that once output has been done in one locale, all attempts to change the code page are swallowed by the OS. I'm searching for documentation, but I have seen this confirmed in other forums. – LU RD Oct 09 '14 at 18:12
11

The System unit declares a variable named AlternateWriteUnicodeStringProc that allows customisation of how Writeln performs output. This program:

{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

function MyAlternateWriteUnicodeStringProc(var t: TTextRec; s: UnicodeString): Pointer;
var
  NumberOfCharsWritten, NumOfBytesWritten: DWORD;
begin
  Result := @t;
  if t.Handle = GetStdHandle(STD_OUTPUT_HANDLE) then
    WriteConsole(t.Handle, Pointer(s), Length(s), NumberOfCharsWritten, nil)
  else
    WriteFile(t.Handle, Pointer(s)^, Length(s)*SizeOf(WideChar), NumOfBytesWritten, nil);
end;

var
  UserFile: Text;

begin
  AlternateWriteUnicodeStringProc := MyAlternateWriteUnicodeStringProc;
  Writeln('АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ');
  Readln;
end.

produces this output:

АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ

I'm sceptical of how I've implemented MyAlternateWriteUnicodeStringProc and how it would interact with classic Pascal I/O. However, it appears to behave as desired for output to the console.

The documentation of AlternateWriteUnicodeStringProc currently says, wait for it, ...

Embarcadero Technologies does not currently have any additional information. Please help us document this topic by using the Discussion page!

David Heffernan
  • @user246408 Can you expand? What does not work in XE? Is `AlternateWriteUnicodeStringProc` not present in XE? – David Heffernan Oct 08 '14 at 13:03
  • @user246408 The D2010 `_WriteUString` begins `// !!! FIXME` and has no reference to `AlternateWriteUnicodeStringProc` so I guess that's what you are referring to – David Heffernan Oct 08 '14 at 13:05
  • Yes, no such identifier: it is a compile-time error. The FIXME is there, I believe, because it does not even respect the ANSI system code page; console localization seems to be totally broken. – kludg Oct 08 '14 at 13:11
  • In XE2 codepage support was introduced to the `WriteLn(F, ...` version (where F is a `TextFile`), see also here: http://stackoverflow.com/questions/14232900/unicode-text-file-output-differs-between-xe2-and-delphi-2009 – Jens Mühlenhoff Oct 08 '14 at 13:25
  • So I guess Emba decided to introduce this `AlternateWriteUnicodeStringProc` in XE2, too. – Jens Mühlenhoff Oct 08 '14 at 13:26
  • For output redirection support you have to check the return code of WriteConsole and then call WriteFile instead. – Jens Mühlenhoff Oct 10 '16 at 14:25
  • Works perfectly for me on XE10.1 (Berlin). – Paulo França Lacerda Dec 17 '16 at 01:52
5

WriteConsoleW seems to be quite a magical function.

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

procedure WriteLnToConsoleUsingWriteFile(CP: Cardinal; AEncoding: TEncoding; const S: string);
var
  Buffer: TBytes;
  NumWritten: Cardinal;
begin
  Buffer := AEncoding.GetBytes(S);
  // This is a side effect and should be avoided ...
  SetConsoleOutputCP(CP);
  WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), Buffer[0], Length(Buffer), NumWritten, nil);
  WriteLn;
end;

procedure WriteLnToConsoleUsingWriteConsole(const S: string);
var
  NumWritten: Cardinal;
begin
  WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), PChar(S), Length(S), NumWritten, nil);
  WriteLn;
end;

const
  Text = 'АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ';
begin
  ReadLn; // Make sure Consolas font is selected
  // Works, but changing the console CP is necessary
  WriteLnToConsoleUsingWriteFile(CP_UTF8, TEncoding.UTF8, Text);
  // Doesn't work
  WriteLnToConsoleUsingWriteFile(1200, TEncoding.Unicode, Text);
  // This works and no longer needs the CP change
  WriteLnToConsoleUsingWriteConsole(Text);
  ReadLn;
end.

So in summary:

WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), ...) supports UTF-16.

WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), ...) doesn't support UTF-16.

My guess would be that in order to support different ANSI encodings the classic Pascal I/O uses the WriteFile call.

Also keep in mind that when used on a file instead of the console it has to work as well:

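[Unicode text file output differs between XE2 and Delphi 2009?](http://stackoverflow.com/questions/14232900/unicode-text-file-output-differs-between-xe2-and-delphi-2009)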

That means that blindly using WriteConsole breaks output redirection. If you use WriteConsole you should fall back to WriteFile like this:

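procedure WriteLnToConsoleUsingWriteConsole(const S: string);
var
  NumWritten: Cardinal;
  Bytes: TBytes;
begin
  // WriteConsole fails when standard output is redirected to a file or pipe
  if not WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), PChar(S), Length(S),
    NumWritten, nil) then
  begin
    // Fall back to writing UTF-8 encoded bytes
    Bytes := TEncoding.UTF8.GetBytes(S);
    WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), Bytes[0], Length(Bytes),
      NumWritten, nil);
  end;
  WriteLn;
end;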

Note that output redirection with any encoding works fine in cmd.exe. It just writes the output stream to the file unchanged.

PowerShell, however, expects either ANSI output or output that starts with the correct preamble (BOM); otherwise the file will be incorrectly encoded. PowerShell also always converts the output into UTF-16 with a preamble.

MSDN recommends using GetConsoleMode to find out whether the standard handle is a console handle, and also mentions the BOM:

WriteConsole fails if it is used with a standard handle that is redirected to a file. If an application processes multilingual output that can be redirected, determine whether the output handle is a console handle (one method is to call the GetConsoleMode function and check whether it succeeds). If the handle is a console handle, call WriteConsole. If the handle is not a console handle, the output is redirected and you should call WriteFile to perform the I/O. Be sure to prefix a Unicode plain text file with a byte order mark. For more information, see Using Byte Order Marks.
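The check that MSDN describes could be sketched roughly as follows. This is a minimal illustration, not a tested implementation: `WriteStdOut` is a hypothetical helper name, and UTF-8 is assumed as the encoding for redirected output.

```delphi
program ConsoleOrFile;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Hypothetical helper: writes S to standard output, choosing the API
// according to whether the handle is a real console.
procedure WriteStdOut(const S: string);
var
  StdOut: THandle;
  Mode, NumWritten: DWORD;
  Bytes: TBytes;
begin
  StdOut := GetStdHandle(STD_OUTPUT_HANDLE);
  if GetConsoleMode(StdOut, Mode) then
    // GetConsoleMode succeeded, so this is a console: pass UTF-16 through.
    WriteConsole(StdOut, PChar(S), Length(S), NumWritten, nil)
  else
  begin
    // Redirected to a file or pipe: write encoded bytes instead.
    // Assumption: UTF-8 is the desired encoding for redirected output.
    Bytes := TEncoding.UTF8.GetBytes(S);
    if Length(Bytes) > 0 then
      WriteFile(StdOut, Bytes[0], Length(Bytes), NumWritten, nil);
  end;
end;

begin
  WriteStdOut('АБВГДЕЖ' + sLineBreak);
end.
```

The sketch leaves out the byte order mark. If one is wanted for redirected output, it would have to be written once at the start of the stream (for example the bytes returned by `TEncoding.UTF8.GetPreamble`) rather than on every call.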

Jens Mühlenhoff
  • -1 That is not what `WriteConsoleW` does. The Windows console is perfectly capable of writing international characters through `WriteConsoleW`, although it is limited to UCS-2. Add a call to `Writeln(GetConsoleCP)` to the second program in my question and observe that the output is not 65001. Sorry to downvote you, but I felt compelled to do so since what you state is demonstrably wrong. – David Heffernan Oct 08 '14 at 12:34
  • When you call `WriteConsoleW`, the former applies. Try this: `SetConsoleOutputCP(1252); WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), PChar(Text), Length(Text), NumWritten, nil);` Note that the text is output correctly even though the characters are not present in the output codepage. – David Heffernan Oct 08 '14 at 12:39
  • That's true enough. `WriteConsoleW` is clearly doing significant work. – David Heffernan Oct 08 '14 at 12:57
  • I've rewritten my answer to reflect my research. – Jens Mühlenhoff Oct 08 '14 at 13:16