Example:

Open "C:\...\someFile.txt" For Output As #1
Print #1, someString
Close #1

If someString contains non-ASCII characters, how are they encoded? (UTF-8, Latin-1, some codepage depending on the Windows locale, ...)

On my system, the code above seems to use Windows-1252, but since neither the documentation of the Open statement nor the documentation of the Print # statement mentions string encodings, I cannot be sure whether this is a built-in default or a system setting, and I'm looking for an authoritative answer.


Note: Thanks to everyone suggesting alternatives for how to create files with specific encodings (ADODB.Stream, Scripting.FileSystemObject, etc.) - they are appreciated. This question, however, is about understanding the exact behavior of legacy code, so I am only interested in the behavior of the code quoted above.

Heinzi
  • Could [this answer](http://stackoverflow.com/questions/7269399/declaring-a-unicode-string-in-vba-in-excel) help you? – Daniel Dušek Nov 02 '16 at 10:11
  • The default encoding is blackbox to me. You should use an `ADODB.stream` so you can choose the `charset`. See [this](http://stackoverflow.com/questions/15906280/need-to-convert-text-files-to-unicode-from-utf8-in-vbscript) and [that](http://stackoverflow.com/questions/2524703/save-text-file-utf-8-encoded-with-vba) – Thomas G Nov 02 '16 at 10:42
  • Or to create a unicode file, use the `Scripting.FileSystemObject` methods: https://msdn.microsoft.com/en-us/library/5t9b5c0c%28v=vs.84%29.aspx – Andre Nov 02 '16 at 10:47
  • VBA uses ANSI. So whenever VBA interacts with the OS, its Unicode strings are converted to ANSI. –  Nov 02 '16 at 12:20
  • @Noodles: That's also what I suspected. If you have an authoritative source for this, it would make a great answer. – Heinzi Nov 02 '16 at 13:08
  • I don't. But VBA can only call the `A` functions. Windows functions have two versions, `A` and `W` (e.g. `GetWindowTextA` and `GetWindowTextW`). VBA always converts internal Unicode strings to ANSI strings when making API calls. All forms are done using ANSI. You can open DLLs with Notepad to see. –  Nov 02 '16 at 13:18

1 Answer

Testing indicates that the VBA Print command converts Unicode strings to the single-byte character set of the code page for the current Windows "Language for non-Unicode programs" system locale. This can be illustrated with the following code, which attempts to write the Greek word Ώπα:

Option Compare Database
Option Explicit

Sub GreekTest()
    Dim someString As String
    someString = ChrW(&H38F) & ChrW(&H3C0) & ChrW(&H3B1)
    Open "C:\Users\Gord\Desktop\someFile.txt" For Output As #1
    Print #1, someString
    Close #1
End Sub

When run with Windows set to the default locale for US English, the resulting file contains the bytes

3F 70 61

which correspond to the Windows-1252 characters ?pa. Windows-1252 is the character set most commonly (but incorrectly) referred to as "ANSI".

However, after changing the Windows "non-Unicode" locale setting to Greek (Greece), the same VBA code writes a file containing the bytes

BF F0 E1

which correspond to the Windows-1253 (Greek) characters Ώπα.
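For readers who want to cross-check the byte values quoted above outside of VBA, here is a minimal Python sketch. It is not part of the original test and does not call VBA or the Windows conversion routines; it only uses Python's built-in `cp1252` and `cp1253` codecs, and the variable names are illustrative. It encodes Ώπα with the Greek code page and decodes the bytes written under the US English locale.

```python
# Ώπα, built from the same code points as the ChrW(...) calls in GreekTest.
word = "\u038f\u03c0\u03b1"

# Encoding with the Greek code page gives the bytes seen under the Greek locale.
greek_bytes = word.encode("cp1253")
greek_hex = greek_bytes.hex(" ").upper()
print(greek_hex)

# Decoding the bytes written under the US English locale as Windows-1252.
english_text = bytes.fromhex("3F 70 61").decode("cp1252")
print(english_text)

# Windows-1252 contains no Greek letters, so a strict encode fails in Python.
try:
    word.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False
print(cp1252_ok)
```

Python's strict `cp1252` codec rejects all three characters, whereas the file written by VBA contains `p` and `a` in place of π and α. That suggests Windows substitutes similar-looking characters where it can and falls back to `?` otherwise, but this is an inference from the observed bytes, not something the sketch demonstrates.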

Gord Thompson
  • "Windows-1252 is … as ANSI" reads as if it were the only character encoding referred to as "ANSI". Actually, any character encoding that can be set as the "Language for non-Unicode programs" is referred to as "an" ANSI encoding, and the current one for the thread as "the" ANSI encoding. – Tom Blodget Nov 02 '16 at 16:08
  • @TomBlodget - Technically correct, but unfortunately lots of English-speaking Windows users equate "ANSI" with "Windows-1252". Given that "ANSI" is a vague and erroneous name it's probably best to avoid it altogether, rather than try to find a less incorrect way of using it. (There is a good discussion [here](http://stackoverflow.com/q/701882/2144390).) – Gord Thompson Nov 02 '16 at 16:29
  • It seems pointless to test something well known. VB6 was designed for the ANSI Windows 9x OSs. Therefore it only does ANSI when interacting with the OS. COM is Unicode, so it does Unicode when doing COM. It does Unicode internally. BUT ALL API CALLS ARE ANSI. This is well known. –  Nov 03 '16 at 09:02