
I have a PowerShell script that returns some strings via Write-Output. I would like those lines to be UTF-8 with no BOM. I do not want a global setting; I just want this to apply to those particular few lines at the time I write them.

This other question got me partway there: Using PowerShell to write a file in UTF-8 without the BOM

I took inspiration from one of the answers, and wrote the following code:

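$mystr = "test 1 2 3"
# Encode the string as UTF-8; GetBytes() does not prepend a BOM
$mybytes = [Text.Encoding]::UTF8.GetBytes($mystr)
# Open the process's raw stdout stream, bypassing PowerShell's output streams
$OutStream = [console]::OpenStandardOutput()
$OutStream.Write($mybytes, 0, $mybytes.Length)
$OutStream.Close()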

However, this code writes ONLY to stdout and ignores redirection. In other words, if I put that code in test.ps1 and run .\test.ps1 > out.txt, the text still prints to the console instead of going to out.txt.

How could I write this code so that, if a user redirects my script's output to a file with >, that output is UTF-8 with no BOM?

Will I Am
    Your question title makes no sense. UTF-8 and BOM are related to files, not the console. Your description text doesn't make it any clearer. What are you trying to achieve and how does it fail? You can't change how a user redirects his stuff. Tell the user to use `Out-File -Encoding ascii` (if you don't need any special characters). –  Feb 18 '18 at 08:48
  • I disagree that it makes no sense. It is a PS quirk. In CMD, "echo hello world > test.txt" works as expected and writes exactly the bytes I pipe. In PS, if I do Write-Output "hello world" > test.txt, it makes assumptions about what I'm writing and inserts a BOM. I was asking if there is a way to redirect binary output (hence the byte array). I will, however, close it because apparently that is not how PS works. – Will I Am Feb 18 '18 at 16:44
  • You literally ask to "write UTF8 with no BOM to console". The console has no concept of UTF8 or BOM. Encoding is used when writing files, not when printing text to console. –  Feb 19 '18 at 09:38
  • Most (if not all) shells allow you to pre-encode a string and write raw data (e.g., binary) out, which can be redirected to a file. PowerShell does not, which is fine. This question is yo-yoing from -3 to +1, and I voted to close it. If you think it's a bad question, vote to close it too. I wish I could delete it, but I don't want to delete the answers from helpful people. I don't think it's a bad question when you try to draw an equivalence to other shells you may have experience with. Your explanation of why my question makes no sense is non-obvious to those new to PowerShell. – Will I Am Feb 19 '18 at 20:23
  • Now I believe you don't understand what UTF-8 and BOM actually mean, because you're still not making sense. In PowerShell you can also construct any byte array and write that to a file (see here: https://cyber-defense.sans.org/blog/2010/02/11/powershell-byte-array-hex-convert). So you could construct a BOM and encode everything as UTF-8 yourself if you want to do that, but then you also have this mess in the console... –  Feb 20 '18 at 11:37
  • I deleted my comment because it was unnecessarily continuing this discussion. I do understand BOM and encodings. My problem was making an assumption that I could treat the stdout as a file. If you still think the question makes no sense, please vote to close it. I won't comment anymore. – Will I Am Feb 20 '18 at 16:53

2 Answers


Encoding is used when saving text to a file, not when writing to the console. Your redirection operator > is what saves the content, which means it decides the encoding. In Windows PowerShell, redirection uses "Unicode" (UTF-16LE) encoding. If you need another encoding, you can't use redirection.

When you are writing to files, the redirection operators use Unicode encoding. If the file has a different encoding, the output might not be formatted correctly. To redirect content to non-Unicode files, use the Out-File cmdlet with its Encoding parameter.

Source: about_redirection

Normally you would use e.g. Out-File -FilePath test.txt -Encoding UTF8 inside your script, but in Windows PowerShell that includes a BOM, so I'd recommend [System.IO.File]::WriteAllLines(path, contents), which writes UTF-8 without a BOM by default.

[System.IO.File]::WriteAllLines("c:\test.txt", $MyOutputArray)
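To check that WriteAllLines() really writes UTF-8 without a BOM, here is a minimal sketch that writes a file and inspects its first bytes. The sample array and the temp-file location are assumptions for illustration. A full path is built because .NET methods resolve relative paths against the process's working directory, not PowerShell's current location.

```powershell
# Hypothetical sample output; "h$([char]0x00FC)" is "hü", avoiding script-file encoding issues
$MyOutputArray = 'line 1', "h$([char]0x00FC)"

# Full path in the temp directory (an assumption; .NET does not see PowerShell's $PWD)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'test-nobom.txt'

# WriteAllLines(path, contents) without an encoding argument writes BOM-less UTF-8
[System.IO.File]::WriteAllLines($path, [string[]] $MyOutputArray)

# Read the raw bytes back and look for the UTF-8 BOM (EF BB BF)
$bytes = [System.IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
"BOM present: $hasBom"   # → BOM present: False
```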
Steven Penny
Frode F.
    As an aside: fortunately, in v5.1+, you now _can_ control the encoding used by `>` / `>>`, although it's not obvious: `$PSDefaultParameterValues['Out-File:Encoding']='UTF8'` - that would still include a BOM, however. See https://stackoverflow.com/a/42451413/45375 for more. – mklement0 Feb 18 '18 at 22:31

To add to Frode F.'s helpful answer:

  • What you were ultimately looking to achieve was to write a raw byte stream to PowerShell's success-output stream (the equivalent of stdout in traditional shells[0]), not to the console.

    • The success-output stream is what PowerShell commands use to pass data to one another, including to the output-redirection operator >; at that point, the console isn't involved.

    • (Data written to the success-output stream may end up displayed in the console, namely if the stream is neither captured in a variable nor redirected elsewhere.)

  • However, it is not possible to send raw byte streams to PowerShell's success output stream; only objects (instances of .NET types) can be sent, because PowerShell is fundamentally object-oriented.

    • Even data representing a stream of bytes must be sent as a .NET object, such as a [byte[]] array.

      • However, redirecting a [byte[]] array directly to a file with > does not write the array's raw bytes, because > creates a "Unicode" (UTF-16LE-encoded[1]) text representation of the array, the same one you would see if you printed the array to the console (one decimal number per line).
    • To encode objects as byte streams (which often represent encoded text) for external sinks such as files, you need the help of PowerShell cmdlets (e.g., Set-Content), the output-redirection operator >, or the methods of appropriate .NET types (e.g., [System.IO.File]), except in two special cases:

      • When piping to an external program, the encoding stored in preference variable $OutputEncoding is implicitly used.
      • When printing to the console, the encoding stored in [Console]::OutputEncoding is implicitly used; output from external programs is also assumed to be encoded that way[2].
    • Generally, when it comes to text output, it is simpler to use the -Encoding parameter of output cmdlets such as Set-Content to let that cmdlet perform the encoding rather than trying to obtain a byte representation in a separate first step.

      • However, a BOM-less UTF-8 encoding cannot be selected this way in Windows PowerShell (it can in PowerShell Core), so one option is to create an explicit byte representation and write it with Set-Content -Encoding Byte[3]; e.g.:

        # Write string "hü" to a UTF-8-encoded file *without BOM*:
        [Text.Encoding]::UTF8.GetBytes('hü') | 
          Set-Content -Encoding Byte file.txt
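Two of the points above can be checked directly: [Text.Encoding]::UTF8.GetBytes() never emits the BOM (that is what GetPreamble() returns), and redirecting a [byte[]] array with > writes its text representation rather than the raw bytes. A minimal sketch, using a hypothetical temp file:

```powershell
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bytes-redirect.txt'

# UTF-8 bytes for "hi": GetBytes() does not prepend a BOM
$bytes = [Text.Encoding]::UTF8.GetBytes('hi')   # 104, 105
# The BOM is what GetPreamble() returns for this encoding
$bom = [Text.Encoding]::UTF8.GetPreamble()      # 239, 187, 191

# Redirecting the array with > writes its *text* representation,
# one decimal number per line, not the raw bytes
$bytes > $path
$lines = Get-Content $path
$lines   # → 104 and 105, as two lines of text
```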
        

[0] Writing to stdout from within PowerShell, as you attempted, bypasses PowerShell's own system of output streams and prints directly to the console. (As an aside: Console.OpenStandardOutput() is designed to bypass redirections even in the context of traditional shells.)
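The difference described in footnote [0] can be seen by capturing two script blocks in variables: output sent with Write-Output goes to the success stream and is captured, while bytes written to the raw stdout handle are not. As a cautious choice for this sketch, the stream is flushed rather than closed, so later output isn't affected.

```powershell
# Write-Output sends to PowerShell's success stream, so the assignment captures it
$viaSuccess = & { Write-Output 'via success stream' }

# Writing to the raw stdout handle bypasses the success stream entirely
$viaStdout = & {
    $b = [Text.Encoding]::UTF8.GetBytes("via raw stdout`n")
    $s = [Console]::OpenStandardOutput()
    $s.Write($b, 0, $b.Length)
    $s.Flush()   # push the bytes out; the stream is left open
}

$viaSuccess            # → via success stream
$null -eq $viaStdout   # → True (nothing reached the success stream)
```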

[1] Up to PowerShell v5.0, you couldn't change the encoding used by >; in PSv5.1 and above, you can use something like $PSDefaultParameterValues['Out-File:Encoding']='UTF8' - that would still include a BOM, however. For background, see this answer of mine.
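A minimal sketch of the $PSDefaultParameterValues approach from footnote [1], assuming PowerShell v5.1 or later and a hypothetical temp file. It sets the default only around a single redirection and then removes it, which is close to the per-call behavior the question asked for. In Windows PowerShell 'utf8' still writes a BOM, whereas in PowerShell Core (v6+) 'utf8' means UTF-8 without a BOM, so the printed result depends on the edition.

```powershell
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'redirect-utf8.txt'

# > is effectively Out-File, so this default changes the encoding it uses
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
'test 1 2 3' > $path
# Remove the override so later redirections are unaffected
$PSDefaultParameterValues.Remove('Out-File:Encoding')

# Inspect the first bytes for the UTF-8 BOM (EF BB BF)
$b = [System.IO.File]::ReadAllBytes($path)
$hasBom = $b.Length -ge 3 -and $b[0] -eq 0xEF -and $b[1] -eq 0xBB -and $b[2] -eq 0xBF
"$($PSVersionTable.PSEdition): BOM present = $hasBom"
```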

[2] There is a noteworthy asymmetry in Windows PowerShell: when sending text to external programs, $OutputEncoding defaults to ASCII (7-bit only) encoding, which means that any non-ASCII characters get transliterated to literal ? characters; by contrast, when interpreting text from external programs, the applicable [Console]::OutputEncoding defaults to the system's active legacy OEM code page, which is an 8-bit encoding. See the list of code pages supported by Windows.
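The two settings from footnote [2] can be inspected directly. The sketch below only reads them, since their defaults vary by edition and system. It also builds a BOM-less UTF-8 encoding object; assigning such an object to $OutputEncoding is a common way to pass non-ASCII text to external programs intact, but that is an assumption here, not something the answer states.

```powershell
# Encoding used when piping text *to* external programs
"`$OutputEncoding:          $($OutputEncoding.WebName)"
# Encoding used for console output and for decoding output *from* external programs
"[Console]::OutputEncoding: $([Console]::OutputEncoding.WebName)"

# A UTF-8 encoding without BOM: its preamble is empty
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
$utf8NoBom.GetPreamble().Length   # → 0

# Hypothetical: use it for text piped to external programs in the current scope
# $OutputEncoding = $utf8NoBom
```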

[3] Of course, passing bytes through is not really an encoding; perhaps for that reason -Encoding Byte was removed from PowerShell Core, where -AsByteStream must be used instead.
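For PowerShell Core (v6+), the earlier Windows PowerShell example can be sketched with -AsByteStream instead; the temp-file path is an assumption standing in for file.txt. Because -Encoding utf8 is BOM-less in PowerShell Core, the commented-out plain-text form at the end should produce the same bytes.

```powershell
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'file.txt'

# Pass the UTF-8 bytes of "hü" through unchanged; no BOM is added
[Text.Encoding]::UTF8.GetBytes("h$([char]0x00FC)") | Set-Content -AsByteStream -Path $path

[System.IO.File]::ReadAllBytes($path)   # → 104, 195, 188 (one per line)

# Equivalent in PowerShell Core, letting the cmdlet do the encoding:
# "h$([char]0x00FC)" | Set-Content -Encoding utf8 -NoNewline -Path $path
```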

mklement0
    Thank you. I think the difference between stdout (success-output) and console finally explains why I was observing the odd behaviour (to me). I have redone my modules to account for my confusion. Great answer. – Will I Am Feb 19 '18 at 20:31