I used the answer to this question: Using PowerShell to write a file in UTF-8 without the BOM

to convert a file (UCS-2) to UTF-8. The problem is that if I run the conversion twice (or more), the Cyrillic text gets corrupted. How can I skip the conversion if the file is already in UTF-8?

The code is:

$MyFile = Get-Content $MyPath
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($MyPath, $MyFile, $Utf8NoBomEncoding)

1 Answer

Use:

$MyFile = Get-Content -Encoding UTF8 $MyPath
  • Initially, when $MyPath is UTF-16LE-encoded ("Unicode" encoding, which I assume is what you meant), PowerShell will ignore the -Encoding parameter due to the presence of a BOM in the file, which unambiguously identifies the encoding.

    • If your original file does not have a BOM, more work is needed.
  • Once you've saved $MyPath as UTF-8 without BOM, you must tell Windows PowerShell[1] to expect UTF-8 by passing -Encoding UTF8, because by default it interprets BOM-less files as "ANSI"-encoded (encoded according to the typically single-byte code page associated with the legacy system locale).


[1] Note that the cross-platform PowerShell Core edition defaults to BOM-less UTF-8.
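
Putting the pieces together, the conversion from the question can be made safe to run repeatedly. The following is a minimal sketch, not a definitive implementation: the function name Convert-ToUtf8NoBom and its -Path parameter are hypothetical, and it assumes the input is either UTF-16LE with a BOM or already UTF-8.

```powershell
# Hypothetical helper (the name is not from the question or the answer).
# Assumes the file is either UTF-16LE with a BOM or already UTF-8.
function Convert-ToUtf8NoBom {
    param(
        [Parameter(Mandatory = $true)]
        [string] $Path
    )

    # .NET methods ignore PowerShell's current location, so resolve to a full path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # A BOM in the file takes precedence over -Encoding; for a BOM-less file,
    # -Encoding UTF8 prevents Windows PowerShell from falling back to "ANSI".
    # @(...) ensures an array even for a single-line or empty file.
    $lines = @(Get-Content -LiteralPath $fullPath -Encoding UTF8)

    # $false = do not emit a BOM when writing.
    $utf8NoBom = New-Object System.Text.UTF8Encoding $false
    [System.IO.File]::WriteAllLines($fullPath, [string[]] $lines, $utf8NoBom)
}
```

Calling Convert-ToUtf8NoBom -Path $MyPath a second time then reads the file as UTF-8 and writes the same bytes back, so the Cyrillic text should survive. A BOM-less file in some other encoding would still be misread and needs separate handling.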

mklement0