I am not a PowerShell guy, so please excuse me if my question is confusing.

We are creating a JSON file using `ConvertTo-Json`, and it successfully creates the JSON file. However, when I `cat` the contents of the JSON file, it has '??' at the beginning, but the same is not seen when I download the file or view it in the file system.

Below is the PowerShell code used to create the JSON file:

$packageJson = @{
    packageName = "ABC.DEF.GHI"
    version = "1.1.1"
    branchName = "somebranch"
    oneOps = @{
        platform = "XYZ"
        component = "JNL"
    }
}

$packageJson | ConvertTo-Json -depth 100 | Out-File "$packageName.json"

The above code creates the file successfully, and when I view the file everything looks fine, but when I `cat` the file it has a leading '??' as shown below:

??{
    "packageName":  "ABC.DEF.GHI",
    "version":  "0.1.0-looper-poc0529",
    "oneOps":  {
                  "platform":  "XYZ",
                  "component":  "JNL"
               },
   "branchName":  "somebranch"
}

Due to this, I am unable to parse the JSON file, and it gives the following error:

com.jayway.jsonpath.InvalidJsonException: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('?' (code 65533 / 0xfffd)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
  • Your encoding is wrong. `Out-File` default encoding is UTF8 with BOM. Change to `-Encoding ascii` – Maximilian Burszley Oct 15 '18 at 17:23
    @TheIncorrigible1: Good tip re encoding, but note that `Out-File`'s default encoding is UTF-16LE ("Unicode", always with a BOM) in _Windows PowerShell_, and UTF-8 _without BOM_ in PowerShell _Core_. – mklement0 Oct 15 '18 at 17:35
    @TheIncorrigible1 encoding by `Out-File` as ASCII is lossy. Besides, JSON [exchanged between systems](https://tools.ietf.org/html/rfc8259) is required to be encoded with UTF-8, although without a BOM. – Tom Blodget Oct 15 '18 at 23:18

3 Answers

Those aren't `?` characters. They are two different unprintable characters that make up a Unicode byte-order mark (BOM). You see `?` because that's how the debugger, text editor, OS, or font in question renders unprintable characters.

To fix this, either change the output encoding, or use a character set on the other end that understands UTF-8. The former is a simpler fix, but the latter is probably better in the long run. Eventually you'll end up with data that needs an extended character.

Joel Coehoorn
  • Good advice in general, but note that `Out-File`'s default encoding in Windows PowerShell is UTF-16LE, so that's what the other end would have to support, which is not common. – mklement0 Oct 15 '18 at 20:34

tl;dr

It sounds like your Java code expects a UTF-8-encoded file without BOM, so direct use of the .NET Framework is needed:

[IO.File]::WriteAllText("$PWD/$packageName.json", ($packageJson | ConvertTo-Json))
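
To verify that the file really is BOM-less, you can inspect its first bytes directly. This is just an illustrative sketch (the file name `check.json` is hypothetical); the parameterless-encoding overload of `WriteAllText` writes BOM-less UTF-8 by default:

```powershell
# Sketch: write JSON via .NET, then inspect the raw leading bytes.
$packageJson = @{ packageName = "ABC.DEF.GHI" }
[IO.File]::WriteAllText("$PWD/check.json", ($packageJson | ConvertTo-Json))

# A UTF-8 BOM would show up as 0xEF 0xBB 0xBF; here the first byte
# should simply be 0x7B, i.e. the opening '{' of the JSON document.
# Note: -Encoding Byte is Windows PowerShell syntax; in PowerShell Core
# use -AsByteStream instead.
$firstBytes = Get-Content "$PWD/check.json" -Encoding Byte -TotalCount 3
'{0:X2} {1:X2} {2:X2}' -f $firstBytes[0], $firstBytes[1], $firstBytes[2]
```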

As Tom Blodget points out, BOM-less UTF-8 is mandated by the IETF's JSON standard, RFC 8259.


Unfortunately, Windows PowerShell's default output encoding, both for `Out-File` and for the redirection operator `>`, is UTF-16LE ("Unicode"), in which:

  • (most) characters are represented as 2-byte units.
  • the file starts with a special 2-byte unit (`0xFF 0xFE`, the UTF-16LE encoding of the Unicode character U+FEFF), the so-called BOM (byte-order mark) or Unicode signature, which serves to identify the encoding.

If target programs do not understand this encoding, they treat the BOM as data (and would subsequently misinterpret the actual data), which causes the problem you saw.

The specific symptom you saw - a complaint about character U+FFFD, which is used as the generic stand-in for an invalid character in the input - suggests that your Java code likely expects UTF-8 encoding.

Unfortunately, using Out-File -Encoding utf8 is not a solution, because Windows PowerShell invariably writes a BOM for UTF-8 as well, which Java doesn't expect.

Workarounds:

  • If you can be sure that the JSON string contains only characters in the 7-bit ASCII range (no accented characters), you can get away with Out-File -Encoding Ascii, as TheIncorrigible1 suggests.

  • Otherwise, use the .NET framework directly for creating your output file with BOM-less UTF-8 encoding.

    • The answers to this question demonstrate solutions, one of which is shown in the "tl;dr" section at the top.
  • If it's an option, use the cross-platform PowerShell Core edition instead, whose default encoding is sensibly BOM-less UTF-8, for compatibility with the rest of the world.

    • Note that not all Windows PowerShell functionality is available in PowerShell Core, however, and vice versa, but future development efforts will focus on PowerShell Core.
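
A variant of the .NET approach makes the BOM-less choice explicit rather than relying on `WriteAllText`'s default. This is a sketch built on the question's variables; passing `$false` to the `UTF8Encoding` constructor means "do not emit a BOM":

```powershell
# Sketch: explicitly request BOM-less UTF-8 via a UTF8Encoding instance.
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
$json = $packageJson | ConvertTo-Json -Depth 100
[IO.File]::WriteAllText("$PWD/$packageName.json", $json, $utf8NoBom)
```

Note the use of `$PWD` to build a full path: .NET's notion of the current directory can differ from PowerShell's, so relative paths passed to `[IO.File]` methods may not land where you expect.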
mklement0

A more general solution that's not specific to `Out-File` is to set the following before you call `ConvertTo-Json`:

$OutputEncoding = [Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8;
Richard Dunn