How do I discover the encoding of a JSON message?

Question

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8, and...

So, essentially, a JSON message can arrive in any of those three encodings. But how do I work out which one it is when I receive it?

The message can come from multiple sources, such as a queue, from the browser, from the database, the file system, etc.

It also says that parsers may ignore byte order marks (BOMs):

...implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.

I remember XML docs had a "prolog" that specified the encoding, but I can't find anything similar for JSON messages.

Any ideas?

score 1 · Answer 1 · answered Apr 24 '19 at 21:43

As per my understanding, whoever is the producer/sender of this JSON data must specify the type of encoding used instead of the receiver trying to guess it. Usually this information is a part of API documentation that the producer/sender provides to the receiver.

score 1 · Accepted Answer · answered Apr 24 '19 at 22:04

rsp and CouchDeveloper have covered this pretty well with their answers (I can't take credit for those).

Both answers look at the byte patterns at the start of the message to determine which encoding has been used. Apologies that this doesn't directly answer your question, but it may help you write an implementation of your own.
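For illustration, here is a minimal sketch of that byte-pattern idea in Python. It assumes the first character of the text is ASCII, which holds for any valid JSON text, so the position of the zero bytes in the first few bytes reveals the encoding. The function name `detect_json_encoding` is hypothetical and not part of any library, and this is not necessarily how the linked answers implement it.

```python
import codecs
import json


def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON byte string (hypothetical helper)."""
    # A BOM, if present, settles the question. UTF-32 is checked before
    # UTF-16 because the UTF-32 LE BOM starts with the UTF-16 LE BOM bytes.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"      # this codec reads the BOM and strips it
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"      # this codec reads the BOM and strips it
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # UTF-8 with the BOM stripped on decode

    # No BOM: the first character is ASCII, so it is padded with zero
    # bytes in UTF-16 and UTF-32. Where the zeros fall gives the answer.
    head = data[:4]
    if len(head) == 4:
        if head[0] == 0 and head[1] == 0 and head[2] == 0:
            return "utf-32-be"   # 00 00 00 xx
        if head[1] == 0 and head[2] == 0 and head[3] == 0:
            return "utf-32-le"   # xx 00 00 00
    if len(head) >= 2:
        if head[0] == 0:
            return "utf-16-be"   # 00 xx
        if head[1] == 0:
            return "utf-16-le"   # xx 00
    return "utf-8"               # no zero bytes: the default encoding


def load_json_bytes(data: bytes):
    """Decode with the guessed encoding, then parse."""
    return json.loads(data.decode(detect_json_encoding(data)))
```

With this, `load_json_bytes('{"k": 1}'.encode("utf-16-be"))` should parse the same way as the UTF-8 version of the message. If the sender does declare the encoding (for example in API documentation, as the other answer suggests), that declaration is more reliable than any guess.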

steadweb


Thanks, this really helps my understanding. The RFC is a little obscure in this regard, and I was thinking that JSON was a text format (since the RFC always refers to JSON as "text documents"), rather than a binary format (as XML is). But it turns out JSON is a binary format as well, only its encoding is a little more convoluted to discover. – The Impaler Apr 25 '19 at 01:04
