A byte order mark (BOM) is the Unicode character U+FEFF, used to signal the endianness (byte order) of a text file or stream. Use of a BOM is optional and, if used, it should appear at the start of the text stream. Beyond its specific use as a byte-order indicator, the BOM may also indicate which of the several Unicode encodings (UTF-8, UTF-16, or UTF-32) the text is encoded in.
For example, the use of a UTF-16 BOM (U+FEFF) makes it clear from the first two bytes of a text whether the stream is "big endian" (BE), like Western numbers, so the stream would start FE FF ..., or "little endian" (LE), like numbers in Arabic, so the stream would start FF FE .... If misinterpreted as ISO-8859-1, it would show up as þÿ (BE) or ÿþ (LE).
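The byte sequences above can be reproduced with a short Python sketch. It only uses the standard `str.encode` and `bytes.decode` methods; the variable names are illustrative.

```python
# Encode the BOM character U+FEFF in both UTF-16 byte orders.
bom = "\ufeff"
be = bom.encode("utf-16-be")  # high-order byte first
le = bom.encode("utf-16-le")  # low-order byte first

# Show the raw bytes, then what a reader assuming ISO-8859-1 would see.
print(be.hex(" "), be.decode("latin-1"))  # fe ff þÿ
print(le.hex(" "), le.decode("latin-1"))  # ff fe ÿþ
```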
In UTF-8, a BOM is neither required nor recommended, but if present it is the three bytes 0xEF 0xBB 0xBF. When misinterpreted as ISO-8859-1, this renders as ï»¿. Seeing this triplet in unusual places in program output almost always indicates that a BOM is not being ignored when it should be, or was added where it was not expected.
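As a minimal sketch of handling an unwanted UTF-8 BOM in Python: the standard `utf-8-sig` codec drops a leading BOM while decoding, and `codecs.BOM_UTF8` holds the three bytes. The helper name `strip_utf8_bom` is hypothetical and only there for illustration.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

print(raw.decode("latin-1"))          # ï»¿hello  (BOM misread as ISO-8859-1)
print(repr(raw.decode("utf-8")))      # '\ufeffhello'  (BOM kept as a character)
print(repr(raw.decode("utf-8-sig")))  # 'hello'  (BOM dropped by the codec)
print(strip_utf8_bom(raw))            # b'hello'
```

Decoding with `utf-8-sig` is usually the simpler choice when reading text, since it also accepts input that has no BOM.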
In UTF-32, the same BOM character is used as for UTF-16 but, because 32 bits are used for each character (so U+0000FEFF), its ISO-8859-1 misinterpretation would contain null characters: □□þÿ (BE) or ÿþ□□ (LE), where □ represents the ASCII NUL character.
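One way to guess an encoding from a leading BOM is sketched below in Python, using the BOM constants from the standard `codecs` module. The function name `sniff_bom` and the table `_BOMS` are hypothetical. The UTF-32 marks are checked before the UTF-16 ones because the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE); this means a UTF-16 LE stream whose first character after the BOM is U+0000 would be misreported, which this sketch accepts as a trade-off.

```python
import codecs
from typing import Optional

# Longer BOMs first, so FF FE 00 00 is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]

def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding suggested by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None

print(sniff_bom("\ufeffhi".encode("utf-32-le")))  # utf-32-le
print(sniff_bom("\ufeffhi".encode("utf-16-le")))  # utf-16-le
print(sniff_bom(b"hi"))                           # None
```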
More information
- Byte order mark on the English Wikipedia.
- UTF-8, UTF-16 and UTF-32 byte order marks from the Unicode FAQ
- Byte Order Mark (BOM): U+FEFF (PDF) from v5 of the Unicode standard (§16.8)