Questions tagged [byte-order-mark]

A byte order mark (BOM) is a Unicode character used to signal the order of bytes in a text file or stream. Because the BOM is U+FEFF, its leading bytes show whether the high-order byte comes first (stream starts FE FF) or second (stream starts FF FE).

The byte order mark (BOM) is a Unicode character used to signal the endianness (byte order) of a text file or stream. Use of a BOM is optional and, if used, it should appear at the start of the text stream. Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in.

For example, the use of a UTF-16 BOM (U+FEFF) makes it clear from the first two bytes of a text whether the stream is "big endian" (BE) — like Western numbers, so the stream would start FE FF ... — or "little endian" (LE) — like numbers in Arabic, so the stream would start FF FE .... If misinterpreted as ISO-8859-1, it would show up as þÿ (BE) or ÿþ (LE).

In UTF-8, a BOM is neither required nor recommended, but would be the three bytes 0xEF 0xBB 0xBF. When misinterpreted as ISO-8859-1, this renders as ï»¿. Seeing this triplet in unusual places in code output almost always indicates that a BOM is not being ignored when it should be, or was added where it was not expected.
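As a minimal sketch of the UTF-8 case in Python: the standard `utf-8-sig` codec drops a leading BOM on decode, while plain `utf-8` keeps it as U+FEFF at the start of the string. The helper name `strip_utf8_bom` and the sample text are illustrative assumptions, not part of any library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Sample input: a UTF-8 BOM followed by UTF-8 encoded text.
raw = codecs.BOM_UTF8 + "héllo".encode("utf-8")

# Plain "utf-8" keeps the BOM as the character U+FEFF.
text_plain = raw.decode("utf-8")

# "utf-8-sig" removes one leading BOM if it is present.
text_sig = raw.decode("utf-8-sig")

# Decoding the three BOM bytes as ISO-8859-1 gives the familiar stray triplet.
mojibake = codecs.BOM_UTF8.decode("iso-8859-1")
```

Decoding with `utf-8-sig` is usually the simpler choice when reading text; stripping at the byte level suits cases where the data is passed on without being decoded.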

In UTF-32, the same BOM character is used as for UTF-16 but, because 32 bits are used for each character (so U+0000FEFF), its ISO-8859-1 misinterpretation would contain null characters: □□þÿ (BE) or ÿþ□□ (LE), where □ represents the ASCII NUL character.
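The byte patterns described above can be checked with a short Python sketch. The function name `detect_bom` and the returned encoding labels are assumptions for illustration; the BOM constants come from the standard `codecs` module. The UTF-32 patterns are tested before the UTF-16 ones because the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).

```python
import codecs

# Longest BOMs first, so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is found.

    Hypothetical helper. Note that FF FE 00 00 could also be UTF-16 LE
    text that starts with a BOM followed by U+0000; this sketch assumes
    UTF-32 LE in that case.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode data using its BOM if present, else the default encoding."""
    name, length = detect_bom(data)
    return data[length:].decode(name or default)
```

A missing BOM says nothing about the encoding, so the `default` argument is only a guess that the caller has to supply.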


539 questions
889
votes
21 answers

What's the difference between UTF-8 and UTF-8 without BOM?

What's different between UTF-8 and UTF-8 without a BOM? Which is better?
simple
  • 9,023
  • 3
  • 15
  • 11
285
votes
16 answers

Using PowerShell to write a file in UTF-8 without the BOM

Out-File seems to force the BOM when using UTF-8: $MyFile = Get-Content $MyPath $MyFile | Out-File -Encoding "UTF8" $MyPath How can I write a file in UTF-8 with no BOM using PowerShell? Update 2021 PowerShell has changed a bit since I wrote this…
M. Dudley
  • 26,519
  • 30
  • 137
  • 228
219
votes
4 answers

Write to UTF-8 file in Python

I'm really confused with the codecs.open function. When I do: file = codecs.open("temp", "w", "utf-8") file.write(codecs.BOM_UTF8) file.close() It gives me the error UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal…
John Jiang
  • 9,721
  • 11
  • 46
  • 60
200
votes
30 answers

How can I output a UTF-8 CSV in PHP that Excel will read properly?

I've got this very simple thing that just outputs some stuff in CSV format, but it's got to be UTF-8. I open this file in TextEdit or TextMate or Dreamweaver and it displays UTF-8 characters properly, but if I open it in Excel it's doing this silly…
Ben Saufley
  • 2,958
  • 5
  • 25
  • 39
186
votes
11 answers

UTF-8 without BOM

I have javascript files that I need them to be saved in UTF-8 (without BOM), every time I convert them to the correct format in Notepad++, they are reverted back to UTF-8 with BOM when I open them in Visual Studio. How can I stop VS2010 from doing…
kabaros
  • 4,763
  • 2
  • 18
  • 32
155
votes
23 answers

How do I remove ï»¿ from the beginning of a file?

I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: ï»¿ PHP removes all whitespace, so a random ï»¿ in the middle of…
Matt
  • 10,197
  • 24
  • 77
  • 109
118
votes
9 answers

Write text files without Byte Order Mark (BOM)?

I am trying to create a text file using VB.Net with UTF8 encoding, without BOM. Can anybody help me, how to do this? I can write file with UTF8 encoding but, how to remove Byte Order Mark from it? edit1: I have tried code like this; Dim utf8…
Vijay Balkawade
  • 3,574
  • 12
  • 49
  • 87
114
votes
10 answers

Byte order mark screws up file reading in Java

I'm trying to read CSV files using Java. Some of the files may have a byte order mark in the beginning, but not all. When present, the byte order gets read along with the rest of the first line, thus causing problems with string compares. Is there…
Tom
  • 16,953
  • 13
  • 66
  • 76
107
votes
5 answers

Using awk to remove the Byte-order mark

How would an awk script (presumably a one-liner) for removing a BOM look like? Specification: print every line after the first (NR > 1) for the first line: If it starts with #FE #FF or #FF #FE, remove those and print the rest
Boldewyn
  • 75,918
  • 43
  • 139
  • 205
98
votes
4 answers

Set Encoding of File to UTF8 With BOM in Sublime Text 3

When I open a file in Sublime Text 3, at the bottom I have an option to set the Character Encoding as shown in the screenshot. There is the option to set it to UTF-8 , which after doing some research means UTF-8 Without BOM, but I want to set it to…
J86
  • 11,751
  • 29
  • 115
  • 194
89
votes
6 answers

Convert UTF-8 with BOM to UTF-8 with no BOM in Python

Two questions here. I have a set of files which are usually UTF-8 with BOM. I'd like to convert them (ideally in place) to UTF-8 with no BOM. It seems like codecs.StreamRecoder(stream, encode, decode, Reader, Writer, errors) would handle this. But I…
timpone
  • 17,029
  • 31
  • 103
  • 200
77
votes
8 answers

How to detect the character encoding of a text file?

I try to detect which character encoding is used in my file. I try with this code to get the standard encoding public static Encoding GetFileEncoding(string srcFile) { // *** Use Default of Encoding.Default (Ansi CodePage) Encoding…
Cédric Boivin
  • 9,860
  • 11
  • 50
  • 91
72
votes
4 answers

Adding UTF-8 BOM to string/Blob

I need to add a UTF-8 byte-order-mark to generated text data on client side. How do I do that? Using new Blob(['\xEF\xBB\xBF' + content]) yields 'ï»¿"my data"', of course. Neither did '\uBBEF\x22BF' work (with '\x22' == '"' being the next character…
kay
  • 23,543
  • 10
  • 89
  • 128
67
votes
6 answers

How do I remove the BOM character from my xml file

I am using xsl to control the output of my xml file, but the BOM character is being added.
raluxgaza
64
votes
11 answers

How to remove multiple UTF-8 BOM sequences

Using PHP5 (cgi) to output template files from the filesystem and having issues spitting out raw HTML. private function fetch($name) { $path = $this->j->config['template_path'] . $name . '.html'; if (!file_exists($path)) { …
sheppardzw
  • 824
  • 1
  • 7
  • 14