Questions tagged [byte-order-mark]

A byte order mark (BOM) is a Unicode character used to signal the order of bytes in a text file or stream. As the BOM is U+FEFF, it makes it clear whether the high-order bytes are first (stream starts FE.FF) or second (stream starts FF.FE).

The byte order mark (BOM) is a Unicode character used to signal the endianness (byte order) of a text file or stream. Use of a BOM is optional and, if used, it should appear at the start of the text stream. Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in.

For example, the use of a UTF-16 BOM (U+FEFF) makes it clear from the first two bytes of a text whether the stream is "big endian" (BE) — like Western numbers, so the stream would start FE FF ... — or "little endian" (LE) — like numbers in Arabic, so the stream would start FF FE .... If misinterpreted as ISO-8859-1, it would show up as þÿ (BE) or ÿþ (LE).
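The byte patterns above can be reproduced in a few lines. This is a minimal sketch in Python; the helper name bom_as_latin1 is made up for illustration and is not part of any library.

```python
BOM = "\ufeff"  # the byte order mark character, U+FEFF


def bom_as_latin1(encoding: str) -> str:
    """Encode U+FEFF in `encoding`, then misread the bytes as ISO-8859-1."""
    return BOM.encode(encoding).decode("iso-8859-1")


# Raw bytes of the BOM in each UTF-16 byte order.
utf16_be_bytes = BOM.encode("utf-16-be")  # b'\xfe\xff'
utf16_le_bytes = BOM.encode("utf-16-le")  # b'\xff\xfe'

# What a reader sees if those bytes are decoded as ISO-8859-1 instead.
mojibake_be = bom_as_latin1("utf-16-be")  # 'þÿ'
mojibake_le = bom_as_latin1("utf-16-le")  # 'ÿþ'
```

The explicit "utf-16-be" and "utf-16-le" codecs are used here because they never add a BOM of their own, so the only bytes produced are those of the U+FEFF character itself.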

In UTF-8, a BOM is neither required nor recommended, but would be the three bytes 0xEF 0xBB 0xBF. When misinterpreted as ISO-8859-1, this renders as ï»¿. Seeing this triplet in unusual places in code output almost always indicates that a BOM is not being ignored when it should be, or was added where it was not expected.
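As a rough illustration of a BOM "not being ignored when it should be", the sketch below decodes the same bytes with Python's "utf-8" and "utf-8-sig" codecs. The sample text and the helper strip_utf8_bom are assumptions made up for this example.

```python
import codecs

# Hypothetical file contents: a UTF-8 BOM followed by some text.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Plain "utf-8" keeps the BOM as a leading U+FEFF character.
kept = data.decode("utf-8")          # '\ufeffhello'

# "utf-8-sig" drops a leading BOM if one is present.
stripped = data.decode("utf-8-sig")  # 'hello'


def strip_utf8_bom(raw: bytes) -> bytes:
    """Return `raw` without a leading UTF-8 BOM; other input is unchanged."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw
```

Decoding with "utf-8-sig" is usually the simpler choice when reading text; the byte-level helper is only useful when the data has to stay as bytes.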

In UTF-32, the same BOM character is used as in UTF-16, but because each character takes 32 bits (so the BOM is U+0000FEFF), its ISO-8859-1 misinterpretation contains null characters: □□þÿ (BE) or ÿþ□□ (LE), where □ represents the NUL character.
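Because the UTF-32 LE BOM (FF FE 00 00) begins with the same two bytes as the UTF-16 LE BOM (FF FE), a BOM sniffer has to test the longer signatures first. The following is a sketch only: detect_bom and its return convention are invented for this example, and it cannot tell a UTF-32 LE BOM apart from UTF-16 LE text whose first character after the BOM is NUL.

```python
import codecs

# Longest signatures first: checking UTF-16 LE before UTF-32 LE would
# misreport every UTF-32 LE stream as UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_bom(raw: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller could then decode the remainder with raw[bom_length:].decode(encoding_name), falling back to some default encoding when no BOM is found.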

More information

Byte order mark on the English Wikipedia.
UTF-8, UTF-16 and UTF-32 byte order marks from the Unicode FAQ
Byte Order Mark (BOM): U+FEFF (PDF) from v5 of the Unicode standard (§16.8)

539 questions

889

votes

21 answers

What's the difference between UTF-8 and UTF-8 without BOM?

What's the difference between UTF-8 and UTF-8 without a BOM? Which is better?

asked Feb 08 '10 at 18:26

simple

285

votes

16 answers

Using PowerShell to write a file in UTF-8 without the BOM

Out-File seems to force the BOM when using UTF-8: $MyFile = Get-Content $MyPath $MyFile | Out-File -Encoding "UTF8" $MyPath How can I write a file in UTF-8 with no BOM using PowerShell? Update 2021 PowerShell has changed a bit since I wrote this…

encoding powershell utf-8 byte-order-mark

asked Apr 08 '11 at 15:02

M. Dudley

219

votes

4 answers

Write to UTF-8 file in Python

I'm really confused with the codecs.open function. When I do: file = codecs.open("temp", "w", "utf-8") file.write(codecs.BOM_UTF8) file.close() It gives me the error UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal…

python utf-8 character-encoding byte-order-mark

asked Jun 01 '09 at 09:42

John Jiang

200

votes

30 answers

How can I output a UTF-8 CSV in PHP that Excel will read properly?

I've got this very simple thing that just outputs some stuff in CSV format, but it's got to be UTF-8. I open this file in TextEdit or TextMate or Dreamweaver and it displays UTF-8 characters properly, but if I open it in Excel it's doing this silly…

php csv utf-8 byte-order-mark

asked Dec 03 '10 at 18:49

Ben Saufley

186

votes

11 answers

UTF-8 without BOM

I have JavaScript files that need to be saved in UTF-8 (without BOM). Every time I convert them to the correct format in Notepad++, they are reverted back to UTF-8 with BOM when I open them in Visual Studio. How can I stop VS2010 from doing…

visual-studio-2010 visual-studio byte-order-mark

asked Mar 23 '11 at 13:45

kabaros

155

votes

23 answers

How do I remove ï»¿ from the beginning of a file?

I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: ï»¿ PHP removes all whitespace, so a random ï»¿ in the middle of…

php utf-8 character-encoding byte-order-mark mojibake

asked Jul 15 '10 at 13:35

Matt

118

votes

9 answers

Write text files without Byte Order Mark (BOM)?

I am trying to create a text file using VB.Net with UTF8 encoding, without BOM. Can anybody help me, how to do this? I can write file with UTF8 encoding but, how to remove Byte Order Mark from it? edit1: I have tried code like this; Dim utf8…

vb.net encoding file-handling byte-order-mark

asked Mar 13 '10 at 07:43

Vijay Balkawade

114

votes

10 answers

Byte order mark screws up file reading in Java

I'm trying to read CSV files using Java. Some of the files may have a byte order mark in the beginning, but not all. When present, the byte order gets read along with the rest of the first line, thus causing problems with string compares. Is there…

java utf-8 byte-order-mark

asked Dec 02 '09 at 20:04

Tom

107

votes

5 answers

Using awk to remove the Byte-order mark

How would an awk script (presumably a one-liner) for removing a BOM look like? Specification: print every line after the first (NR > 1) for the first line: If it starts with #FE #FF or #FF #FE, remove those and print the rest

unicode awk byte-order-mark

asked Jul 01 '09 at 11:37

Boldewyn

votes

4 answers

Set Encoding of File to UTF8 With BOM in Sublime Text 3

When I open a file in Sublime Text 3, at the bottom I have an option to set the Character Encoding as shown in the screenshot. There is the option to set it to UTF-8 , which after doing some research means UTF-8 Without BOM, but I want to set it to…

encoding utf-8 sublimetext3 sublimetext2 byte-order-mark

asked Jan 22 '14 at 16:57

J86

votes

6 answers

Convert UTF-8 with BOM to UTF-8 with no BOM in Python

Two questions here. I have a set of files which are usually UTF-8 with BOM. I'd like to convert them (ideally in place) to UTF-8 with no BOM. It seems like codecs.StreamRecoder(stream, encode, decode, Reader, Writer, errors) would handle this. But I…

python utf-8 utf-16 byte-order-mark

asked Jan 17 '12 at 16:37

timpone

votes

8 answers

How to detect the character encoding of a text file?

I am trying to detect which character encoding is used in my file. I tried this code to get the standard encoding: public static Encoding GetFileEncoding(string srcFile) { // *** Use Default of Encoding.Default (Ansi CodePage) Encoding…

c# encoding character-encoding byte-order-mark

asked Dec 23 '10 at 15:40

Cédric Boivin

votes

4 answers

Adding UTF-8 BOM to string/Blob

I need to add a UTF-8 byte-order-mark to generated text data on client side. How do I do that? Using new Blob(['\xEF\xBB\xBF' + content]) yields 'ï»¿"my data"', of course. Neither did '\uBBEF\x22BF' work (with '\x22' == '"' being the next character…

javascript utf-8 blob fileapi byte-order-mark

asked Jul 26 '13 at 10:37

kay

votes

6 answers

How do I remove the BOM character from my xml file

I am using xsl to control the output of my xml file, but the BOM character is being added.

xml xslt unicode byte-order-mark

asked Nov 17 '08 at 12:27

raluxgaza

votes

11 answers

How to remove multiple UTF-8 BOM sequences

Using PHP5 (cgi) to output template files from the filesystem and having issues spitting out raw HTML. private function fetch($name) { $path = $this->j->config['template_path'] . $name . '.html'; if (!file_exists($path)) { …

php utf-8 byte-order-mark

asked Apr 24 '12 at 02:04

sheppardzw
