31

I have a bizarre problem: somewhere in my HTML/PHP code there's a hidden, invisible character that I can't seem to get rid of. By copying it from Firebug and converting it, I identified it as `&#65279;` (U+FEFF), or 'Zero width no-break space'. It shows up as a non-empty text node on my website and is causing a serious layout problem.

The problem is, I can't get rid of it. I can't see it in my files even when turning Invisibles on (duh), and no search tool seems to pick up on it. I rewrote my code around where it could be, but it seems to be somewhere deeper in one of the framework files.

How can I find characters by charcode across files or something like that? I'm open to different tools, but they have to work on Mac OS X.

TylerH
deceze
  • Don't blame yourself too much. If a layout breaks because of a zero-width, non-breaking space, the renderer is misunderstanding either the zero-width or the non-breaking part. – MSalters Jul 02 '09 at 15:23
  • That's debatable I suppose. The non-breaking space connected two proper whitespace characters, so it's supposed to render something I guess. And that something happened to be squished in between two full-width, no-margin DIVs, which is why it showed up very prominently. I rather blame Microsoft for inventing BOMs to begin with. ;-) – deceze Jul 03 '09 at 00:47
  • I'm pretty sure [Textwrangler](http://www.barebones.com/products/TextWrangler/) will do it. EDIT: [VersionTracker link](http://www.versiontracker.com/dyn/moreinfo/macosx/18529) as Bare Bones site seems to be down again. – da5id Jul 01 '09 at 07:45
  • vi or vim will show up any non-EOL characters. – Matthew Scharley Jul 01 '09 at 07:35
  • Cleaned up some simple answers and edited. Looks on-topic without an explicit tool request – Machavity May 24 '21 at 13:40

5 Answers

38

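You can't see the character because most text editors don't display it. U+FEFF is the so-called byte-order mark (BOM); U+FFFE is what the same mark looks like when it is read with the wrong byte order. The BOM tells a program reading a Unicode file in which order the bytes of multi-byte characters are stored. Microsoft tools in particular like to write one at the start of a file.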

To get rid of it, tell your editor to save the file either as ANSI/ISO-8859 or as Unicode without BOM. If your editor can't do that, you'll either have to switch editors (sadly) or strip the bytes yourself, e.g. with a hex editor, which shows you what the file really contains.

From a quick search, it seems that TextWrangler has a "UTF-8, no BOM" mode. Otherwise, if you're comfortable with the terminal, you can open the file in Vim, type:

:set nobomb

and save the file. Presto!

The BOM is always at the very start of a text file. Editors that support the BOM will, as mentioned, not show it to you at all.
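
If no editor with a "no BOM" option is at hand, the same clean-up can be scripted. What follows is only a minimal Python sketch, not something from the original answer: the function name `strip_utf8_bom` is made up for illustration, and it assumes the file is UTF-8 encoded and small enough to read into memory.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM (bytes EF BB BF) from the file at `path`.

    Returns True if a BOM was found and removed, False otherwise.
    """
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # no BOM at the start, leave the file untouched
    # Rewrite the file without its first three bytes.
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Calling `strip_utf8_bom("header.php")` (a hypothetical file name) rewrites the file in place, so it is probably wise to try it on a copy first.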

TylerH
Boldewyn
  • I saw that before, but it usually rendered as garbage on top of the page. Seems it's harder to find when it's in the middle of a page...? Anyway, thanks! :) – deceze Jul 01 '09 at 08:12
  • It can occur in the middle of a page when you use PHP's include statement to include a file that starts with a BOM. Otherwise it should usually not appear (although it _is_ a standard Unicode character and can be used as such). – Boldewyn Jul 01 '09 at 08:25
  • If you're editing your HTML/PHP code with Altova XMLSpy then the option to turn off BOM is found at menu "Tools/Options", tabpage "Encoding". XMLSpy can preserve BOM if it finds it, or add it to a file when it doesn't exist yet. It has no option to remove BOM. – Wim ten Brink Jul 01 '09 at 08:42
  • Oh, oops. I somehow doubt that you're using XMLSpy on a Mac OS X, although it can be installed on Mac OS X by using "Parallels for Mac" virtualization. – Wim ten Brink Jul 01 '09 at 08:46
11

If you are using Textmate and the problem is in a UTF-8 file:

  1. Open the file
  2. File > Re-open with encoding > ISO-8859-1 (Latin1)
  3. You should now be able to see the BOM as the first characters in the file (it shows up as ï»¿) and remove it
  4. File > Save
  5. File > Re-open with encoding > UTF8
  6. File > Save

It works for me every time.
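
Why re-opening as Latin-1 helps: the UTF-8 BOM is stored as the three bytes EF BB BF, and in ISO-8859-1 each of those bytes is an ordinary visible character. A small Python illustration, added here as a sketch rather than as part of the original answer:

```python
import codecs

# The UTF-8 encoding of U+FEFF is the three bytes EF BB BF.
bom_bytes = "\ufeff".encode("utf-8")
assert bom_bytes == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Decoded as ISO-8859-1 (Latin-1), every byte becomes one visible character,
# so the invisible mark turns into the three-character string "ï»¿".
visible = bom_bytes.decode("iso-8859-1")
```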

Mirko
6

It's a byte-order mark. On Mac OS X, open a Terminal window, go to your source directory and type:

grep -rn $'\xEF\xBB\xBF' .

It will show you the file names and line numbers that contain a UTF-8 BOM. The three bytes EF BB BF are how U+FEFF is stored in UTF-8.
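
If you would rather have byte offsets than line numbers, or want to catch a BOM that ended up in the middle of a file, the search can also be scripted. This is a rough Python sketch, not part of the original answer; `find_bom_bytes` is a made-up name, and it only looks for the UTF-8 form of U+FEFF and reads each file fully into memory.

```python
import codecs
import os


def find_bom_bytes(root):
    """Yield (path, byte_offset) for every UTF-8 encoded U+FEFF under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    data = fh.read()
            except OSError:
                continue  # unreadable file, skip it
            # Report every occurrence, not just one at the start of the file.
            offset = data.find(codecs.BOM_UTF8)
            while offset != -1:
                yield path, offset
                offset = data.find(codecs.BOM_UTF8, offset + 1)
```

Looping over `find_bom_bytes(".")` from the project directory lists every hit; an offset of 0 means the mark sits at the very start of that file.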

Vexatus
  • Since it is almost certainly at the very start of the file (two bytes in UTF-16, three in UTF-8), the problem is getting rid of it. I'm not very experienced with awk, but removing the leading bytes of a file should be a one-liner with it. – Boldewyn Jul 01 '09 at 08:27
  • Indeed, it is not hard to find duplicate questions which show you how to do exactly that. `awk 'NR==1 { sub(/^\357\273\277/, "") } 1' file >newfile` – tripleee May 24 '21 at 13:26
2

In Notepad++, there is an option to show all characters. From the top menu:

View -> Show Symbol -> Show All Characters

TylerH
Umair Ahmed
  • As stated, I'm more looking for a Mac OS X (or UNIX) tool. – deceze Jul 01 '09 at 07:34
  • Yep, I missed that... I think I saw somewhere that it can be run using CrossOver. Not a pretty solution, though. – Umair Ahmed Jul 01 '09 at 07:37
  • Btw: Notepad++ has an option to save Unicode files without BOM. Just in case you're gonna switch to Windows ;-) – Boldewyn Jul 01 '09 at 11:39
  • I run Notepad++ on Ubuntu using Wine. I don't know if Wine runs on OS X. Notepad++ is awesome, though. – Randy L Aug 13 '10 at 18:31
  • I don't think Notepad++ will show 'no-break space' and other whitespace characters, although it will show carriage returns and line feeds... I think you have to switch the encoding, which is explained in the accepted answer – ClearBlueSky85 May 04 '16 at 22:56
1

I'm not a Mac user, but my general advice would be: when all else fails, use a hex editor. Very useful in such cases.

See "Comparison of hex editors" on Wikipedia.
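
What you would look for in a hex editor is the first few bytes of the file. As a stand-in for one, here is a small Python sketch (not from the original answer; `describe_start` and the `BOMS` table are names invented for this example) that dumps the start of a file as hex and names the BOM if it recognises one.

```python
import codecs

# Known byte-order marks; the 4-byte UTF-32 marks are listed first because
# the UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def describe_start(path, count=8):
    """Return (hex dump of the first `count` bytes, BOM name or None)."""
    with open(path, "rb") as fh:
        head = fh.read(count)
    # First matching mark wins; None if the file starts with no known BOM.
    name = next((label for mark, label in BOMS if head.startswith(mark)), None)
    hex_dump = " ".join(f"{byte:02x}" for byte in head)
    return hex_dump, name
```

A dump starting with `ef bb bf` means a UTF-8 BOM, while `ff fe` or `fe ff` point to UTF-16.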

Craig McQueen
  • Even among tool request question answers, this one is not particularly useful since it does not go so far as to suggest a solution or even a tool, only a category of tools. – TylerH May 24 '21 at 13:30