Removing BOM characters from AJAX-posted string

Question

My content contains multiple BOM (EF BB BF) characters. They appear in the middle of strings, and I simply want to remove them all.

The data comes from JavaScript: I read it from a CKEditor instance, POST the variable, and read it as a string on my backend, where the BOMs are still present. For now they are persisted as is, but this causes errors in post-processing, when the characters are interpreted and start showing up mid-content. I suspect they come from something that was copy-pasted into my CKEditor.

I can step through the string char by char, but I don't know how to compare against the BOM. Would it be possible to look at the hex values of the string's bytes and compare three-byte sequences?

score 5 · Accepted Answer · answered Oct 23 '12 at 09:50

When the posted bytes are decoded, each UTF-8 BOM (EF BB BF) gets translated to the single character \ufeff, the Unicode character "Zero width no-break space". You can't see them, you can't hear them. Filter them out with:

   var good = bad.Replace("\ufeff", "");
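To make the byte-to-character relationship concrete, here is a minimal sketch, assuming the POST body is decoded as UTF-8. The byte array, the `BomDemo` class, and the `StripBom` helper are hypothetical names made up for illustration. It also shows the char-by-char comparison the question asks about: once decoded, each BOM is a single `char`, so no three-byte comparison is needed.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: removes every U+FEFF character from the input.
    // string.Replace(string, string) performs an ordinal comparison.
    public static string StripBom(string input) => input.Replace("\uFEFF", "");

    public static void Main()
    {
        // Assumed sample bytes: "ab", one BOM (EF BB BF), then "cd".
        byte[] posted = { 0x61, 0x62, 0xEF, 0xBB, 0xBF, 0x63, 0x64 };

        // Decoding as UTF-8 turns the three BOM bytes into one char, U+FEFF.
        string bad = Encoding.UTF8.GetString(posted);
        Console.WriteLine(bad.Length);           // 5
        Console.WriteLine(bad[2] == '\uFEFF');   // True

        string good = StripBom(bad);
        Console.WriteLine(good);                 // abcd
    }
}
```

Since U+FEFF has exactly one UTF-8 encoding, EF BB BF, removing the character only ever removes that byte sequence.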

answered Oct 23 '12 at 09:50

Hans Passant


Great success! One question though, might this cause problems by removing other bytes that get translated into the same unicode character? I doubt that I'll miss any if they get removed but are there other important or worth-mentioning such characters? – Joel Peltonen Oct 23 '12 at 09:58

You can't see them, you can't hear them. – Hans Passant Oct 23 '12 at 10:15

score 0 · Answer 2 · answered Oct 23 '12 at 07:06

Try the following:

CleanString = DirtyString.Replace("\u00EF\u00BB\u00BF", null);
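Whether this pattern matches depends on how the bytes were decoded. The sketch below rests on an assumption the thread does not confirm: that the three-character sequence `"\u00EF\u00BB\u00BF"` only appears when the bytes were read with a single-byte encoding such as ISO-8859-1, while a UTF-8 decode yields `\uFEFF` instead. The sample bytes and the `MisdecodedBomDemo` class are hypothetical.

```csharp
using System;
using System.Text;

public static class MisdecodedBomDemo
{
    public static void Main()
    {
        // Assumed sample bytes: "a", one BOM (EF BB BF), then "b".
        byte[] posted = { 0x61, 0xEF, 0xBB, 0xBF, 0x62 };

        // The same bytes decoded two different ways.
        string asUtf8 = Encoding.UTF8.GetString(posted);
        string asLatin1 = Encoding.GetEncoding("iso-8859-1").GetString(posted);

        const string threeChars = "\u00EF\u00BB\u00BF";

        // UTF-8 decoding produces one U+FEFF char, so the pattern is absent.
        Console.WriteLine(asUtf8.Contains(threeChars));    // False
        Console.WriteLine(asUtf8.Contains("\uFEFF"));      // True

        // ISO-8859-1 maps each byte to one char, so the pattern is present.
        Console.WriteLine(asLatin1.Contains(threeChars));  // True

        // Passing null as the replacement removes every occurrence.
        Console.WriteLine(asLatin1.Replace(threeChars, null)); // ab
    }
}
```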

answered Oct 23 '12 at 07:06

Peter Stock


The way I tested this was to do `string s2 = s.Replace(...)` and then `Debug.WriteLine(s2);`. Then I copy-pasted the output from my output window to Notepad++ and switched to view HEX: I still see the BOM. Did I try it wrong? – Joel Peltonen Oct 23 '12 at 07:26
That's how it is working for me. Maybe you find [this](http://stackoverflow.com/questions/2502990/create-text-file-without-bom?rq=1) helpful. – Peter Stock Oct 23 '12 at 09:58
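Copying text from the output window into an editor adds its own encoding steps, so a more direct check is to dump the string's bytes in code. This is a minimal sketch, not the method used by anyone in the thread; `HexCheckDemo` and `ToUtf8Hex` are hypothetical names, and the sample string is made up.

```csharp
using System;
using System.Text;

public static class HexCheckDemo
{
    // Hypothetical helper: the UTF-8 bytes of a string as dash-separated hex.
    public static string ToUtf8Hex(string s) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(s));

    public static void Main()
    {
        string s = "a\uFEFFb";                 // assumed sample containing one BOM
        string s2 = s.Replace("\uFEFF", "");   // the replacement under test

        Console.WriteLine(ToUtf8Hex(s));    // 61-EF-BB-BF-62
        Console.WriteLine(ToUtf8Hex(s2));   // 61-62
    }
}
```

If EF-BB-BF still shows up in the second line for a real input, the replacement pattern did not match the characters actually present in the string.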
