
We have a website that uses Classic ASP.

Part of our release process substitutes values in a file, and we found a bug where it writes the file out as UTF-8.

This then causes our application to start spitting out garbage. Apostrophes get returned as some encoded characters.

If we then go and remove the BOM that marks the file as UTF-8, the text that was previously rendered as garbage is displayed correctly.

Is there something that IIS does differently when it encounters a UTF-8 file?

dakab
Derek Ekins

3 Answers


I was searching on the same exact issue yesterday and came across:

http://blog.inspired.no/utf-8-with-asp-71/

The important part from that page, in case it ever goes away:

ASP CODE:

Response.ContentType = "text/html"
Response.AddHeader "Content-Type", "text/html;charset=UTF-8"
Response.CodePage = 65001
Response.CharSet = "UTF-8"

and the following HTML META tag:

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />

We were using the meta tag and the ASP CharSet property, yet the page still didn't render correctly. After adding the other three lines to the ASP file, everything just worked.

Hope this helps!

Werewolf
  • You don't need both the meta tag and `Response.CharSet = "UTF-8"`, as they serve the same purpose; personally I prefer `Response.CharSet = "UTF-8"` rather than explicitly setting it in an HTML meta tag. Also, `Response.AddHeader "Content-Type", "text/html;charset=UTF-8"` is just an explicit form of writing `Response.ContentType = "text/html"` and `Response.CharSet = "UTF-8"`, so what you are suggesting is redundant; stick to using `Response.ContentType` and `Response.CharSet`. – user692942 Feb 05 '14 at 10:51
  • Implicitly declaring your charSet and contentType in a meta tag meets W3C standards of acceptable practices. Regardless of how you decide to declare the headers in your asp, redundant or not, you should still include a meta tag that declares the content type and charset. If you run a page through the W3C validation checker at https://validator.w3.org/i18n-checker/ it will fail without the meta tag for type declaration. It's better, in this particular case, to have too many declarations than too few. – MistyDawn Aug 21 '19 at 16:52

UTF-8 does not use BOMs; it is an annoying misfeature in some Microsoft software that puts them there. You need to find which step of your release process is putting a UTF-8-encoded BOM in your files and fix it; you should stop that even if you stay with UTF-8, which these days really is the best choice.
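To hunt down the offending release step, it can help to check each generated file for the three BOM bytes and strip them. A minimal sketch in Python (illustrative only; the function names are mine, not part of the original process):

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def without_bom(data: bytes) -> bytes:
    """Return data with a leading UTF-8 BOM removed, if one is present."""
    return data[len(BOM):] if data.startswith(BOM) else data


def strip_bom_from_file(path: str) -> bool:
    """Rewrite a file without its UTF-8 BOM; returns True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = without_bom(data)
    if len(cleaned) != len(data):
        with open(path, "wb") as f:
            f.write(cleaned)
        return True
    return False
```

Running `strip_bom_from_file` over the release output (or wiring `without_bom` into the substitution step itself) would both fix existing files and show which step introduces the BOM.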

But I doubt it's IIS causing the display problem. More likely the browser is guessing the charset of the final page: when it sees bytes that look like UTF-8, it assumes the whole page is UTF-8. You should be able to stop it doing that by stating a definitive charset in an HTTP header:

Content-Type: text/html;charset=iso-8859-1

and/or a meta element in the HTML:

<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1" />

Now (assuming ISO-8859-1 is actually the character set your data are in) it should display OK. However, if your file really does have a UTF-8-encoded BOM at the start, you'll now see it as "ï»¿" in your page, which is what those bytes look like in ISO-8859-1. So you still need to get rid of that misBOM.
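The byte-level mixups described above can be reproduced in a couple of lines of Python (purely illustrative; note that browsers treat pages labelled iso-8859-1 as Windows-1252 in practice):

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF; read as ISO-8859-1
# they come out as the classic junk at the top of a page.
print(codecs.BOM_UTF8.decode("iso-8859-1"))   # ï»¿

# A curly apostrophe (U+2019) written as UTF-8 but decoded as
# Windows-1252 becomes three characters of garbage — the
# "encoded characters" the question describes.
print("’".encode("utf-8").decode("cp1252"))   # â€™
```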

bobince
  • Right this makes sense. It was actually a bug in some code that was written specifically to handle this kind of issue. Thanks. – Derek Ekins Sep 21 '09 at 12:51
  • I must admit this answer confuses me. "UTF-8 does not use BOMs" — could you elaborate? In what way is this a "misfeature"? I've never come across a problem using UTF-8 files that include this zero-width space character; what problems have you encountered? – AnthonyWJones Sep 22 '09 at 06:50
  • Any bytes-based text tool (such as shells, config file loaders etc.) will immediately fall over when presented with "ï»¿" at the start of a file; it is the explicit aim of UTF-8 to be compatible with tools that know nothing about Unicode, but UTF-8+BOM breaks this. Even some Unicode-aware tools will trip over it because a BOM is only expected to be present and automatically removed by the Unicode decoding process for UTF-16. UTF-8+BOM breaks applications and there is no justification for using it in the Unicode specs; and there isn't even any benefit to it as UTF-8 has no byte order issues. – bobince Sep 22 '09 at 12:48
  • Also confused about "UTF-8 does not use BOMs": no clarification is needed; it's simply an incorrect statement. – Áxel Costas Pena Oct 03 '13 at 08:21

If you are using an Access database you should write:

Session.CodePage = 65001
Set tabtable = Conn.Execute("SELECT * FROM table")