9

Here is my little code:

var http = require('http');
var port = 9002;
var host_ip = '<my_ip>';
http.createServer(function (req, res) {
    var content = new Buffer("Hello 世界", "utf-8")
    console.log('request arrived');
    res.writeHead(200, {
        'Content-Encoding':'utf-8',
        'charset' : 'utf-8',
        'Content-Length': content.length,
        'Content-Type': 'text/plain'});
    res.end(content.toString('utf-8'),'utf-8');
}).listen(port, host_ip);
console.log('server running at http://' + host_ip + ':' + port);

Previously I just had res.end send "hello world", and it worked well. Then I changed 'world' to its Chinese equivalent '世界', and set the 'charset' and 'Content-Encoding' fields in the header to 'utf-8'. But in Chrome and Firefox I see this:

hello 涓栫晫

However, surprisingly, Opera (11.61) does show the correct result, hello 世界. I want to know whether I have missed something in the code, and why this is happening. Thank you.

I think this post is similar to my situation, but not exactly the same.

Allan Ruin

3 Answers

14

The problem is with the character set specification. For me it works with this change:

'Content-Type': 'text/plain;charset=utf-8'

Tested with Chrome, Firefox and Safari.
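For reference, here is a minimal sketch of the question's handler with that change applied. It is an illustration rather than the asker's final code: the name `helloHandler` is hypothetical, `Buffer.from` is assumed to be available (newer Node versions; the question used `new Buffer`), and the separate 'charset' and 'Content-Encoding' headers are dropped.

```javascript
var http = require('http');

// Hypothetical handler name, used only for this sketch.
function helloHandler(req, res) {
    // Encode the text once so the byte count and the body always agree.
    var content = Buffer.from('Hello 世界', 'utf-8');
    res.writeHead(200, {
        // charset is a parameter of Content-Type, not a header of its own.
        'Content-Type': 'text/plain; charset=utf-8',
        // length of a Buffer is its size in bytes, which is what Content-Length needs.
        'Content-Length': content.length
    });
    res.end(content);
}

// To serve it, as in the question:
// http.createServer(helloHandler).listen(9002);
```

Sending the Buffer itself, rather than converting it back to a string, avoids encoding the text a second time.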

You could also look into the node.js package "express", which allows rewriting your code like this:

var express=require('express');

var app=express.createServer();

app.get('/',function(req, res) {
    var content = "Hello 世界";

    res.charset = 'utf-8';
    res.contentType('text');
    res.send(content);
});

app.listen(9002);
jjrv
  • Well, I know express and webjs. I was just trying to do the same exercise and suddenly came across this strange problem~ :) – Allan Ruin May 06 '12 at 12:57
2

Content-Encoding is not a character set; it describes an encoding (such as gzip) applied to the HTTP response body itself.

charset is not a standard HTTP header.

Content-Length is unnecessary here.

As @jjrv said, you should write 'Content-Type': 'text/plain;charset=utf-8' there.
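If you do decide to send Content-Length, keep in mind that it counts bytes, not characters. The question's code gets this right because it measures a Buffer, but the small sketch below (variable names are only illustrative) shows how far the two counts drift apart for non-ASCII text:

```javascript
var text = 'Hello 世界';

// String length counts UTF-16 code units: 6 for "Hello " plus 2 for the two Chinese characters.
var charCount = text.length;

// Each of the two Chinese characters takes 3 bytes in UTF-8, so the body is 6 + 6 bytes.
var byteCount = Buffer.byteLength(text, 'utf-8');

console.log(charCount, byteCount); // 8 12
```

Using the string length as Content-Length here would tell the client to expect 8 bytes of a 12-byte body.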

alex
  • You are right. At first I thought that, as the manual says, without `content-length` the response would be sent chunked, and I guessed the Chinese characters wouldn't display correctly. But after adding 'charset=utf-8' to `Content-Type`, I omitted the `content-length` field and it worked fine. – Allan Ruin May 06 '12 at 13:00
0

涓栫晫 is what the UTF-8 bytes of 世界 look like when they are decoded as GB-18030 (GBK). The server sent UTF-8, but with no charset declared in Content-Type the browser probably fell back to that encoding as its default.
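One way to check how the two encodings relate is to look at the raw bytes and then decode them with the other encoding. This is only a sketch: it assumes a Node build whose `TextDecoder` supports the 'gb18030' label (this depends on ICU support), so the decode step is guarded and falls back to `null`.

```javascript
// The bytes the server actually sends for 世界 when encoding as UTF-8.
var utf8Bytes = Buffer.from('世界', 'utf-8');
var hex = utf8Bytes.toString('hex');
console.log(hex); // e4b896e7958c

// Decode the same bytes as GB-18030, as a browser without charset information might.
var misread = null;
try {
    misread = new TextDecoder('gb18030').decode(utf8Bytes);
} catch (e) {
    // Assumption: this encoding label may be unavailable on builds with limited ICU data.
}
console.log(misread); // compare with the garbled text in the question
```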

dda