
I’ve done some Google searches, but I get results related to encoding strings or files.

Can I write my Node.js JavaScript source code in UTF-8? Can I use non-ASCII characters in comments, strings, or as variable names?

ECMA-262 seems to require UTF-16 encoding, but Node.js won’t run a UTF-16-encoded .js file. It will, however, run UTF-8 source and correctly interpret non-ASCII characters.
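
For example, a small test file along these lines (saved as UTF-8; the non-ASCII names here are just for illustration) runs for me without complaint:

/* Комментарий: non-ASCII is fine in comments too */
var приветствие = "こんにちは";   // non-ASCII identifier and string literal
console.log(приветствие.length);  // prints 5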

So is this by design or by “accident”? Is it specified somewhere that UTF-8 source code is supported?

Nate
  • I've never given this a second thought, but I constantly use UTF-8 for everything I do and never had a problem. – Alex Turpin Apr 12 '12 at 14:05
  • I expect that it's not so much a Node.js thing, but a V8 thing. – Pointy Apr 12 '12 at 14:07
  • I was hoping someone could point to, say, Node.js or V8 documentation that says what source encodings are allowed. (Python example: http://www.python.org/dev/peps/pep-0263/). Yeah, I can and did futz around and see what works, but I want a more concrete answer. – Nate Apr 12 '12 at 15:12
  • You're linking to a very old version of the spec (the 3rd rev. is from 1999; we just hit the 6th rev. last June). The current version is [here](http://www.ecma-international.org/ecma-262/6.0/index.html#sec-source-text). The requirement is "Unicode" (with ASCII being, by convention, a subset of Unicode, since the lower 128 code points of Unicode match what the ASCII encoding specifies). – Mike 'Pomax' Kamermans Sep 11 '15 at 17:07

2 Answers


Reference: http://mathiasbynens.be/notes/javascript-identifiers

UTF-8 characters are valid JavaScript variable names. Go ahead and encode your source as UTF-8.
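
For instance, a quick sketch (the identifiers here are chosen arbitrarily):

var π = Math.PI;     // Greek letter as an identifier
var größe = "groß";  // Latin letters with diacritics also work
console.log(π);      // 3.141592653589793
console.log(größe);  // groß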

Joe Frambach
  • Unicode characters and UTF-8 encoding are different things. The standard actually seems to require UTF-16, not UTF-8 (but that doesn’t seem to be true in practice). It’s nice to have confirmation Unicode characters are valid variable names though. – Nate Apr 12 '12 at 15:14
  • Although available, I can't recommend doing things like `var Hͫ̆̒̐ͣ̊̄ͯ͗͏̵̗̻̰̠̬͝ͅE̴̷̬͎̱̘͇͍̾ͦ͊͒͊̓̓̐_̫̠̱̩̭̤͈̑̎̋ͮͩ̒͑̾͋͘Ç̳͕̯̭̱̲̣̠̜͋̍O̴̦̗̯̹̼ͭ̐ͨ̊̈͘͠M̶̝̠̭̭̤̻͓͑̓̊ͣͤ̎͟͠E̢̞̮̹͍̞̳̣ͣͪ͐̈T̡̯̳̭̜̠͕͌̈́̽̿ͤ̿̅̑Ḧ̱̱̺̰̳̹̘̰́̏ͪ̂̽͂̀͠ = 'Zalgo';` – Joe Frambach Apr 12 '12 at 15:39
  • The standard says that the native text processing model of JavaScript is based on UTF-16 code units. That doesn't specify what byte encoding is used to convert a source file to those units. – bobince Apr 14 '12 at 12:36

I can't find documentation that says that Node treats files as encoded in UTF-8, but it seems that way experimentally:

/* Check in your editor that this Javascript file was saved in UTF-8 */
var nonEscaped = "Планета_Зямля";
var escaped = "\u041f\u043b\u0430\u043d\u0435\u0442\u0430\u005f\u0417\u044f\u043c\u043b\u044f";
if (nonEscaped === escaped) {
  console.log("They match");
}

The above example prints `They match`.
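
As a further sanity check, a rough sketch using Node's fs module can read the file's own bytes back and decode them as UTF-8:

var fs = require("fs");
var bytes = fs.readFileSync(__filename);   // raw bytes of this source file
var decoded = bytes.toString("utf8");      // decode those bytes as UTF-8
console.log(decoded.indexOf("Планета_Зямля") !== -1);

If the editor really saved the file as UTF-8, this prints true.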

Non-BMP note:

Note that UTF-8 supports non-BMP code points (U+10000 and onwards), but JavaScript has a complication in that case: it automatically converts them to surrogate pairs. This is part of the language:

/* Check in your editor that this Javascript file was saved in UTF-8 */
var nonEscaped = ""; // U+1F4A9
var escaped1 = "\ud83d\udca9";
if (nonEscaped === escaped1) {
  console.log("They match");
}
/* Newer implementations support this syntax: */
var escaped2 = "\u{1f4a9}";
if (nonEscaped === escaped2) {
   console.log("The second string matches");
}

This prints `They match` and `The second string matches`.
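
To see the surrogate pairs directly, here is a small sketch (codePointAt, like the \u{...} syntax above, needs a newer implementation):

var poo = "\ud83d\udca9";                     // U+1F4A9 written as a surrogate pair
console.log(poo.length);                      // 2, i.e. two UTF-16 code units
console.log(poo.charCodeAt(0).toString(16));  // d83d (high surrogate)
console.log(poo.charCodeAt(1).toString(16));  // dca9 (low surrogate)
console.log(poo.codePointAt(0).toString(16)); // 1f4a9 (the full code point)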

Flimm