
I’ve done some Google searches, but I get results related to encoding strings or files.

Can I write my Node.js JavaScript source code in UTF-8? Can I use non-ASCII characters in comments, strings, or as variable names?

ECMA-262 seems to require UTF-16 encoding, but Node.js won’t run a UTF-16-encoded .js file. It will, however, run UTF-8 source and correctly interpret non-ASCII characters.
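
For example, a small test file along these lines (saved as UTF-8; the non-ASCII names here are just for illustration) runs for me without complaint:

/* Комментарий: non-ASCII is fine in comments too */
var приветствие = "こんにちは";   // non-ASCII identifier and string literal
console.log(приветствие.length);  // prints 5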

So is this by design or by “accident”? Is it specified somewhere that UTF-8 source code is supported?

Nate
  • I've never given this a second thought, but I constantly use UTF-8 for everything I do and never had a problem. – Alex Turpin Apr 12 '12 at 14:05
  • I expect that it's not so much a Node.js thing, but a V8 thing. – Pointy Apr 12 '12 at 14:07
  • I was hoping someone could point to, say, Node.js or V8 documentation that says what source encodings are allowed. (Python example: http://www.python.org/dev/peps/pep-0263/). Yeah, I can and did futz around and see what works, but I want a more concrete answer. – Nate Apr 12 '12 at 15:12
  • You're linking to a very old version of the spec (the 3rd rev. is from 1999; we just hit the 6th rev. last June). The current version is [here](http://www.ecma-international.org/ecma-262/6.0/index.html#sec-source-text). The requirement is "Unicode" (with ASCII being, by convention, a subset of Unicode, since the lower 128 code points of Unicode match what the ASCII encoding specifies). – Mike 'Pomax' Kamermans Sep 11 '15 at 17:07

2 Answers


Reference: http://mathiasbynens.be/notes/javascript-identifiers

UTF-8 characters are valid JavaScript variable names. Go ahead and encode your source as UTF-8.
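
For instance, a quick sketch (the identifiers here are chosen arbitrarily):

var π = Math.PI;     // Greek letter as an identifier
var größe = "groß";  // Latin letters with diacritics also work
console.log(π);      // 3.141592653589793
console.log(größe);  // groß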

Joe Frambach
  • Unicode characters and UTF-8 encoding are different things. The standard actually seems to require UTF-16, not UTF-8 (but that doesn’t seem to be true in practice). It’s nice to have confirmation Unicode characters are valid variable names though. – Nate Apr 12 '12 at 15:14
  • Although available, I can't recommend doing things like `var Hͫ̆̒̐ͣ̊̄ͯ͗͏̵̗̻̰̠̬͝ͅE̴̷̬͎̱̘͇͍̾ͦ͊͒͊̓̓̐_̫̠̱̩̭̤͈̑̎̋ͮͩ̒͑̾͋͘Ç̳͕̯̭̱̲̣̠̜͋̍O̴̦̗̯̹̼ͭ̐ͨ̊̈͘͠M̶̝̠̭̭̤̻͓͑̓̊ͣͤ̎͟͠E̢̞̮̹͍̞̳̣ͣͪ͐̈T̡̯̳̭̜̠͕͌̈́̽̿ͤ̿̅̑Ḧ̱̱̺̰̳̹̘̰́̏ͪ̂̽͂̀͠ = 'Zalgo';` – Joe Frambach Apr 12 '12 at 15:39
  • The standard says that the native text processing model of JavaScript is based on UTF-16 code units. That doesn't specify what byte encoding is used to convert a source file to those units. – bobince Apr 14 '12 at 12:36

I can't find documentation that says that Node treats files as encoded in UTF-8, but it seems that way experimentally:

/* Check in your editor that this Javascript file was saved in UTF-8 */
var nonEscaped = "Планета_Зямля";
var escaped = "\u041f\u043b\u0430\u043d\u0435\u0442\u0430\u005f\u0417\u044f\u043c\u043b\u044f";
if (nonEscaped === escaped) {
  console.log("They match");
}

The above example prints `They match`.
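
As a further sanity check, a rough sketch using Node's fs module can read the file's own bytes back and decode them as UTF-8:

var fs = require("fs");
var bytes = fs.readFileSync(__filename);   // raw bytes of this source file
var decoded = bytes.toString("utf8");      // decode those bytes as UTF-8
console.log(decoded.indexOf("Планета_Зямля") !== -1);

If the editor really saved the file as UTF-8, this prints true.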

Non-BMP note:

Note that UTF-8 supports non-BMP code points (U+10000 and onwards), but JavaScript has a complication in that case: it automatically converts them to surrogate pairs. This is part of the language:

/* Check in your editor that this Javascript file was saved in UTF-8 */
var nonEscaped = ""; // U+1F4A9
var escaped1 = "\ud83d\udca9";
if (nonEscaped === escaped1) {
  console.log("They match");
}
/* Newer implementations support this syntax: */
var escaped2 = "\u{1f4a9}";
if (nonEscaped === escaped2) {
   console.log("The second string matches");
}

This prints `They match` and `The second string matches`.
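
To see the surrogate pairs directly, here is a small sketch (codePointAt, like the \u{...} syntax above, needs a newer implementation):

var poo = "\ud83d\udca9";                     // U+1F4A9 written as a surrogate pair
console.log(poo.length);                      // 2, i.e. two UTF-16 code units
console.log(poo.charCodeAt(0).toString(16));  // d83d (high surrogate)
console.log(poo.charCodeAt(1).toString(16));  // dca9 (low surrogate)
console.log(poo.codePointAt(0).toString(16)); // 1f4a9 (the full code point)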

Flimm