Single-byte, double-byte, triple-byte characters: how to find which one the user entered

Question

I struggled with this whenever I worked on a multilingual site (especially Japanese and Chinese) where users are allowed to enter text in their regional language.

score 0 · Answer 1 · answered Mar 10 '17 at 05:09

In UTF-8, a character can take one, two, three, or four bytes. The byte count is determined by the range its code point falls in: below 0x80 is one byte, below 0x800 is two, below 0x10000 is three, and anything above that is four. Based on this, I wrote the following function, which calculates the size of a string in bytes when encoded as UTF-8.

function getByteLength(normal_val) {
    // Force string type
    normal_val = String(normal_val);

    var byteLen = 0;
    // for...of iterates by code point, so a surrogate pair (e.g. an emoji)
    // is counted once as 4 bytes instead of twice as 3 bytes each
    for (var ch of normal_val) {
        var c = ch.codePointAt(0);
        // UTF-8 byte count by code point range
        byteLen += c < 0x80    ? 1 :
                   c < 0x800   ? 2 :
                   c < 0x10000 ? 3 : 4;
    }
    return byteLen;
}
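
As a cross-check on the manual calculation, environments that provide the standard TextEncoder API (modern browsers and recent Node.js) can report the UTF-8 byte count directly. This is only a minimal sketch; the helper name utf8ByteLength is made up for illustration.

```javascript
// Hypothetical helper: counts UTF-8 bytes using the built-in TextEncoder.
function utf8ByteLength(value) {
    // TextEncoder always encodes to UTF-8 and returns a Uint8Array
    return new TextEncoder().encode(String(value)).length;
}

console.log(utf8ByteLength("abc"));          // 3
console.log(utf8ByteLength("\u65e5\u672c")); // 6 (two kanji, three bytes each)
```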

So the above function can be modified to find out whether a character is single-byte or multi-byte.
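
One possible modification, sketched here on the assumption that the text is measured as UTF-8, is to return the byte width of each character rather than the total. The name getCharByteWidths is hypothetical.

```javascript
// Hypothetical helper: returns one entry per character with its UTF-8 width.
function getCharByteWidths(value) {
    var result = [];
    // for...of walks the string by code point, so surrogate pairs stay together
    for (var ch of String(value)) {
        var c = ch.codePointAt(0);
        var bytes = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
        result.push({ char: ch, bytes: bytes });
    }
    return result;
}

// "a", U+00E9 (e with acute accent) and U+65E5 (a kanji)
var widths = getCharByteWidths("a\u00e9\u65e5");
console.log(widths.map(function (w) { return w.bytes; })); // [ 1, 2, 3 ]
```

A character is single-byte when its entry has bytes === 1 and multi-byte otherwise.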

The following JSFiddle determines the size of the entered text in bytes.

http://jsfiddle.net/paraselixir/d83oaa3v/5/

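So if a string has x characters and its size is y bytes: if x === y, every character is single-byte. If 2*x === y, the characters average two bytes each, which is the case when all of them are double-byte but also for a mix such as one single-byte and one triple-byte character. Otherwise the string contains multi-byte characters of other or mixed widths. Here x should count code points (for example [...str].length), because str.length counts UTF-16 code units.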
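
Checking each character's width, instead of comparing totals, gives an unambiguous answer. A sketch, assuming UTF-8 and iteration by code point; describeByteWidths is an illustrative name, not part of the original answer.

```javascript
// Hypothetical helper: reports whether every character has the same UTF-8 width.
function describeByteWidths(value) {
    var widths = new Set();
    for (var ch of String(value)) {
        var c = ch.codePointAt(0);
        widths.add(c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4);
    }
    if (widths.size === 0) return "empty";
    if (widths.size > 1) return "mixed";
    // Exactly one distinct width was seen
    return widths.values().next().value + "-byte";
}

console.log(describeByteWidths("hello"));        // 1-byte
console.log(describeByteWidths("\u65e5\u672c")); // 3-byte
console.log(describeByteWidths("a\u65e5"));      // mixed
```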
