Chinese Sorting by Pinyin in Javascript with localeCompare?

Question

I am facing quite a challenge here. I am to sort certain Chinese "expressions" by pinyin.

The question:
How could I sort by pinyin in Firefox?
Is there a way to sort properly in IE 9 and 10? (They are also to be supported by the website)

Example:

财经传讯公司
财经顾问
房地产及按揭

According to a translator agency, this is what the sort order of the words should be. The translations are as follows:

Financial communication agencies
Financial consultancies
Real estate and mortgages

The pronunciations in the Latin alphabet:

cai jing chuan xun gong si
cai jing gu wen
fang di chan ji an jie

String.localeCompare: MDN Docs

From what I understand, I should pass a second argument to String.localeCompare: a BCP 47 language tag that tells the method to sort by pinyin, which should be zh-CN-u-co-pinyin.

So the full code should look like this:

var arr = [ "财经传讯公司", "财经顾问", "房地产及按揭"];
console.dir(arr.sort(function(a, b){
    return a.localeCompare(b, [ "zh-CN-u-co-pinyin" ]); 
}));

jsFiddle working example

I expected this to log to console the expressions in the order I entered them in the array but the output differs.

On FX 27, the order is: 3, 1, 2
In Chrome 33: 1, 2, 3
In IE 11: 1, 2, 3

Note:

Pinyin is the official phonetic system for transcribing the Mandarin pronunciations of Chinese characters into the Latin alphabet.

I wouldn't expect localeCompare() to transliterate to Pinyin. What I'd expect is that it performs the comparison assuming the input text IS Pinyin. BTW it's supported in FF starting from 29 (so it won't work in 27). — Adriano Repetti, Apr 07 '14 at 08:36
Indeed! I missed the compatibility table. I was too "used to" FX having the features on MDN :) Naive mistake — Daniel V., Apr 07 '14 at 08:55
Here is the localeCompare MDN documentation: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/localeCompare — sharkbait, Apr 07 '14 at 08:59

score 12 · Answer 1 · answered Jan 17 '17 at 09:01


This works on Chrome:

const arr = ["博","啊","吃","世","中","超"]
arr.sort((x,y)=>x.localeCompare(y, 'zh-CN'))
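
A possible variation on the snippet above, as a sketch only: build one `Intl.Collator` and reuse its `compare` function instead of passing the locale to `localeCompare` on every comparison. It assumes the engine ships collation data for `zh-CN`; where it does not, the collator quietly falls back to a default order. The names `byPinyin` and `expressions` are mine, and the strings are the three from the question.

```javascript
// Sketch: one reusable collator for the whole sort.
// Assumption: the engine has Chinese collation data; if it does not,
// Intl.Collator falls back to a default locale and the order may differ.
const byPinyin = new Intl.Collator('zh-CN');

// The three expressions from the question, deliberately shuffled.
const expressions = ['房地产及按揭', '财经顾问', '财经传讯公司'];

// compare is a bound function, so it can be passed to sort directly.
const sorted = [...expressions].sort(byPinyin.compare);

console.log(sorted);
```

`Intl.Collator.supportedLocalesOf(['zh-CN'])` returns an empty array when the locale is not available, which is one way to check before relying on the result.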

answered Jan 17 '17 at 09:01

soulmachine


score 2 · Answer 2 · edited Jun 16 '15 at 07:55

In general, people use the following method to sort Chinese characters by pinyin:

var list=[' king ', 'a', 'li'];
list.sort(function (a, b) {return a.localeCompare(b); });

localeCompare(): compares two strings using a locale-specific order.

This approach to pinyin sort is unreliable.

Second: it is very dependent on a Chinese operating system.

It is also very dependent on the browser engine. In other words, depending on the visitor's operating system and on the browser used to explore the internet (Chrome, for example), they may well not see the pinyin sort result we expect.

Here is my solution to this problem, offered as a starting point. It supports the 20,902 consecutive characters in the Unicode range 0x4e00 to 0x9fa5: the Chinese characters used in China (including Taiwan), Japan, and South Korea, that is, the CJK (Chinese, Japanese, Korean) characters.

var CompareStrings = {
    // All supported Chinese characters, pre-sorted by pinyin.
    // The contents are elided in the original; getOrderedUnicode reads this property.
    db: '.........',

    // Return a sortable code for a single character.
    getOrderedUnicode: function (char) {
        var originalUnicode = char.charCodeAt(0);
        if (originalUnicode >= 0x4e00 && originalUnicode <= 0x9fa5) {
            var index = this.db.indexOf(char);
            if (index > -1) {
                return index + 0x4e00;
            }
        }
        return originalUnicode;
    },

    compare: function (a, b) {
        if (a == b) { return 0; }

        // This can be rewritten as needed; as written, the empty string sorts last.
        if (a.length == 0) { return 1; }
        if (b.length == 0) { return -1; }

        var count = a.length > b.length ? b.length : a.length;
        for (var i = 0; i < count; i++) {
            var au = this.getOrderedUnicode(a[i]);
            var bu = this.getOrderedUnicode(b[i]);
            if (au > bu) {
                return 1;
            } else if (au < bu) {
                return -1;
            }
        }

        // Tie within the shared length: the longer string sorts later.
        return a.length > b.length ? 1 : -1;
    }
};

// Override the native localeCompare
String.prototype.localeCompare = function (param) {
    return CompareStrings.compare(this.toString(), param);
};

You can download the complete code through the link below.

A brief introduction to how the implementation works:

Build the character library sorted by pinyin (db): there are several ways to do this. I used a combination of JavaScript and C#: a script first enumerates all the Chinese characters and submits them to a C# back end, which sorts them and sends the result back to the front end. This is only preparation work, so any method will do.
Decide which of two characters is greater (getOrderedUnicode): sorting has to handle not only Chinese characters but also characters outside that range, so the comparator must be able to handle every character. We first check whether a character is a Chinese character. If it is, we look up its index in the sorted character library and add the Unicode position of the first Chinese character (0x4e00); the result is its "calibrated" Unicode value. If it is not a Chinese character, we return its Unicode value directly.
Compare two strings (compare): compare the two strings character by character within the effective range, that is, the length of the shorter string. If a character of a is greater than the corresponding character of b, return 1; in the opposite case, return -1.
If the comparison within the effective range still ends in a tie, the longer string goes later. For example, with a='123' and b='1234', the longer string b is placed after a.
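
The principle above can be tried on a very small scale. The sketch below is not the answer's code: it uses a hypothetical six-character `db` (the characters from the test array in another answer, written in pinyin order a, bo, chao, chi, shi, zhong) in place of the full 20,902-character library, and the names `orderedCode` and `comparePinyin` are mine. Unlike the code above, it does not special-case empty strings.

```javascript
// Hypothetical miniature of the pre-sorted character library.
// A real table would list every character from 0x4e00 to 0x9fa5 in pinyin order.
const db = '啊博超吃世中';

const FIRST_HAN = 0x4e00;
const LAST_HAN = 0x9fa5;

// Map a character to a sortable number: the "calibrated" value for
// characters found in db, the raw UTF-16 code unit for everything else.
function orderedCode(ch) {
  const code = ch.charCodeAt(0);
  if (code >= FIRST_HAN && code <= LAST_HAN) {
    const index = db.indexOf(ch);
    if (index > -1) return FIRST_HAN + index;
  }
  return code;
}

// Compare character by character over the shorter length;
// on a tie, the longer string sorts later.
function comparePinyin(a, b) {
  const count = Math.min(a.length, b.length);
  for (let i = 0; i < count; i++) {
    const diff = orderedCode(a[i]) - orderedCode(b[i]);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return a.length - b.length;
}

const sortedChars = ['博', '啊', '吃', '世', '中', '超'].sort(comparePinyin);
console.log(sortedChars.join('')); // 啊博超吃世中
```

One limitation of such a tiny table: a Chinese character that is missing from `db` keeps its raw code, which can collide with the calibrated value of a character that is in the table. A complete library avoids this because every character in the range gets an index.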

EDIT

You can also use a jQuery plugin:

jQuery.extend( jQuery.fn.dataTableExt.oSort, {
    "chinese-string-asc" : function (s1, s2) {
        return s1.localeCompare(s2);
    },
    "chinese-string-desc" : function (s1, s2) {
        return s2.localeCompare(s1);
    }
} );

See the original post.

_"This approach to pinyin sort is unreliable."_ Assuming Pinyin input text (as in your example) can you **explain** why it's unreliable? (of course for browser that supports it) — Adriano Repetti, Apr 07 '14 at 08:53
This method is too tied to the type of browser used by the user or the operating system that runs on the machine. — sharkbait, Apr 07 '14 at 08:57
You can find some notes about localeCompare and IE9, for example, here: http://www.datatables.net/forums/discussion/9700/sorting-non-ascii-characters-and-data-content-html-tag-sorting/p1 — sharkbait, Apr 07 '14 at 09:01
Assuming browser support **it is reliable** and absolutely unrelated to underlying operating system. In your last edit you posted a snippet from DataTable sorting plug-in, it works pretty well (please add reference to original author and source code, that code alone is useless). — Adriano Repetti, Apr 07 '14 at 09:04
I meant browsers for exploring the internet, like Chrome... sorry for my English — sharkbait, Apr 07 '14 at 09:06
Moreover...a dictionary (!!!) could be a solution only if you can't use anything else. Anyway I would add proper attribution to original author too: http://www.script-home.com/javascript-implementation-method-of-pinyin.html — Adriano Repetti, Apr 07 '14 at 09:10
Sorry teacher.... you don't think you're just a bit acid?!?! Anyway... I only tried to help the questioner.... I'm not here for competition with you... think what you want.... — sharkbait, Apr 07 '14 at 09:11
Sorry if I did seem acid. **Discussion is**, usually, a **good** way to improve an answer (so it'll help **future** readers too) and to help everyone (both me and you) to understand the problem. I know I stressed a little about references but it's nice for original authors... — Adriano Repetti, Apr 07 '14 at 09:15

score 1 · Answer 3 · answered Feb 11 '16 at 09:08


According to MDN, locales and options arguments in localeCompare() have been added in Firefox 29. You should be able to sort by pinyin now.
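
For sites that still have to serve older browsers, one option is to detect whether `localeCompare` honours its locales argument before relying on it. The sketch below follows the detection idea described on MDN (an engine that supports the argument must reject a malformed language tag with a `RangeError`); the names `localeCompareSupportsLocales` and `compareByPinyin` are mine, and the fallback branch makes no promise about the resulting order.

```javascript
// Returns true when localeCompare honours its locales argument.
// Engines that support it throw a RangeError for the malformed tag 'i';
// engines that ignore the argument just return a number.
function localeCompareSupportsLocales() {
  try {
    'foo'.localeCompare('bar', 'i');
  } catch (e) {
    return e.name === 'RangeError';
  }
  return false;
}

// Use the explicit pinyin collation where the argument is supported,
// otherwise fall back to the engine default (order for Chinese not guaranteed).
const compareByPinyin = localeCompareSupportsLocales()
  ? (a, b) => a.localeCompare(b, 'zh-CN-u-co-pinyin')
  : (a, b) => a.localeCompare(b);

const arr = ['财经传讯公司', '财经顾问', '房地产及按揭'];
console.log(arr.slice().sort(compareByPinyin));
```

Even when the argument is accepted, the engine may lack Chinese collation data, so this check only answers whether the locales parameter is processed at all.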

answered Feb 11 '16 at 09:08

Xhacker Liu


score 0 · Answer 4 · answered Aug 16 '17 at 02:29

Here is a solution:

<!--
pinyin_dict_notone.js and pinyinUtil.js are available at the URL below:
https://github.com/sxei/pinyinjs
-->
<script src="pinyin_dict_notone.js"></script>
<script src="pinyinUtil.js"></script>
<script>
jQuery.extend(jQuery.fn.dataTableExt.oSort, {
  "chinese-string-asc": function(s1, s2) {
    s1 = pinyinUtil.getPinyin(s1);
    s2 = pinyinUtil.getPinyin(s2);
    return s1.localeCompare(s2);
  },
  "chinese-string-desc": function(s1, s2) {
    s1 = pinyinUtil.getPinyin(s1);
    s2 = pinyinUtil.getPinyin(s2);
    return s2.localeCompare(s1);
  }
});
jQuery(document).ready(function() {
  jQuery('#mydatatable').dataTable({
    "columnDefs": [
      { type: 'chinese-string', targets: 0 }
    ]
  });
});
</script>
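
Outside DataTables, the same transliterate-then-compare idea can be sketched so that each string is converted only once rather than on every comparison. This is an illustration under assumptions, not part of the library: `sortByPinyinKey` and `getPinyinStub` are hypothetical names, and the stub hard-codes the romanisations given in the question instead of calling `pinyinUtil.getPinyin`, so the block runs without the dictionary files.

```javascript
// Sort strings by a key computed once per string (decorate, sort, undecorate).
// `getPinyin` is any function that maps a string to its romanisation.
function sortByPinyinKey(items, getPinyin) {
  return items
    .map((text) => ({ text, key: getPinyin(text) }))
    // The keys are plain ASCII here, so a code-unit comparison is enough.
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0))
    .map((entry) => entry.text);
}

// Stand-in for the real library: the romanisations listed in the question.
const demoPinyin = {
  '财经传讯公司': 'cai jing chuan xun gong si',
  '财经顾问': 'cai jing gu wen',
  '房地产及按揭': 'fang di chan ji an jie',
};
const getPinyinStub = (text) => demoPinyin[text] || text;

const result = sortByPinyinKey(['房地产及按揭', '财经顾问', '财经传讯公司'], getPinyinStub);
console.log(result); // ['财经传讯公司', '财经顾问', '房地产及按揭']
```

With the real library loaded, the stub would be replaced by a function that calls `pinyinUtil.getPinyin`, as in the answer's code.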
