4

How to find the first Chinese character in a Java string

String str = "xing 杨某某";

How can I get the index of the first Chinese character, 杨, in the string above? Thanks!

STF
star
  • It doesn't solve your problem completely, but you can iterate over the characters, check those that are non-ASCII, and see whether they fall within the ranges described here: http://stackoverflow.com/a/11415841/1789436. Note that CJK characters are not just Chinese ones. Is that close enough for your case? – CptBartender May 25 '16 at 08:05
  • Thanks for your advice! – star May 29 '16 at 06:43

2 Answers

10

This could help:

public static int firstChineseChar(String s) {
    for (int i = 0; i < s.length(); ) {
        int index = i;
        int codepoint = s.codePointAt(i);
        i += Character.charCount(codepoint);
        if (Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN) {
            return index;
        }
    }
    return -1;
}
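
As a quick check, the method above can be run against the string from the question. This is only a sketch: the class name FirstHanDemo and the extra test strings are made up for illustration, and the loop is the same code-point walk with the increment moved after the test.

```java
class FirstHanDemo {
    // Walk the string by code point and return the char index of the
    // first code point whose Unicode script is HAN, or -1 if none exists.
    static int firstChineseChar(String s) {
        for (int i = 0; i < s.length(); ) {
            int codepoint = s.codePointAt(i);
            if (Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN) {
                return i;
            }
            // Advance by 1 or 2 chars, depending on whether this code point
            // is stored as a surrogate pair.
            i += Character.charCount(codepoint);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstChineseChar("xing 杨某某")); // 5
        System.out.println(firstChineseChar("xing"));        // -1
        // Kana belongs to the HIRAGANA script, not HAN, so it is skipped.
        System.out.println(firstChineseChar("かな漢字"));     // 2
    }
}
```

Note that HAN also covers the Han characters used in Japanese and Korean text, so this finds the first Han character rather than strictly a Chinese one.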
Andrea
  • OP asked for the index, not the character. But this would be a minor change – king_nak May 25 '16 at 08:12
  • Fantastic. Could you please explain a little about the magic behind the code? – Sayakiss May 26 '16 at 03:58
  • @Sayakiss: Sure, but there's nothing "magic" here! :) At each iteration it reads the Unicode code point at the current position, then uses Character.UnicodeScript (https://docs.oracle.com/javase/7/docs/api/java/lang/Character.UnicodeScript.html) to check whether that code point belongs to the HAN script. – Andrea May 26 '16 at 07:37
2

In your case the string starts with ASCII characters and the Chinese characters follow, so you can loop over the string and stop at the first char that is no longer ASCII.

For this particular input, the first non-ASCII char is the first Chinese character.

for (int i = 0; i < str.length(); i++) {
    char c = str.charAt(i);
    if (c > 127) { // outside the ASCII range 0..127
        System.out.println("The first Chinese char is at index " + i + ": " + c);
        break; // stop at the first match
    }
}
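
The same check can be wrapped in a method that returns the index, which is what the question asks for. This is a sketch under the assumption that everything before the Chinese text is plain ASCII; the names FirstNonAsciiDemo and firstNonAscii are made up for illustration.

```java
class FirstNonAsciiDemo {
    // Hypothetical helper: index of the first char outside ASCII (0..127),
    // or -1 if the whole string is ASCII.
    static int firstNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstNonAscii("xing 杨某某"));   // 5
        // An accented Latin letter is also non-ASCII, so it matches first.
        System.out.println(firstNonAscii("caf\u00e9 杨")); // 3
    }
}
```

The second call shows the limitation: any non-ASCII character, such as é, is reported, so this approach only works when the input is known to contain nothing but ASCII and Chinese text.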
Anirudh Sharma
Simone C.