
My application was relying on this function to test whether a string is Korean:

const isKoreanWord = (input) => {
  const match = input.match(/[\u3131-\uD79D]/g);
  return match ? match.length === input.length : false;
}

isKoreanWord('만두'); // true
isKoreanWord('mandu'); // false

until I started adding Chinese support. Now the function also returns true for Chinese words:

isKoreanWord('幹嘛'); // true

I believe this happens because Korean and Chinese characters are intermingled in the same Unicode range.
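One way to check this belief is to print the code point of each character. The range U+3131–U+D79D used above does not contain only Hangul: it also spans the whole CJK Unified Ideographs block (U+4E00–U+9FFF), where common Chinese characters such as 幹 and 嘛 live. The helper name inOriginalRange below is hypothetical, introduced only for this illustration.

```javascript
// Hypothetical helper: is this character inside the original range
// U+3131–U+D79D used by isKoreanWord?
const inOriginalRange = (ch) => {
  const cp = ch.codePointAt(0);
  return cp >= 0x3131 && cp <= 0xd79d;
};

// Print each character's code point and whether the old range accepts it.
for (const ch of '만두幹嘛') {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  console.log(`U+${hex} ${ch} inside U+3131–U+D79D: ${inOriginalRange(ch)}`);
}
```

All four characters land inside the old range, which is why '幹嘛' passes the test.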

How should I correct this function so that it returns true only if the input contains nothing but Korean characters?

vdegenne
    By "Korean characters" you mean *hangul*? 'Cause Chinese characters are also used in Korea. Asking to distinguish "Chinese Chinese characters" from "Korean Chinese characters" is like asking to distinguish English from French. – deceze Oct 25 '18 at 12:33
  • @deceze Yes, I meant *hangul*. How do I distinguish between *hangul* and *hanja*? – vdegenne Oct 25 '18 at 12:34
  • @deceze Also, I don't think your comparison holds: English and French derive from Latin, so yes, it is extremely hard to tell the two languages apart, while Korean uses Chinese as its base language and Chinese, well... uses Chinese as its own historical base language. – vdegenne Oct 25 '18 at 12:40
    I'm talking purely about the *writing system* used. If you just look at the range of letters, English is indistinguishable from French. In the same way, seeing just a few Chinese characters it's virtually impossible to tell whether it's a Chinese word or a word used in the context of Korean. – deceze Oct 25 '18 at 12:43
    "Korean characters" means hangul, there's no exception. – wonsuc Mar 26 '19 at 06:59
  • @wonsuc yes when you see hangul you know it's Korean, and when you see a Chinese character you know it's Chinese. Even in the context of Korean a Chinese character is always Chinese from its core. Not sure why deceze was trying to argue about that. – vdegenne Mar 31 '19 at 12:23
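On the hangul-versus-hanja question raised in these comments: at the code-point level Unicode does not separate "Korean hanja" from "Chinese hanzi"; both are assigned Script=Han, which is the point deceze makes. What can be separated is Hangul from Han. The sketch below swaps in a different technique from the answer, ES2018 Unicode property escapes (which need the u flag and a reasonably modern engine); the scriptOf helper and its labels are hypothetical.

```javascript
// Hypothetical helper: label a single character by Unicode script.
// Hanja and hanzi share Script=Han, so they get the same label.
const scriptOf = (ch) => {
  if (/\p{Script=Hangul}/u.test(ch)) return 'hangul';
  if (/\p{Script=Han}/u.test(ch)) return 'hanja/hanzi';
  return 'other';
};

// Spread iterates by code point, so each element is one character.
console.log([...'만두 幹嘛'].map((ch) => `${ch}: ${scriptOf(ch)}`));
```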

1 Answer


Here are the Unicode ranges you need for Hangul (taken from the Wikipedia page on Hangul):

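U+AC00–U+D7AF (Hangul Syllables)
U+1100–U+11FF (Hangul Jamo)
U+3130–U+318F (Hangul Compatibility Jamo)
U+A960–U+A97F (Hangul Jamo Extended-A)
U+D7B0–U+D7FF (Hangul Jamo Extended-B)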
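As an alternative to a regex, the same five ranges can be kept in a table and checked one code point at a time. This is a minimal sketch assuming the ranges above are the complete set you want to accept; the names HANGUL_RANGES, isHangulCodePoint and isHangulOnly are hypothetical.

```javascript
// Hypothetical table of the Hangul blocks listed above,
// as inclusive [start, end] code-point pairs.
const HANGUL_RANGES = [
  [0xac00, 0xd7af], // Hangul Syllables
  [0x1100, 0x11ff], // Hangul Jamo
  [0x3130, 0x318f], // Hangul Compatibility Jamo
  [0xa960, 0xa97f], // Hangul Jamo Extended-A
  [0xd7b0, 0xd7ff], // Hangul Jamo Extended-B
];

// True if a single code point lies in one of the ranges.
const isHangulCodePoint = (cp) =>
  HANGUL_RANGES.some(([start, end]) => cp >= start && cp <= end);

// True if the string is non-empty and every code point is Hangul.
const isHangulOnly = (input) => {
  const chars = [...input]; // spread iterates by code point, not UTF-16 unit
  return chars.length > 0 && chars.every((ch) => isHangulCodePoint(ch.codePointAt(0)));
};

console.log(isHangulOnly('만두')); // true
console.log(isHangulOnly('幹嘛')); // false
```

A table like this is easy to extend or audit, at the cost of being a little more verbose than a single character class.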

So the regex in your .match() call should look like this:

const match = input.match(/[\uac00-\ud7af\u1100-\u11ff\u3130-\u318f\ua960-\ua97f\ud7b0-\ud7ff]/g);
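Put together with the original function, one possible version is sketched below. Instead of counting matches it anchors a single character class with ^ and $, so the whole string must consist of characters from the five Hangul ranges; the constant name HANGUL_ONLY is hypothetical.

```javascript
// Hangul blocks: Syllables, Jamo, Compatibility Jamo, Jamo Extended-A and -B.
// ^...+$ requires at least one character and nothing outside the class.
const HANGUL_ONLY = /^[\uac00-\ud7af\u1100-\u11ff\u3130-\u318f\ua960-\ua97f\ud7b0-\ud7ff]+$/;

const isKoreanWord = (input) => HANGUL_ONLY.test(input);

console.log(isKoreanWord('만두')); // true
console.log(isKoreanWord('mandu')); // false
console.log(isKoreanWord('幹嘛')); // false
```

The g flag is deliberately left off: RegExp.prototype.test on a global regex advances lastIndex, so reusing one shared global regex can return alternating results for the same input.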
deceze
Jim