I have an input field and I want to validate that it contains only letters from multiple languages (and digits).

I've gathered some bits from the web but couldn't figure out how to combine these pieces into one working regex.

From here I found some of the ranges I need:

0000-007F   Basic Latin
0080-00FF   Latin-1 Supplement
0100-017F   Latin Extended-A
0180-024F   Latin Extended-B

From here I found the Japanese range:

4e00-9fbf, 3040-309f and 30a0-30ff

But how do I combine them into one regex (including digits) in JavaScript so I can validate that they are the only characters allowed? (I need more languages, but I just need to understand the concept and then I can add more Unicode ranges myself.)

Alon
  • I think instead of using regex, you can check char by char if any of them are out of range or not. – Snow Blind Jul 29 '13 at 07:29
  • 1. How do I do it? 2. Isn't it bad practice to validate char by char instead of using a regex? – Alon Jul 29 '13 at 07:31

1 Answer

There's a regex category, \p{L}, which matches letters from all known languages. Sadly, JavaScript's built-in RegExp doesn't support it. Instead, you can consider using XRegExp with the Unicode Base plugin.

<script src="xregexp.js"></script>
<script src="addons/unicode/unicode-base.js"></script>
<script>
  var unicodeWord = XRegExp("^\\p{L}+$");

  unicodeWord.test("Русский"); // true
  unicodeWord.test("日本語"); // true
  unicodeWord.test("العربية"); // true
</script>

Code snippet from http://xregexp.com/plugins
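
If pulling in a library is not an option, the ranges listed in the question can also be combined directly: a character class accepts any number of ranges side by side, so the concept is simply to list them all inside one `[...]`. The following is only a minimal sketch under that assumption; the names `allowedRanges`, `allowedChars` and `isAllowed` are made up for illustration. Note that 0000-007F (Basic Latin) already contains the digits 0-9, along with punctuation and control characters, so narrow that range if only letters and digits should pass.

```javascript
// Ranges copied from the question; add more "\\uXXXX-\\uXXXX" pairs as needed.
var allowedRanges = [
  "\\u0000-\\u024F", // Basic Latin through Latin Extended-B (includes 0-9)
  "\\u3040-\\u309F", // 3040-309f from the question (Hiragana)
  "\\u30A0-\\u30FF", // 30a0-30ff from the question (Katakana)
  "\\u4E00-\\u9FBF"  // 4e00-9fbf from the question (CJK ideographs)
];

// Join the ranges into a single character class and anchor it with ^ and $
// so that every character of the input has to fall inside one of the ranges.
var allowedChars = new RegExp("^[" + allowedRanges.join("") + "]+$");

// Hypothetical helper: true only if the whole input uses allowed characters.
function isAllowed(input) {
  return allowedChars.test(input);
}
```

With these ranges, `isAllowed("日本語123")` and `isAllowed("Straße")` pass, while `isAllowed("Русский")` fails because Cyrillic is not in the list.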

Mics
  • I've seen this library but got confused about how to combine several languages (German, French, Spanish, Korean, Japanese etc.) and digits. How would I do such a thing? – Alon Jul 29 '13 at 07:57
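
One possible way to combine several languages and digits is to put several Unicode properties into a single character class. The sketch below does not use XRegExp; it assumes a JavaScript engine that supports ES2018 Unicode property escapes with the `u` flag, which postdates this thread. The variable names are hypothetical.

```javascript
// Any letter from any script, plus any decimal digit.
var lettersAndDigits = /^[\p{L}\p{Nd}]+$/u;

// Only selected scripts (Latin covers German, French and Spanish letters),
// plus the ASCII digits 0-9. Add more \p{Script=...} entries as needed.
var selectedScripts =
  /^[\p{Script=Latin}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}\p{Script=Hangul}0-9]+$/u;

lettersAndDigits.test("Русский2013");   // Cyrillic letters and digits
selectedScripts.test("Español");        // Latin script
selectedScripts.test("日本語한국어123"); // Han, Hangul and ASCII digits
selectedScripts.test("Русский");        // Cyrillic is not in the list
```

The first three calls are expected to return true and the last one false. Listing scripts explicitly keeps the rule strict, whereas `\p{L}` accepts every alphabet.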