Edit 2:

const tamilRegex = XRegExp("\\p{Tamil}", "ug")
const match = XRegExp.exec(word, tamilRegex);
return match

I have since found XRegExp, a library that can handle Unicode characters. The code above is what I tried with that library, but it still returns the wrong value.

Any help?!


Edit 1:

const word = "யாத்திராகமம்"
const firstLetter = word.match(/[^\w]/u)

console.log(firstLetter)

The above code returns ய, which is not the correct first Tamil letter in that word; it should be யா.

Is there any way to get the proper first letter of a word using a regex or a library?

  • 2
    Find the list of letters in the Tamil Unicode block, then compare against them using a loop. – Muje Feb 29 '20 at 07:14
  • 2
    `word.match(/^\p{L}/u)`, check this library: http://xregexp.com/ – Alex Feb 29 '20 at 07:26
  • @James_22 Yes, you are correct. But that would be reinventing the wheel. I hope there is a solution already. If not, I will go with your solution. Thanks for your response. – Jeffrin Prabahar Feb 29 '20 at 07:30
  • @Alex I have updated my question. Can you please check that again. I tried XRegExp library but I don't know how to properly write the regex for it. Can you help? – Jeffrin Prabahar Feb 29 '20 at 07:31
  • 4
    Actually the problem is that the first letter is ய, not யா. யா is a syllable composed of a consonant ய் and vowel ஆ : https://fr.wiktionary.org/wiki/%E0%AE%AF%E0%AE%BE – Alex Feb 29 '20 at 07:45
  • @Alex Do you say that the word `யாத்திராகமம்` has the first letter as `ய` and not `யா`? I am sorry, I couldn't understand what you are trying to say?!. But the answer shared by @trincot worked for me. Thank you so much for your time & response @Alex. I really appreciate. – Jeffrin Prabahar Feb 29 '20 at 11:14

1 Answer

I don't know the Tamil script, but Wikipedia explains the concept of compound letters in that script. The Tamil Unicode block covers the range U+0B80 to U+0BFF. Within it, the code points in the subrange U+0BBE to U+0BCD, plus the one at U+0BD7, are suffixes that need to be combined with the preceding consonant to make it a compound letter.
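To see why the first code point alone is not the whole letter, it can help to list the code points of the word from the question. The following is only a minimal sketch: `describeCodePoints` is a hypothetical helper name, and the `isSuffix` check simply applies the ranges mentioned above.

```javascript
// Hypothetical helper (not from any library): lists each code point of a
// string and flags whether it falls in the suffix ranges U+0BBE-U+0BCD
// or U+0BD7 described above.
function describeCodePoints(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    const isSuffix = (cp >= 0x0bbe && cp <= 0x0bcd) || cp === 0x0bd7;
    const hex = "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
    return { char: ch, hex, isSuffix };
  });
}

// The first two code points of the word from the question:
// the consonant U+0BAF followed by the suffix U+0BBE.
const firstTwo = describeCodePoints("யாத்திராகமம்").slice(0, 2);
console.log(firstTwo);
```

The second entry is flagged as a suffix, which is why a regex that matches a single code point stops after the bare consonant.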

Without any specialised library or smarter regex support, it seems you can make it work with the regex `[\u0b80-\u0bff][\u0bbe-\u0bcd\u0bd7]?`, which matches one character in the Tamil range, optionally followed by one of those suffix code points.

let s = "this is Tamil: யாத்திராகமம்";

console.log("First Tamil character: ", s.match(/[\u0b80-\u0bff][\u0bbe-\u0bcd\u0bd7]?/u));
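If you want the letter as a plain string rather than a match array, the same regex can be wrapped in a small function. This is just a sketch: `firstTamilLetter` and `firstTamilLetterByProperty` are hypothetical names. The second function is a different technique from the one above; it assumes a runtime with ES2018 Unicode property escapes and takes a Tamil-script letter followed by any combining marks.

```javascript
// The regex from the answer: one Tamil-block character, optionally
// followed by one suffix code point (U+0BBE-U+0BCD or U+0BD7).
const TAMIL_LETTER = /[\u0b80-\u0bff][\u0bbe-\u0bcd\u0bd7]?/u;

// Hypothetical helper: returns the first Tamil letter as a string,
// or null when the text contains no Tamil character.
function firstTamilLetter(text) {
  const match = text.match(TAMIL_LETTER);
  return match ? match[0] : null;
}

// Alternative sketch (assumes Unicode property escape support):
// a letter that belongs to the Tamil script, plus any following marks.
const TAMIL_LETTER_BY_PROPERTY = /(?=\p{Script=Tamil})\p{L}\p{M}*/u;

function firstTamilLetterByProperty(text) {
  const match = text.match(TAMIL_LETTER_BY_PROPERTY);
  return match ? match[0] : null;
}

const sample = "this is Tamil: யாத்திராகமம்";
console.log(firstTamilLetter(sample));           // "யா"
console.log(firstTamilLetterByProperty(sample)); // "யா"
console.log(firstTamilLetter("no Tamil here"));  // null
```

Both variants return only the first match; adding the `g` flag and using `matchAll` would be one way to split a whole word into letters, though that is untested here.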
trincot