match hebrew character at word boundary via regex in javascript?

Question

I'm able to match and highlight this Hebrew letter in JS:

var myText = $('#text').html();
var myHilite = myText.replace(/(\u05D0+)/g,"<span class='highlight'>$1</span>");
$('#text').html(myHilite);

fiddle

but can't highlight a word containing that letter at a word boundary:

/(\u05D0)\b/g

fiddle

I know that JS is bad at regex with Unicode (and server side is preferred), but I also know that I'm bad at regex. Is this a limit in JS or an error in my syntax?

score 2 · Accepted Answer · edited May 23 '17 at 10:28

I can't read Hebrew... does this regex do what you want?

/(\S*[\u05D0]+\S*)/g

Your first regex, /(\u05D0+)/g matches on only the character you are interested in.

Your second regex, /(\u05D0)\b/g, matches only when the character you are interested in is the last character (or the last of a repeated run) before a word boundary, so it won't match that character at the beginning or in the middle of a word.

EDIT:

Look at this answer:

utf-8 word boundary regex in javascript

Using the info from that answer, I came up with this regex. Is this correct?

/([\u05D0])(?=\s|$)/g
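To make the lookahead idea concrete, here is a minimal sketch. The first pattern is the one above without the redundant brackets and capture group; the second is a hypothetical extension that also takes the non-space characters before the letter, so the whole word ending in it gets wrapped. The `highlight` helper and the sample text are assumptions made up for illustration.

```javascript
// U+05D0 followed by whitespace or the end of the string.
// The lookahead checks the next character without consuming it.
const finalAleph = /\u05D0(?=\s|$)/g;

// Hypothetical extension: include the run of non-space characters that
// precedes the letter, so the whole word ending in it is matched.
const wordEndingInAleph = /\S*\u05D0(?=\s|$)/g;

// Illustrative helper: wrap every match ($&) in a highlight span.
function highlight(text, pattern) {
  return text.replace(pattern, "<span class='highlight'>$&</span>");
}

// Assumed sample: first word ends in U+05D0, second word starts with it.
const sample = "\u05D1\u05D0 \u05D0\u05D1";

const lettersOnly = highlight(sample, finalAleph);
// → "\u05D1<span class='highlight'>\u05D0</span> \u05D0\u05D1"

const wholeWords = highlight(sample, wordEndingInAleph);
// → "<span class='highlight'>\u05D1\u05D0</span> \u05D0\u05D1"
```

Both patterns treat only whitespace as the end of a word, so a letter followed directly by punctuation would not be matched by this sketch.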

answered Feb 26 '13 at 18:04

mrk


seems to match any word containing that letter at any position: http://jsfiddle.net/nathanbweb/mf9Su/4/ -- but basically you're trying to use \S instead of \b ? – nathanbweb Feb 26 '13 at 18:09
do you want to match that character only as the last character in a word, or do you also want to match it at the beginning of a word boundary? – mrk Feb 26 '13 at 18:10
I think only as the last character in a word. I'm not clear on the difference (again I suck at regex). – nathanbweb Feb 26 '13 at 18:11
re:edit - that matches the letter at the location I want. What I'm hoping for is to match every word in which that letter is at that location .. does that make sense? i'll check out that q&a .. – nathanbweb Feb 26 '13 at 18:19
As `\u05D0` is considered a non-word character there will not be a word boundary if it is followed by another non-word character such as a space. For there to be a word boundary after `\u05D0`, it would have to be followed by a word character `[a-zA-Z0-9_]`. – MikeM Feb 26 '13 at 19:29
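The point in the comment above can be checked directly. The sketch below tests `\b` after `\u05D0` in three situations, then shows one possible workaround using Unicode property escapes. The workaround is an assumption on my part, not something proposed in the thread, and it needs an engine that supports the `u` flag with `\p{...}` (ES2018 or later).

```javascript
// \b is defined in terms of \w, which in JavaScript is [A-Za-z0-9_] only.
const alephThenBoundary = /\u05D0\b/;

// Followed by a space: both neighbours are non-word characters, no boundary.
const beforeSpace = alephThenBoundary.test("\u05D0 "); // false

// At the end of the string: the last character is not a word character, no boundary.
const atEnd = alephThenBoundary.test("\u05D0"); // false

// Followed by an ASCII letter: non-word then word character, so a boundary exists.
const beforeLatin = alephThenBoundary.test("\u05D0a"); // true

// Possible workaround (assumption): treat any Unicode letter, combining mark,
// digit, or underscore as a "word" character and require that none follows.
const alephAtWordEnd = /\u05D0(?![\p{L}\p{M}\p{N}_])/u;

const endOfHebrewWord = alephAtWordEnd.test("\u05D1\u05D0 \u05D2"); // true
const midHebrewWord = alephAtWordEnd.test("\u05D0\u05D1"); // false
const beforePunctuation = alephAtWordEnd.test("\u05D0."); // true
```

Including `\p{M}` keeps a letter that carries vowel points from being treated as word-final just because a combining mark follows it.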

score 0 · Answer 2 · answered Mar 13 '16 at 11:25

What about using the following regexp, which covers every position a word can take in a sentence:

/^\u05D0\s|\s\u05D0\s|\s\u05D0$|^\u05D0$/

It actually combines 4 regexps with the OR operator ('|').

Either the string starts with your exact word followed by a space
OR your string has space + your word + space
OR your string ends with space + your word
OR your string is the exact word only.
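The four cases listed above can be exercised one at a time. This is a minimal sketch that writes the pattern exactly as the list describes it; the sample strings and variable names are assumptions made up for illustration.

```javascript
const aleph = "\u05D0";

// One alternative per case: start + space, space + word + space,
// space + word at the end, and the word alone.
const standaloneAleph = /^\u05D0\s|\s\u05D0\s|\s\u05D0$|^\u05D0$/;

const atStart = standaloneAleph.test(aleph + " \u05D1"); // true
const inMiddle = standaloneAleph.test("\u05D1 " + aleph + " \u05D2"); // true
const atEnd = standaloneAleph.test("\u05D1 " + aleph); // true
const alone = standaloneAleph.test(aleph); // true

// The letter inside a longer word matches none of the four alternatives.
const insideWord = standaloneAleph.test("\u05D1" + aleph + "\u05D2"); // false
```

Note that this pattern finds the letter only when it stands alone as a word, which is narrower than finding every word that ends in it.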
