I'm able to match and highlight this Hebrew letter in JS:

var myText = $('#text').html();
var myHilite = myText.replace(/(\u05D0+)/g,"<span class='highlight'>$1</span>");
$('#text').html(myHilite);

but can't highlight a word containing that letter at a word boundary:

/(\u05D0)\b/g

I know that JS is bad at regex with Unicode (and server side is preferred), but I also know that I'm bad at regex. Is this a limit in JS or an error in my syntax?

nathanbweb

2 Answers

I can't read Hebrew... does this regex do what you want?

/(\S*[\u05D0]+\S*)/g

Your first regex, /(\u05D0+)/g, matches only the character you are interested in (one or more consecutive occurrences).

Your second regex, /(\u05D0)\b/g, is intended to match only when that character is the last (or last repeated) character before a word boundary, so it won't match the character at the beginning or in the middle of a word.

EDIT:

Look at this answer:

utf-8 word boundary regex in javascript

Using the info from that answer, I came up with this regex. Is this correct?

/([\u05D0])(?=\s|$)/g
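
If the goal is to highlight the whole word that ends in that letter rather than the letter alone, one possible extension is to let `\S*` consume the rest of the word before it. This is only a sketch: the function name `highlightWordsEndingInAleph` is hypothetical, "word" here means a run of non-whitespace characters, and trailing punctuation or HTML tags touching the word are not handled.

```javascript
// Hypothetical helper (not from the original post): wraps every
// whitespace-delimited word whose last character is aleph (U+05D0).
function highlightWordsEndingInAleph(text) {
  // \S*      the rest of the word before the final aleph (may be empty)
  // \u05D0   the aleph itself
  // (?=\s|$) lookahead: the aleph must be followed by whitespace or end of string
  return text.replace(/(\S*\u05D0)(?=\s|$)/g, "<span class='highlight'>$1</span>");
}

// Three words: ends in aleph, starts with aleph, ends in aleph.
const sample = "\u05D1\u05D0 \u05D0\u05D1 \u05D2\u05D0";
const highlighted = highlightWordsEndingInAleph(sample);
console.log(highlighted);
```

Only the first and third words should be wrapped, since the second one merely starts with the letter.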

mrk
  • seems to match any word containing that letter at any position: http://jsfiddle.net/nathanbweb/mf9Su/4/ -- but basically you're trying to use \S instead of \b ? – nathanbweb Feb 26 '13 at 18:09
  • do you want to match that character only as the last character in a word, or do you also want to match it at the beginning of a word boundary? – mrk Feb 26 '13 at 18:10
  • I think only as the last character in a word. I'm not clear on the difference (again I suck at regex). – nathanbweb Feb 26 '13 at 18:11
  • re:edit - that matches the letter at the location I want. What I'm hoping for is to match every word in which that letter is at that location .. does that make sense? i'll check out that q&a .. – nathanbweb Feb 26 '13 at 18:19
  • As `\u05D0` is considered a non-word character, there is no word boundary when it is followed by another non-word character such as a space. For there to be a word boundary after `\u05D0`, it would have to be followed by a word character `[a-zA-Z0-9_]`. – MikeM Feb 26 '13 at 19:29
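
The behaviour described in that last comment can be checked directly. This is a small sketch with made-up variable names and sample strings; it relies only on the fact that `\w` in JavaScript means `[A-Za-z0-9_]`.

```javascript
// Aleph (U+05D0) is not part of \w, so \b treats it like punctuation or a space.
const isWordChar = /\w/.test("\u05D0");

// Aleph followed by a space: both sides are non-word characters, so no boundary.
const alephThenSpace = /\u05D0\b/.test("\u05D1\u05D0 \u05D2");

// Aleph followed by an ASCII letter: non-word then word character, so \b matches.
const alephThenLatin = /\u05D0\b/.test("\u05D0a");

console.log(isWordChar);     // false
console.log(alephThenSpace); // false
console.log(alephThenLatin); // true
```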

What about the following regexp, which covers all the cases of a word in a sentence:

/^\u05D0\s|\s\u05D0\s|\s\u05D0$|^\u05D0$/

It actually combines four patterns with the OR operator ('|').

  1. Either the string starts with your exact word followed by a space
  2. OR your string has space + your word + space
  3. OR your string ends with space + your word
  4. OR your string is the exact word only.
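
As a rough check of the four cases listed above, the sketch below builds a regex with one alternative per case and runs it against one sample string for each. The names `standaloneAleph`, `samples` and `results` are invented for illustration, and the letter is treated as a complete word on its own.

```javascript
// One alternative per listed case, in the same order:
// start + word + space | space + word + space | space + word + end | exact word
const standaloneAleph = /^\u05D0\s|\s\u05D0\s|\s\u05D0$|^\u05D0$/;

const samples = {
  atStart: "\u05D0 \u05D1",
  inMiddle: "\u05D1 \u05D0 \u05D2",
  atEnd: "\u05D1 \u05D0",
  alone: "\u05D0",
  insideWord: "\u05D1\u05D0\u05D2", // aleph inside a longer word: should not match
};

// No `g` flag on the regex, so test() keeps no state between calls.
const results = {};
for (const [name, text] of Object.entries(samples)) {
  results[name] = standaloneAleph.test(text);
}
console.log(results);
```

Note that a match includes the surrounding whitespace, so using this pattern for highlighting would need capture groups or a lookahead to avoid wrapping the spaces as well.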
TBE