
I'm looking for a way to display a paragraph of text and allow a user to click on an individual word to highlight it.

I found some code that places a span around each word, and all of this works great, but...

When the text contains a word like, say, Soufflé, it isn't recognized as just one word.

Here is the code I'm using.

var p = $('#storyText');

p.html(function(index, oldHtml) {
    return oldHtml.replace(/\b([\w+-éñó\u00F1\u00E9]+)\b/g, '<span class="storyWord">$1</span>');
});

It seems to work fine until a word starts or ends with an accented character.
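The symptom can be reproduced outside the page by running the same regex over plain strings. In this sketch `markWords` is a hypothetical helper, and the `[$1]` replacement simply stands in for the span markup:

```javascript
// Same pattern as in the question; square brackets stand in for the <span> markup.
const storyWordRe = /\b([\w+-éñó\u00F1\u00E9]+)\b/g;

function markWords(text) {
  return text.replace(storyWordRe, '[$1]');
}

console.log(markWords('Soufflé')); // [Souffl]é   (trailing accent left outside)
console.log(markWords('éclair'));  // é[clair]    (leading accent left outside)
console.log(markWords('piñata'));  // [piñata]    (accent in the middle is fine)
```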

Jared Farrish
Greg Mercer
  • Maybe refer back to where you got the regex? Really now, if this isn't begging for "fix it for me", I don't know what would qualify. This is "challenging" stuff. – Jared Farrish May 27 '12 at 01:21
  • Take a look at [this](http://stackoverflow.com/questions/5436824/matching-accented-characters-with-javascript-regexes). It may help, or at least explain the issue. Essentially, \b doesn't work as you might expect with Unicode characters. – Mark M May 27 '12 at 01:26
  • OK, Jared, fair enough. It doesn't seem like there's any easy way to handle Unicode in JavaScript regexes. There's an answer here that mentions a jQuery library that might help: http://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters – Ringo May 27 '12 at 01:48

1 Answer

The problem is that word characters in a JavaScript RegExp are defined simply as [a-zA-Z0-9_], so \b matches the "boundary" between one of them and an accented character. Matching the words directly, without \b, is an improvement:
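Both facts this relies on can be checked directly: `\w` does not match `é`, so the only `\b` positions in `Soufflé` are at the start and just before the final `é`. A small sketch; `boundaryIndices` is a hypothetical helper:

```javascript
// \w is ASCII-only ([A-Za-z0-9_]) in JavaScript, so an accented letter counts as "non-word".
console.log(/\w/.test('e')); // true
console.log(/\w/.test('é')); // false

// Collect the index of every position where \b matches.
function boundaryIndices(text) {
  return Array.from(text.matchAll(/\b/g), (m) => m.index);
}

// Logs the array [0, 6]: there is no boundary after the final é (index 7).
console.log(boundaryIndices('Soufflé'));
```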

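$.fn.highlightWords = function(){
  return this.html(function(i, html){
    // Wrap each run of word characters, apostrophes, '+', '-' and the listed accented
    // letters. The '-' is escaped so it is not read as a '+'..'é' range (which would
    // also swallow '<', '>', '/', ',' and '.').
    return html.replace(/[\w'+\-éñó\u00F1\u00E9]+/g, function(m){
      return '<span>' + m + '</span>';
    });
  });
};

$('p').highlightWords();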

Demo: http://jsbin.com/elegod
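The character class above only covers the accented letters it lists (é, ñ, ó). In engines that support ES2018 Unicode property escapes (an assumption about the target browsers, and newer than this answer), `\p{L}` (any letter) and `\p{M}` (combining marks) can replace the hand-written list. A minimal sketch on plain text; `wrapWords` is a hypothetical helper, and the `storyWord` class is taken from the question:

```javascript
// Letters from any script, combining marks, digits, apostrophes and hyphens.
// The `u` flag is required for \p{...} escapes.
const wordRe = /[\p{L}\p{M}\p{N}'\-]+/gu;

// Hypothetical helper: wrap every word of a plain-text string in a span.
// Assumes the input contains no tags or HTML entities (e.g. text read via jQuery's .text()).
function wrapWords(text) {
  return text.replace(wordRe, (word) => '<span class="storyWord">' + word + '</span>');
}

console.log(wrapWords('Soufflé, éclair'));
// <span class="storyWord">Soufflé</span>, <span class="storyWord">éclair</span>
```

Including `\p{M}` also keeps a word together when the accent is stored as a separate combining character (e.g. `e` followed by U+0301).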

Ricardo Tomasi