I have a text in which I would like to find different words. The following text is in Brazilian Portuguese and serves only as a test case:
Um dia eu conheci Pedro Álvares Cabral, e descobri muitas informações interessantes.
To find any of the words in the text, I am using the following regular expression:
/\b(Cabral)\b/i // Finds Cabral
/\b(dia)\b/i // Finds dia
/\b(Pedro)\b/i // Finds Pedro
Etc...
If I need to find more than one word, I do as follows:
/\b(informações|muitas)\b/ig
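For illustration, here is a minimal sketch of how I run the multi-word expression against the sample sentence. The variable name `found` is only for this example, and `String.prototype.match` with the `g` flag is just one way to collect the matches:

```javascript
// Sample sentence from the question.
var input = "Um dia eu conheci Pedro Álvares Cabral, e descobri muitas informações interessantes.";

// With the g flag, match() returns every matched substring, or null if none.
var found = input.match(/\b(informações|muitas)\b/ig) || [];

// Matches are listed in the order they appear in the text.
console.log(found);
```

Both words are reported here, even though "informações" contains accented characters in the middle of the word.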
I am testing the expression both in JavaScript and with this online utility. JavaScript code example:
var input = "Um dia eu conheci Pedro Álvares Cabral, e descobri muitas informações interessantes.";
var matchRegExp = new RegExp("\\b(coNHECi)\\b", "i");
// exec() returns null when there is no match
var regs = matchRegExp.exec(input);
if (regs) {
    console.log('OK');
} else {
    console.log('NOPE');
}
THE PROBLEM
All the words I put into the expression are found, except Álvares. For example, I cannot find the word with the following expression:
/\b(Álvares)\b/i
If I remove the Á character, lvares is found. I would like to:
- Know why I can't find Álvares.
- Know how to find any word in a text that contains the following characters: áàâãÁÀÂÃéèêÉÈÊíìîÍÌÎóòôõÓÒÔÕúùûÚÙÛñÑçÇ regardless of whether they are the first, last, or any other letter of the word.
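A minimal reproduction of the behaviour described above, using the same sample sentence. The `results` object and its property names are only for this sketch:

```javascript
// Sample sentence from the question.
var input = "Um dia eu conheci Pedro Álvares Cabral, e descobri muitas informações interessantes.";

// Run the same \b(...)\b pattern with three different words.
var results = {
    cabral: /\b(Cabral)\b/i.test(input),   // plain ASCII word
    alvares: /\b(Álvares)\b/i.test(input), // word starting with an accented letter
    lvares: /\b(lvares)\b/i.test(input)    // same word without the leading Á
};

console.log(results);
```

In my tests `cabral` and `lvares` come back `true`, while `alvares` comes back `false`.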