\\b represents a word boundary, but I don't understand why it behaves differently depending on the character that follows it. Example:
test1 <- 'aland islands'
test2 <- 'åland islands'
regex1 <- "[åa]land islands"     # a "|" inside [...] is literal, so [å|a] would also match "|land islands"
regex2 <- "\\b[åa]land islands"
grepl(regex1, test1, perl = TRUE)
[1] TRUE
grepl(regex2, test1, perl = TRUE)
[1] TRUE
grepl(regex1, test2, perl = TRUE)
[1] TRUE
grepl(regex2, test2, perl = TRUE)
[1] FALSE
This only seems to be an issue when perl = TRUE:
grepl(regex1, test2, perl = FALSE)
[1] TRUE
grepl(regex2, test2, perl = FALSE)
[1] TRUE
Unfortunately, in my application I absolutely need to keep perl = TRUE.
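A minimal sketch of one possible workaround that keeps perl = TRUE. It rests on an assumption worth verifying against the PCRE documentation: PCRE defines \b in terms of \w, and by default \w matches only ASCII letters, digits and underscore. Under that assumption, "å" at the start of the string is not a word character, so no boundary exists before it. Prefixing the pattern with the PCRE verb (*UCP) asks PCRE to use Unicode properties for \w and \b. The name regex2_ucp is hypothetical, and "\u00e5" is used instead of a literal å so the strings are UTF-8 regardless of the script file's encoding.

```r
# Same inputs as above, written with a Unicode escape for "å"
test1 <- "aland islands"
test2 <- "\u00e5land islands"

# Word-boundary pattern with a plain character class (no stray "|")
regex2 <- "\\b[\u00e5a]land islands"

# Hypothetical variant: (*UCP) must be the very first thing in the pattern;
# it makes \w, and therefore \b, use Unicode character properties
regex2_ucp <- paste0("(*UCP)", regex2)

grepl(regex2, test2, perl = TRUE)      # reported as FALSE in the question
grepl(regex2_ucp, test1, perl = TRUE)  # ASCII "a" at the start
grepl(regex2_ucp, test2, perl = TRUE)  # non-ASCII "å" at the start
```

The design choice here is to change the pattern rather than the grepl() call, so perl = TRUE stays exactly as required.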