I have strings like these:
test <- c("oh i mean well i do n't know well he 's like oh",
"yeah so well he did n't say oh he said f** well you know what he 's like",
"oh you know well why well maybe he thought oh well good",
"oh my god well what the hell did he oh you know")
I'd like to match all word sequences starting with "oh" and ending with "well" and, inversely, starting with "well" and ending with "oh". This use of str_extract_all matches some of the target sequences but not all, because its matches cannot overlap: once an "oh" or "well" has been consumed as the end of one match, the search does not start anew from it:
library(stringr)
strings <- unlist(str_extract_all(test, "\\boh\\b.*?\\bwell\\b|\\bwell\\b.*?\\boh\\b"))
[1] "oh i mean well" "well he 's like oh" "well he did n't say oh" "oh you know well"
[5] "well maybe he thought oh" "oh my god well"
The complete output, including the sequences that share an "oh" or "well" with a neighbouring match, would be this:
[1] "oh i mean well" "well he 's like oh" "well he did n't say oh" "oh he said f** well"
[5] "oh you know well" "oh well" "well maybe he thought oh" "oh my god well"
[9] "well what the hell did he oh"
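
One direction that might work, as a rough sketch rather than a definitive solution: wrap the pattern in a zero-width lookahead containing a capture group, so that nothing is consumed and every "oh" or "well" can serve as a new starting point, and pull out the captured text with str_match_all. The sketch also assumes that a sequence should not contain another "oh" or "well" between its two ends, which is how I read the target output. The names gap, pattern and overlapping are made up for illustration.

```r
library(stringr)

test <- c("oh i mean well i do n't know well he 's like oh",
          "yeah so well he did n't say oh he said f** well you know what he 's like",
          "oh you know well why well maybe he thought oh well good",
          "oh my god well what the hell did he oh you know")

# Assumption: the text between the two ends must not contain another
# standalone "oh" or "well". Each character is consumed only if no such
# word starts at that position.
gap <- "(?:(?!\\b(?:oh|well)\\b).)*"

# The whole alternation sits inside a lookahead, so each match is zero-width
# and the next search can start at the word that ended the previous sequence.
# The inner capture group holds the text of interest.
pattern <- paste0("(?=(\\boh\\b", gap, "\\bwell\\b|\\bwell\\b", gap, "\\boh\\b))")

# str_match_all returns one matrix per input string;
# column 1 is the (empty) full match, column 2 is the capture group.
m <- str_match_all(test, pattern)
overlapping <- unlist(lapply(m, function(x) x[, 2]))
overlapping
```

On the sample strings this should return the nine sequences listed above, ordered by their position within each string, so "well maybe he thought oh" comes before "oh well".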