In R, using gsub(), I need to capture the four words that occur after a particular phrase in a longer string. Building on the wisdom offered here, I came up with: ^.*\\b(particular phrase)\\W+(\\w+\\W+\\w+\\W+\\w+\\W+\\w+).*$
For example:
this_txt <- "Blah blah particular phrase Extract These Words Please for the blah blah. Ignore blah this other stuff blah blah, blah."
this_pattern <- "^.*\\b(particular phrase)\\W+(\\w+\\W+\\w+\\W+\\w+\\W+\\w+).*$"
gsub(this_pattern, "\\2", this_txt, ignore.case = TRUE)
# [1] "Extract These Words Please"
But the repetition of \\w+\\W+ in the pattern is pretty unseemly. Surely there is a better way. I thought ^.*\\b(particular phrase)\\W+(\\w+\\W+){4}.*$ might work, but it doesn't.
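To make the failure concrete, here is a minimal sketch of the shorter attempt next to one possible variant. As far as I can tell, a quantified capture group keeps only its last repetition, so \\2 no longer holds all four words. The second pattern is an assumption on my part rather than something taken from the linked answer: it wraps a non-capturing repetition in an outer capture group and switches to perl = TRUE so that the (?:...) syntax is certain to be understood. The names short_pattern and grouped_pattern are made up for illustration.

```r
this_txt <- "Blah blah particular phrase Extract These Words Please for the blah blah. Ignore blah this other stuff blah blah, blah."

# Shorter attempt from the question: the group (\\w+\\W+) is repeated four
# times, but the backreference \\2 refers to a single group, not to the
# whole repeated run.
short_pattern <- "^.*\\b(particular phrase)\\W+(\\w+\\W+){4}.*$"
short_result <- gsub(short_pattern, "\\2", this_txt, ignore.case = TRUE)

# Assumed variant: repeat a non-capturing group three times, add the fourth
# word, and capture the whole run with one outer group.
# perl = TRUE is used so the (?:...) syntax is handled by PCRE.
grouped_pattern <- "^.*\\b(particular phrase)\\W+((?:\\w+\\W+){3}\\w+).*$"
grouped_result <- gsub(grouped_pattern, "\\2", this_txt,
                       ignore.case = TRUE, perl = TRUE)

print(short_result)
print(grouped_result)
```

If that reading is right, the second call gives back the same four words as the long-hand pattern, while the first does not.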