I have a list of files provided by a third party, and I am trying to extract the age group name from each filename. Unfortunately, the third party's naming convention is inconsistent, and I'm writing a larger piece of code that consumes these files. The age group string I'm trying to extract always appears immediately before the ".xls" file extension and follows either an underscore or a space. I have tried a number of different regular expressions in R, but I can't figure this out (I'm obviously not great with regex).
age_group <- c("abc_July2018_Dec2018__state_1864.xls",
"def_July2018_Dec2018__state_65.xls",
"ghi July2018 Dec2018 state overall.xls")
The output I'm expecting is a vector containing: "1864", "65", "overall".
Can someone help me with an R regular expression to extract these groups?