I need to extract the start year and end year from a vector with values like these:

 yr<- c("June 2013 – Present (2 years 9 months)", "January 2012 – June 2013 (1 year 6 months)","2006 – Present (10 years)","2002 – 2006 (4 years)")


 yr
 June 2013 – Present (2 years 9 months)
 January 2012 – June 2013 (1 year 6 months)
 2006 – Present (10 years)
 2002 – 2006 (4 years)

I am expecting output like this. Does anyone have suggestions?

 start_yr   end_yr
 2013       2016
 2012       2013
 2006       2016
 2002       2006
user3570187

2 Answers

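# Treat "Present" as the current year (2016 when this was written)
x <- gsub("present", "2016", yr, ignore.case = TRUE)
# Extract every 4-digit run from each string; the result is a list
x <- regmatches(x, gregexpr("\\d{4}", x))
# The first match is the start year, the second is the end year
start_yr <- sapply(x, "[[", 1)
end_yr <- sapply(x, "[[", 2)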

This saves the start year and end year in two separate variables. If you want them in a single object, edit the code to assign them to y$start_yr and y$end_yr instead.
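
A rough sketch of that single-object variant, assuming a data frame is what is wanted. The name `y` is only illustrative, 2016 is hard-coded for "Present" as in the answer, and the conversion to integer is an extra step not in the original code:

```r
yr <- c("June 2013 – Present (2 years 9 months)",
        "January 2012 – June 2013 (1 year 6 months)",
        "2006 – Present (10 years)",
        "2002 – 2006 (4 years)")

# Treat "Present" as 2016 (an assumption about the current year)
x <- gsub("present", "2016", yr, ignore.case = TRUE)
# Extract every 4-digit run from each string
m <- regmatches(x, gregexpr("\\d{4}", x))

# `y` is a hypothetical name; as.integer() turns the year strings into numbers
y <- data.frame(start_yr = as.integer(sapply(m, "[[", 1)),
                end_yr   = as.integer(sapply(m, "[[", 2)))
y
```

This still assumes every element contains at least two four-digit years.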

nrussell
Antoine
    Some elements come back as "character(0)", and I get the error "Error in FUN(X[[i]], ...) : subscript out of bounds". Any suggestions for removing those rows? – user3570187 Feb 29 '16 at 22:52
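
One possible way to deal with the "character(0)" case from the comment above, sketched under the assumption that it comes from elements containing no four-digit year. The third input value and the names `m`, `ok`, `start_all` and `end_all` are made up for illustration:

```r
# Hypothetical input: the last element contains no 4-digit year at all
yr <- c("2006 – Present (10 years)", "2002 – 2006 (4 years)", "Less than a year")

x <- gsub("present", "2016", yr, ignore.case = TRUE)
m <- regmatches(x, gregexpr("\\d{4}", x))

# Option 1: keep only the rows that have both a start and an end year
ok <- lengths(m) >= 2
start_yr <- sapply(m[ok], "[[", 1)
end_yr   <- sapply(m[ok], "[[", 2)

# Option 2: keep every row; indexing with [ returns NA where a year is missing
start_all <- sapply(m, function(v) v[1])
end_all   <- sapply(m, function(v) v[2])
```

Option 1 drops the offending rows, so `yr[ok]` gives the matching inputs. Option 2 keeps the result the same length as `yr`, which may be easier to bind back onto the original data.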

Another solution is to use the stringr package:

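library(stringr)
# Replace "Present" with the current year; the replacement must be a string
x <- str_replace(yr, "Present", "2016")
# simplify = TRUE returns a character matrix with one row per input string
DF <- as.data.frame(str_extract_all(x, "\\d{4}", simplify = TRUE))
names(DF) <- c("start_yr", "end_yr")
DF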

This gives:

  start_yr end_yr
1     2013   2016
2     2012   2013
3     2006   2016
4     2002   2006
cderv