
I have a data frame with dates in different formats. I want to first remove the "19" from the year in those entries that have a four-digit year, and then use as.POSIXlt to convert the dates. But my gsub call removes the whole year instead of just the 19:

df.DOB <- c("12/11/99", "10/24/67", "8/18/13", "2/29/45", "2/28/63", "12/14/77",
            "07/25/1923", "01/07/1989", "09/02/1974")

gsub("\\/19.*", "", df.DOB)

# [1] "12/11/99" "10/24/67" "8/18/13"  "2/29/45"  "2/28/63"  "12/14/77" "07/25"  "01/07" "09/02" 

df.DOB.formatted <- as.POSIXlt(df.DOB, format = "%m/%d/%y")
df.DOB.formatted <- df.DOB.formatted - 100L
df.DOB.formatted

# [1] "1999-12-10 23:58:20 EST" "2067-10-23 23:58:20 EDT" "2013-08-17 23:58:20 EDT"
# [4] NA                        "2063-02-27 23:58:20 EST" "1977-12-13 23:58:20 EST"
# [7] "2019-07-24 23:58:20 EDT" "2019-01-06 23:58:20 EST" "2019-09-01 23:58:20 EDT"

I would be grateful for your help.

Thanks

sahuno
  • How do we know that `8/18/13` should be interpreted as `08/18/1913` and not `08/18/2013`? There are many edge cases which you need to clarify here. – Tim Biegeleisen Aug 20 '20 at 04:06
  • What is your final expected output? Maybe there is a better way here which does not involve `gsub` step at all. – Ronak Shah Aug 20 '20 at 04:08
  • First, I'm expecting `gsub("\\/19.*","",df.DOB)` to return `"12/11/99" "10/24/67" "8/18/13" "2/29/45" "2/28/63" "12/14/77" "07/25/23" "01/07/89" "09/02/74"` – sahuno Aug 20 '20 at 04:12

3 Answers


I added an extra entry to df.DOB that has 19 as the day of the month.

You can use sub to remove a "19" only when it is followed by exactly two characters at the end of the string.

df.DOB <- c("12/11/99","10/24/67","07/25/1923", "01/07/1989",
             "09/02/1974","01/19/1987")
sub('19(?=..$)', '', df.DOB, perl = TRUE)
#[1] "12/11/99" "10/24/67" "07/25/23" "01/07/89" "09/02/74" "01/19/87"
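
A minimal sketch of the follow-up conversion step, assuming every birth date belongs to the 1900s (an assumption; the comments on the question point out that a two-digit year such as 13 is ambiguous). The names `short`, `full` and `dob` are made up for illustration. Rather than letting `%y` guess the century, which is what turned 67 into 2067 in the question, it puts 19 back in front of every two-digit year and parses with `%Y`:

```r
df.DOB <- c("12/11/99", "10/24/67", "07/25/1923", "01/07/1989",
            "09/02/1974", "01/19/1987")

# Step 1: drop the "19" from four-digit years, as above
short <- sub("19(?=..$)", "", df.DOB, perl = TRUE)

# Step 2 (assumption: all years are 19xx): restore the century explicitly
full <- sub("([0-9]{2})$", "19\\1", short)

# Step 3: parse with a four-digit year; as.Date avoids time-of-day and
# time-zone components that a birth date does not need
dob <- as.Date(full, format = "%m/%d/%Y")
format(dob)
```

An impossible date such as 2/29/45 (1945 was not a leap year) should still come back as NA, so it is worth checking `anyNA(dob)` afterwards.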
Ronak Shah
  • Yes! This seems more robust! Could you please explain the `19(?=..$)` part a little? – sahuno Aug 20 '20 at 04:38
  • `(?=...)` is a positive lookahead. `19(?=..$)` matches 19 only if it is followed by exactly two characters and then the end of the string. The lookahead does not consume those two characters, so only the 19 is removed. – Ronak Shah Aug 20 '20 at 04:48
  • @sahuno Glad to have been of help! Feel free to [accept one of the answers](https://stackoverflow.com/help/someone-answers) which worked best for you by clicking on the check mark next to the vote button :-) You can accept only one answer per post. – Ronak Shah Aug 20 '20 at 05:19
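
To make the lookahead explanation concrete, here is a small sketch that runs three patterns on the same input: the one from the question, a plain "19", and the lookahead version. The vector `x` and the result names are illustrative only.

```r
x <- c("07/25/1923", "01/19/1987", "8/19/13", "12/11/99")

# Pattern from the question: "/19" and everything after it is deleted,
# so the whole year (or more) disappears
too_much <- gsub("\\/19.*", "", x)
# "07/25" "01" "8" "12/11/99"

# Plain "19": removes the first 19 wherever it occurs, including the day
unanchored <- sub("19", "", x)
# "07/25/23" "01//1987" "8//13" "12/11/99"

# Lookahead: removes 19 only when exactly two characters follow it
# before the end of the string
lookahead <- sub("19(?=..$)", "", x, perl = TRUE)
# "07/25/23" "01/19/87" "8/19/13" "12/11/99"
```

Note that `perl = TRUE` is needed in base R because lookaheads are a PCRE feature.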

You can use str_replace.

 library(stringr)
 df.DOB <- c("12/11/99","10/24/67","8/18/13","2/29/45","2/28/63","12/14/77", 
        "07/25/1923","01/07/1989","09/02/1974")

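 str_replace(df.DOB, "19", "")  # removes the first "19" found anywhere in each string
 # if "19" can also appear in the day or month, restrict the match to the year:
 str_replace(df.DOB, "19(?=..$)", "") # from Ronak's and Darren's comments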

Another solution is to split each date into month, day and year, and apply the replacement only to the year (thanks all for your comments on my answer):

df.DOB <- c("12/19/1999","10/24/67","8/19/13","2/29/45","2/28/63","12/14/77", 
           "07/25/1923","01/07/1989","09/02/1974")

df1 = str_split(df.DOB, "/", simplify = TRUE) 
df1[,3] = str_replace(df1[,3], "^19(?=..$)", "") # anchored so a two-digit year "19" is left alone
apply(df1,1,function(d) paste(d,collapse = "/"))
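
For readers who would rather not load stringr, the same split-then-replace idea can be sketched in base R. This is an illustration, not part of the original answer: the helper name `strip_century` is hypothetical, and the pattern is anchored so that a two-digit year 19 is not emptied.

```r
df.DOB <- c("12/19/1999", "10/24/67", "8/19/13", "07/25/1923", "1/1/19")

# Hypothetical helper: split on "/", edit only the year field, re-join
strip_century <- function(dates) {
  parts <- strsplit(dates, "/", fixed = TRUE)
  vapply(parts, function(p) {
    # drop a leading "19" only when two more digits follow (a four-digit year)
    p[3] <- sub("^19(?=[0-9]{2}$)", "", p[3], perl = TRUE)
    paste(p, collapse = "/")
  }, character(1))
}

strip_century(df.DOB)
# "12/19/99" "10/24/67" "8/19/13" "07/25/23" "1/1/19"
```

Because only the third field is touched, a 19 in the day position never needs special handling.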
Sam S

Another regex pattern:

df.DOB <- c("12/11/99", "10/24/67", "07/25/1923", "01/07/1989", "09/02/1974", "01/19/1987")

sub("19(..)$", "\\1", df.DOB)

# [1] "12/11/99" "10/24/67" "07/25/23" "01/07/89" "09/02/74" "01/19/87"
Darren Tsai