I'm trying to convert characters to dates. The dates have heterogeneous formats, and I'd prefer to not have to code each date format separately (and specify which formats belong to which elements, through, e.g., grepl
).
Here's my test data:
test <- c(
"2012-11-11", "12-5-23", "12/5/86", "2015-12-16 1300",
"8/6/92 3:00", "11/6/14 4", "10/31/14 52",
"06/15/2014 14:37", "2/10/06", "95-06-26", "82-10-03"
)
Desired result:
as.POSIXct(c("2012-11-11 00:00:00 UTC", "2012-05-23 00:00:00 UTC", "1986-12-05 00:00:00 UTC", "2015-12-16 13:00:00 UTC", "1992-08-06 03:00:00 UTC", "2011-06-14 04:00:00 UTC", "2014-10-31 00:52:00 UTC", "2014-06-15 14:37:00 UTC", "2006-02-10 00:00:00 UTC", "1995-06-26 00:00:00 UTC", "1982-10-03 00:00:00 UTC"), tz="UTC")
I realize that some of the weird time in test
(like 4, or 53) might be impossible unequivocally parse, but for those I mostly want to make sure that the random time doesn't screw up the date.
Here's my best attempt:
orders <- paste(rep(c("ymd", "mdy", "Ymd"),each=3), c("HM","H","M"))
lubridate::parse_date_time(test, orders=orders, truncated=2)
[1] "2012-11-11 00:00:00 UTC" "0012-05-23 00:00:00 UTC" "1986-12-05 00:00:00 UTC" "2015-12-16 13:00:00 UTC" "0092-08-06 03:00:00 UTC" "2011-06-14 04:00:00 UTC"
[7] "2014-10-31 00:52:00 UTC" "2014-06-15 14:37:00 UTC" "2006-02-10 00:00:00 UTC" "0095-06-26 00:00:00 UTC" "0082-10-03 00:00:00 UTC"
The problem is that it gets the century wrong on the 2-digit years. What surprising is that this works fine:
parse_date_time(test[2], orders=orders[1], truncated=2)
[1] "2012-05-23 UTC"
In ?parse_date_time
there is a note:
NOTE: ymd family of functions are based on strptime which currently fails to parse %y-%m formats.
But that shouldn't apply here because 1) I have day and 2) it seems to work when I only have a date with a year w/o century. I think the problem in my specification is in how it interacts with the guessing/ training.
Lubridate is so close to making this tiresome task a lot easier, I think. Is there a way to get lubridate
, or any other method, to generically parse the dates in test
?