41

I have a data frame with a character column of dates.

When I use as.Date, most of my dates are parsed correctly, except for a few instances. The example below will hopefully show you what is going on.

# my attempt to parse the string to Date -- uses the stringr package
prods.all$Date2 <- as.Date(str_sub(prods.all$Date, 1, 
                str_locate(prods.all$Date, " ")[1]-1), 
                "%m/%d/%Y")

# grab two rows to highlight my issue
temp <- prods.all[c(1925:1926), c(1,8)]
temp
#                    Date      Date2
# 1925  10/9/2009 0:00:00 2009-10-09
# 1926 10/15/2009 0:00:00 0200-10-15

As you can see, the year of some of the dates is inaccurate. The pattern seems to occur when the day is double digit.

Any help you can provide will be greatly appreciated.

Henrik
Btibert3
  • The reason you are getting the invalid 0200 date is that the character lengths of the day are different (two digits for 15-Oct, one digit for 9-Oct) - and your string substitute code is not accounting for that. At any rate you can probably use as.Date or strptime directly with the format argument, without processing the characters further. – mdsumner Nov 30 '10 at 04:21
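To make the comment above concrete, here is a minimal base-R sketch of how a shared cut-off position produces the truncated year. It assumes the `[1]` in the question's code picked up only the first row's space position and that this row had a one-digit day. The names `dates`, `first_space`, `cut_shared` and `cut_each` are illustrative and not from the original post.

```r
# Illustrative input: the two strings shown in the question.
dates <- c("10/9/2009 0:00:00", "10/15/2009 0:00:00")

# Position of the first space in each string (one value per element).
first_space <- as.integer(regexpr(" ", dates, fixed = TRUE))

# Reusing only the first position for every row cuts the longer string short.
cut_shared <- substr(dates, 1, first_space[1] - 1)

# Using each string's own position keeps the full four-digit year.
cut_each <- substr(dates, 1, first_space - 1)

print(cut_shared)  # the second element loses the last digit of its year
print(cut_each)
print(as.Date(cut_each, format = "%m/%d/%Y"))
```

With the shared position, the second string is reduced to "10/15/200", so `%Y` reads the year as 200, which matches the 0200 seen in the question.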

3 Answers

73

The easiest way is to use lubridate:

library(lubridate)
prods.all$Date2 <- mdy(prods.all$Date2)

In current versions of lubridate, mdy() returns objects of class Date (POSIXct only when a tz argument is supplied; early versions returned POSIXct by default), and it works with either factors or characters.

hadley
  • I will mention the existence of things like ymd(), ymd_hms(), mdy_hms(), etc. in that library to handle date and time fields together. Awesome library btw. My hats off to you... – Mike Wise Mar 13 '15 at 08:58
  • lubridate is an awesome package. am still using it in 2018 and can't get enough of it. There is a 'lubridate' cheat sheet at https://github.com/rstudio/cheatsheets/raw/master/lubridate.pdf – Lobbie Apr 28 '18 at 11:08
  • @hadley When I am King, you shall be knighted. – shekeine Jan 21 '21 at 12:54
61

You may be overcomplicating things. Is there any reason you need the stringr package? You can use as.Date and its format argument to specify the input format of your string.

 df <- data.frame(Date = c("10/9/2009 0:00:00", "10/15/2009 0:00:00"))
 as.Date(df$Date, format =  "%m/%d/%Y %H:%M:%S")
 # [1] "2009-10-09" "2009-10-15"

Note the Details section of ?as.Date:

Character strings are processed as far as necessary for the format specified: any trailing characters are ignored

Thus, this also works:

as.Date(df$Date, format = "%m/%d/%Y")
# [1] "2009-10-09" "2009-10-15"

All the conversion specifications that can be used to specify the input format are found in the Details section in ?strptime. Make sure that the order of the conversion specification as well as any separators correspond exactly with the format of your input string.
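As a small illustration of that last point, the sketch below parses the same string with a matching format, with the wrong separator, and with day and month swapped. The variable names are made up for the example. A mismatch does not raise an error; as.Date simply returns NA.

```r
x <- "10/15/2009"

# Format matches the input: month/day/year separated by slashes.
ok <- as.Date(x, format = "%m/%d/%Y")

# Wrong separator: the input uses "/" but the format expects "-".
wrong_sep <- as.Date(x, format = "%m-%d-%Y")

# Wrong order: 15 is read as the month, which is not a valid month.
wrong_order <- as.Date(x, format = "%d/%m/%Y")

print(ok)
print(wrong_sep)
print(wrong_order)
```

Checking for unexpected NA values after the conversion (for example with `anyNA()`) is a cheap way to catch a format that does not fit the data.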


More generally and if you need the time component as well, use as.POSIXct or strptime:

as.POSIXct(df$Date, format = "%m/%d/%Y %H:%M:%S")  # name the argument: the second positional argument of as.POSIXct is tz
strptime(df$Date, "%m/%d/%Y %H:%M:%S")

I'm guessing at what your actual data might look like from the partial results you give.
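For comparison, here is a hedged sketch of what each conversion keeps. The input vector is invented for illustration (the second value has a non-zero time so the difference is visible), and `tz = "UTC"` is an assumption made only to keep the output independent of the local time zone.

```r
# Illustrative input; the second time of day is made up for the example.
x <- c("10/9/2009 0:00:00", "10/15/2009 13:45:30")

# as.Date keeps only the calendar date; the trailing time is ignored.
d <- as.Date(x, format = "%m/%d/%Y")

# as.POSIXct keeps date and time; note the named format argument.
p <- as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

print(class(d))
print(class(p))
print(format(p, "%H:%M:%S"))  # the time component survives in the POSIXct value
```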

Henrik
mdsumner
  • I would caution against `strptime` because it returns a `POSIXlt` object, which tends to give new users fits because they don't realize it's a list. If you need the time, use `as.POSIXct` but beware if your "dates" are really factors... – Joshua Ulrich Nov 30 '10 at 04:44
  • true, but since R 2.11.0 "length() now returns the length of the corresponding abstract timedate-vector rather than always 9 (the length of the underlying list structure). (Wish of PR#14073 and PR#10507.)" so I wondered if that was worth complicating things with. You can just as.POSIXct(strptime(x)) anyway. – mdsumner Nov 30 '10 at 06:19
  • I didn't realize that. Thanks for the pointer. Though I wonder if it could still be confusing if you have a `POSIXlt` column in a `data.frame`... – Joshua Ulrich Nov 30 '10 at 14:43
  • I realized after that it's not completely helpful - in a data.frame you will still get into trouble, though I think it's possible to put lists and arrays etc. in data.frames as columns. But I think better to understand the difference of lt/ct and use them carefully. – mdsumner Nov 30 '10 at 21:05
  • This seems misleading to me since the Date class that as.Date returns does not actually handle time. The answer implies that it does. – Mike Wise Mar 13 '15 at 08:50
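The lt/ct distinction discussed in these comments can be sketched as follows. The input strings are the two from the question; `tz = "UTC"` and the column name `when` are assumptions for the example only.

```r
x <- c("10/9/2009 0:00:00", "10/15/2009 0:00:00")

# strptime returns POSIXlt: a list of date-time components under the hood.
lt <- strptime(x, "%m/%d/%Y %H:%M:%S", tz = "UTC")
print(is.list(unclass(lt)))
print(length(lt))  # number of date-times, not the number of list components

# Converting to POSIXct gives a plain numeric vector (seconds since the epoch),
# which is usually the easier type to store as a data.frame column.
ct <- as.POSIXct(lt)
print(is.numeric(unclass(ct)))

df <- data.frame(when = ct)
print(class(df$when))
```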
1

With lubridate, if your dates look like '04/24/2017 05:35:00', first replace the slashes with hyphens:

library(lubridate)
prods.all$Date2 <- gsub("/", "-", prods.all$Date2)

Then parse the result:

parse_date_time(prods.all$Date2, orders = "mdy hms")

Nayab khan