11

I have dataset where time.start vary from 5:00:00 to 28:59:59 (i.e. 01.01.2013 28:00:00 is actually 02.01.2013 04:00:00). Dates are in %d.%m.%Y format.

      Date Time.start   
01.01.2013   22:13:07
01.01.2013   22:52:23
01.01.2013   23:34:06
01.01.2013   23:44:25
01.01.2013   27:18:48
01.01.2013   28:41:04

I want to convert it to normal date format.

dates$date <- paste(dates$Date,dates$Time.start, sep = " ")
dates$date <- as.POSIXct(strptime(dates$date, "%m.%d.%Y %H:%M:%S"))

But obviously I have NA for time > 23:59:59

How should I modify my code?

BiXiC
  • 817
  • 2
  • 8
  • 26
  • Reminds me of the time ranges for Japanese TV show scheduling, which get silly when they use non-24 hour time (e.g. "late night 3:00" to mean 3am the next day). – JAB Jun 19 '14 at 15:44
  • Indeed, you see bars advertising that they are open until "25:00" for 1AM etc. –  Jun 19 '14 at 15:51
  • Normalize by offsetting -5 hours in the database and add +5 when you actually use the data. :) `[With a -5, you be in 00:00:00 - 23:59:59 range.]` – CodeAngry Jun 19 '14 at 17:06
  • Why in the heck are they using 05:00:00 - 28:59:59? – Panzercrisis Jun 19 '14 at 18:24
  • @Panzercrisis: If a business were closed daily from 1:00am-7:00am, and had a work week that went from 00:00:00.01 Monday through 23:59:59.99 Sunday, a work shift that started at 5:00pm Sunday and ended at 1:00am on Monday would have seven hours attributable to one paycheck and one attributed to the next. Recording the shift as running from 17:00 through 25:00 on Sunday avoids that problem. – supercat Jun 19 '14 at 18:43
  • I guess I would just be more tempted to keep up with it. – Panzercrisis Jun 19 '14 at 18:54

3 Answers3

9

E.g. add the time as seconds to the date:

df <- read.table(header=T, text="      Date Time.start   
01.01.2013   22:13:07
01.01.2013   22:52:23
01.01.2013   23:34:06
01.01.2013   23:44:25
01.01.2013   27:18:48
01.01.2013   28:41:04", stringsAsFactors=FALSE)

as.POSIXct(df$Date, format="%d.%m.%Y") +
  sapply(strsplit(df$Time.start, ":"), function(t) {
    t <- as.integer(t)
    t[3] + t[2] * 60 + t[1] * 60 * 60
  })

# [1] "2013-01-01 22:13:07 CET" "2013-01-01 22:52:23 CET" "2013-01-01 23:34:06 CET"
# [4] "2013-01-01 23:44:25 CET" "2013-01-02 03:18:48 CET" "2013-01-02 04:41:04 CET"
lukeA
  • 48,497
  • 5
  • 73
  • 84
  • How about using `scan` to directly read in the ints from dates. No strsplit and no coercion. I really like this idea of yours. – asb Jun 19 '14 at 12:20
  • Your solution is faster than @akrun. I have data frame with 180 000 rows and it takes 2.5 secs to create your vector vs 31 secs using `with` command – BiXiC Jun 19 '14 at 13:25
  • Now I have problem with function for this script. =( http://stackoverflow.com/questions/24308718/050000-285959-time-format-function – BiXiC Jun 19 '14 at 14:12
8

Just a modification of lukeAs solution:

with(df, as.POSIXct(Date, format="%d.%m.%Y")+
 colSums(t(read.table(text=Time.start, sep=":",header=F))*c(3600,60,1)))
[1] "2013-01-01 22:13:07 EST" "2013-01-01 22:52:23 EST" 
[3] "2013-01-01 23:34:06 EST" "2013-01-01 23:44:25 EST"
[5] "2013-01-02 03:18:48 EST" "2013-01-02 04:41:04 EST"
akrun
  • 674,427
  • 24
  • 381
  • 486
2

Using lubridate:

with(dates, mdy(Date) + hms(Time.start))

Generates:

[1] "2013-01-01 22:13:07 UTC" "2013-01-01 22:52:23 UTC"
[3] "2013-01-01 23:34:06 UTC" "2013-01-01 23:44:25 UTC"
[5] "2013-01-02 03:18:48 UTC" "2013-01-02 04:41:04 UTC"
rrs
  • 8,457
  • 4
  • 24
  • 34