64

Suppose I have the following data.frame foo

           start.time duration
1 2012-02-06 15:47:00      1
2 2012-02-06 15:02:00      2
3 2012-02-22 10:08:00      3
4 2012-02-22 09:32:00      4
5 2012-03-21 13:47:00      5

And class(foo$start.time) returns

[1] "POSIXct" "POSIXt" 

I'd like to plot foo$duration against foo$start.time. In my scenario, I'm only interested in the time of day rather than the actual day of the year. How does one extract the time of day, as hours:minutes, from a POSIXct vector?

David LeBauer
andrewj
  • 2
    the libraries `lubridate` and `zoo` might be helpful for you. but in base R, `format(foo$start.time, format='%H:M')`. – Justin May 22 '12 at 15:48
  • Thanks. One issue with `format(foo$start.time, format='%H:M')` is that the output is in character format. I'd like the output to be in some kind of numeric format so that it can be used as the x axis of a plot. – andrewj May 22 '12 at 16:22
  • 1
    There are many ways. Again I'd point you to `lubridate` or [this post](http://stackoverflow.com/questions/7655514/how-do-i-plot-only-the-time-portion-of-a-timestamp-including-a-date) – Justin May 22 '12 at 17:03
  • Okay, using the `lubridate` package, I can do `x – andrewj May 22 '12 at 17:56
  • depends on how you're plotting, but the post I referenced should help. – Justin May 22 '12 at 18:20
  • @Justin, thanks for your suggestion with http://stackoverflow.com/questions/7655514/how-do-i-plot-only-the-time-portion-of-a-timestamp-including-a-date. The way I would approach this now would be to `foo$start.time – andrewj May 23 '12 at 03:20
  • Just a correction Justin: format(foo$start.time, format='%H:%M'). – Xavier Prudent Jul 06 '19 at 03:42

5 Answers

57

This is a good question, and highlights some of the difficulty in dealing with dates in R. The lubridate package is very handy, so below I present two approaches, one using base (as suggested by @RJ-) and the other using lubridate.

Recreate the first three rows of the data frame from the original post:

foo <- data.frame(start.time = c("2012-02-06 15:47:00", 
                                 "2012-02-06 15:02:00",
                                 "2012-02-22 10:08:00"),
                  duration   = c(1,2,3))

Convert to a date-time class (two ways to do this). Note that strptime returns POSIXlt, whereas ymd_hms returns POSIXct:

# using base::strptime
t.str <- strptime(foo$start.time, "%Y-%m-%d %H:%M:%S")

# using lubridate::ymd_hms
library(lubridate)
t.lub <- ymd_hms(foo$start.time)

Now, extract time as decimal hours

# using base::format
h.str <- as.numeric(format(t.str, "%H")) +
               as.numeric(format(t.str, "%M"))/60

# using lubridate::hour and lubridate::minute
h.lub <- hour(t.lub) + minute(t.lub)/60

Demonstrate that these approaches are equal:

identical(h.str, h.lub)

Then choose one of above approaches to assign decimal hour to foo$hr:

foo$hr <- h.str

# If you prefer, the choice can be made at random:
foo$hr <- if(runif(1) > 0.5){ h.str } else { h.lub }
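For reuse, the decimal-hour calculation can be wrapped in a small function. The following is only a sketch in base R: the name decimal_hours is hypothetical (it is not from any package), and it also counts seconds, which the snippets above ignore.

```r
# Hypothetical helper (not from any package): time of day as decimal hours,
# including seconds, for a POSIXct or POSIXlt vector.
decimal_hours <- function(x) {
  lt <- as.POSIXlt(x)  # exposes hour, min and sec as list components
  lt$hour + lt$min / 60 + lt$sec / 3600
}

# Example input; tz = "UTC" is fixed so the result does not depend on the
# local time zone of the machine running the code.
x <- as.POSIXct(c("2012-02-06 15:47:00", "2012-02-22 10:08:30"), tz = "UTC")
decimal_hours(x)
```

The result is a plain numeric vector, so it can be used directly as a continuous x axis.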

then plot using the ggplot2 package:

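library(ggplot2)
# foo$hr is numeric (decimal hours), so use a continuous scale and
# format the tick labels as HH:MM
qplot(foo$hr, foo$duration) + 
             scale_x_continuous(labels = function(h) sprintf("%02d:%02d", floor(h), round(h %% 1 * 60)))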
David LeBauer
  • Thanks for the suggestion. However, when the above is plotted, it treats each time point as a label or category rather than as a number. In other words, the points are equally spaced on the x axis. Contrast the difference with the following, taking the original `foo` and then plotting the following `foo$start.time.numeric – andrewj May 22 '12 at 19:54
  • 1
    In terms of the issue, you're describing, from this post here, http://stackoverflow.com/questions/7655514/how-do-i-plot-only-the-time-portion-of-a-timestamp-including-a-date, try `qplot(hour(foo$start.time) + minute(foo$start.time)/60, foo$duration) + scale_x_datetime(labels = date_format("%S:00"))`. It looks like changing `scale_x_datetime` has a `labels` parameter. – andrewj May 23 '12 at 03:16
  • you could cut out `lubridate` altogether by using `strptime` – RJ- May 23 '12 at 04:03
  • @RJ- I see how to replace the `lubridate::ymd_hms` function, but the only way I know to replace `lubridate::hour` and `lubridate::minute` is `as.numeric(format(foo$start.time, "%H"))` and `as.numeric(format(foo$start.time, "%M"))`. So I agree that it can be done (and there are reasons to reduce dependencies), but lubridate really does make it easier. I will post both options, but welcome your suggestions. – David LeBauer May 23 '12 at 14:59
  • @David when I use `scale_x_datetime(labels = date_format("%S:00"))` with the code in your answer, I get the error 'Error in f(..., self = self) : Breaks and labels are different lengths'. Can you tell me a way to get around that error? – alily Sep 23 '16 at 10:05
18

You could rely on base R:

# Using R 2.14.2
# The same toy data
foo <- data.frame(start.time = c("2012-02-06 15:47:00", 
                                 "2012-02-06 15:02:00",
                                 "2012-02-22 10:08:00"),
                  duration   = c(1,2,3))

Since class POSIXct contains date-time information in a structured manner, you can rely on substr to extract the characters in time positions within the POSIXct vector. That is, given you know the format of your POSIXct (how it would be presented when printed), you can extract hours and minutes:

# Extract hour and minute as a character vector, of the form "%H:%M"
substr(foo$start.time, 12, 16)

Then paste it onto an arbitrary date to convert it back to POSIXct. The example uses 1 January 2012; if you instead parse the time on its own by passing a format such as "%H:%M" to as.POSIXct, R fills in the current date.

# Store time information as POSIXct, using an arbitrary date
foo$time <- as.POSIXct(paste("2012-01-01", substr(foo$start.time, 12, 16)))
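The substr approach depends on the exact character positions of the printed date-time. As a hedged alternative, the same step can be sketched with format(), which asks for the hour and minute fields directly; the variable names and the fixed "UTC" time zone below are assumptions made for illustration only.

```r
# Sketch: extract "HH:MM" with format() instead of by character position,
# then attach it to an arbitrary date, as in the step above.
start.time <- as.POSIXct(c("2012-02-06 15:47:00", "2012-02-22 10:08:00"),
                         tz = "UTC")

hhmm <- format(start.time, "%H:%M")  # character vector such as "15:47"

# The date 2012-01-01 is arbitrary; only the time of day matters for plotting
tod <- as.POSIXct(paste("2012-01-01", hhmm), tz = "UTC")
tod
```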

And both plot and ggplot2 know how to format times in POSIXct out of the box.

# Plot it using base graphics
plot(duration~time, data=foo)

# Plot it using ggplot2 (0.9.2.1)
library(ggplot2)
qplot(x=time, y=duration, data=foo)
chemman
10

Lubridate doesn't handle time of day data, so Hadley recommends the hms package for this type of data. Something like this would work:

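library(readr)      # parse_datetime()
library(dplyr)      # %>% and mutate()
library(lubridate)  # second(), minute(), hour()

foo <- data.frame(start.time = parse_datetime(c("2012-02-06 15:47:00", 
                                 "2012-02-06 15:02:00",
                                 "2012-02-22 10:08:00")),
                  duration   = c(1,2,3))

# hms::hms() takes seconds first, then minutes, then hours
foo <- foo %>% mutate(time_of_day = hms::hms(second(start.time), minute(start.time), hour(start.time)))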

Watch out for 2 potential issues - 1) lubridate has a different function called hms and 2) hms::hms takes the arguments in the opposite order to that suggested by its name (so that just seconds may be supplied)

andyyy
8

This code is much faster than converting to string and back to numeric

time <- c("1979-11-13T08:37:19-0500", "2014-05-13T08:37:19-0400");
time.posix <- as.POSIXct(time, format = "%Y-%m-%dT%H:%M:%S%z");
time.epoch <- as.vector(unclass(time.posix));
time.poslt <- as.POSIXlt(time.posix, tz = "America/New_York");
time.hour.new.york <- time.poslt$hour + time.poslt$min/60 + time.poslt$sec/3600;

> time;
[1] "1979-11-13T08:37:19-0500" "2014-05-13T08:37:19-0400"
> time.posix;
[1] "1979-11-13 15:37:19 IST" "2014-05-13 15:37:19 IDT"
> time.poslt;
[1] "1979-11-13 08:37:19 EST" "2014-05-13 08:37:19 EDT"
> time.epoch;
[1]  311348239 1399984639
> time.hour.new.york;
[1] 8.621944 8.621944
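If only the UTC time of day is needed, the conversion to POSIXlt can be skipped as well. This is a minimal sketch, assuming the epoch values printed above; it relies on POSIXct counting seconds since 1970-01-01 UTC without leap seconds. It does not apply any time-zone offset, so local times still need the POSIXlt route shown above.

```r
# Epoch seconds taken from the time.epoch output above
epoch <- c(311348239, 1399984639)

# Seconds elapsed since the most recent UTC midnight
secs.since.midnight <- epoch %% 86400

# Time of day in UTC as decimal hours (13:37:19 and 12:37:19 here)
time.hour.utc <- secs.since.midnight / 3600
time.hour.utc
```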
Liran Katzir
2

This is an ancient topic, but I have found very few questions and answers on this matter. My solution is the following:

library(hms)
library(ggplot2)
foo <- data.frame(start.time = c("2012-02-06 15:47:00", 
                             "2012-02-06 15:02:00",
                             "2012-02-22 10:08:00"),
              duration   = c(1,2,3))

foo$start.time = as.POSIXct( foo$start.time )

# as_hms() keeps only the time of day; it replaces the deprecated as.hms()
g1 = ggplot( ) + xlab("") + 
  geom_line( data = foo, aes(x = as_hms(start.time), y = duration ), color = "steelblue" )
g1

If you would like to add manual time (!) breaks, then

time_breaks =    as.POSIXlt(c(
                   "2012-02-06 12:35:00 MSK", 
                   "2012-02-06 13:15:00 MSK",
                   "2012-02-06 14:22:00 MSK",
                   "2012-02-06 15:22:00 MSK"))
g1 + 
  scale_x_time( breaks = as_hms( time_breaks ) ) +
  theme(  axis.text.x = element_text( angle=45, vjust=0.25) ) 
Stepan S. Sushko