I have a data frame that looks like this (there are hundreds more rows):

      hour   magnitude tornadoCount hourlyTornadoCount Percentage Tornadoes
 1: 01 AM         0            5                 18          0.277777778
 2: 01 AM         1            9                 18          0.500000000
 3: 01 AM         2            2                 18          0.111111111
 4: 01 AM         3            2                 18          0.111111111
 5: 01 PM         0           76                150          0.506666667
 6: 01 PM         1           45                150          0.300000000
 7: 01 PM         2           21                150          0.140000000
 8: 01 PM         3            5                150          0.033333333
 9: 01 PM         4            3                150          0.020000000
10: 02 AM         0            4                 22          0.181818182
11: 02 AM         1            6                 22          0.272727273
12: 02 AM         2           11                 22          0.500000000
13: 02 AM         4            1                 22          0.045454545
14: 02 PM         0           98                173          0.566473988
15: 02 PM         1           36                173          0.208092486
16: 02 PM         2           25                173          0.144508671
17: 02 PM         3           11                173          0.063583815
18: 02 PM         4            2                173          0.011560694
19: 02 PM         5            1                173          0.005780347
20: 03 AM         1            6                  9          0.666666667
21: 03 AM         2            2                  9          0.222222222
22: 03 AM         3            1                  9          0.111111111
23: 03 PM         0          116                257          0.451361868
24: 03 PM         1           84                257          0.326848249
25: 03 PM         2           39                257          0.151750973
26: 03 PM         3           12                257          0.046692607
27: 03 PM         4            6                257          0.023346304
28: 04 AM         0            4                 16          0.250000000
29: 04 AM         1            5                 16          0.312500000
30: 04 AM         2            5                 16          0.312500000

I want to reorganize this so that the data is arranged chronologically according to the "hour" column. Is there a way to do this? Thanks!

  • Possible duplicate of [How to sort a dataframe by multiple column(s)?](https://stackoverflow.com/questions/1296646/how-to-sort-a-dataframe-by-multiple-columns) – Kara Woo May 01 '18 at 16:24
  • There are some similarities, but not really. I am sorting by one column, but I don't know how to do it in this case. If the hours were labeled 0-23, it would be easier, as I could just sort ascending or descending. However, I don't know how to do that with values like 01AM. How can I specify that the rows should be ordered from 01AM to 12AM? – I.J. Abdul Hakeem May 01 '18 at 17:04
  • Not sure why the downvote, +1 just to undo. – Brian Stamper May 01 '18 at 17:08
  • You need to encode your data in a way that R understands that `hour` refers to time. Perhaps you want to check out the package `lubridate` https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html – Brian Stamper May 01 '18 at 17:14
  • You could split out the numeric part of `hour` into a numeric column, then arrange by that – camille May 01 '18 at 17:23
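
The last comment's idea can be sketched in base R as follows. This is only an illustration under stated assumptions: the small `df` below is made-up sample data with the same `hour` label format as the question, and `h12`, `is_pm`, `hour_24`, and `sorted_df` are names introduced here. The numeric part alone would mix AM and PM hours, so the sketch also uses the AM/PM marker to build a 0-23 value.

```r
# Hypothetical sample data using the same "hh AM/PM" labels as the question
df <- data.frame(
  hour = c("01 PM", "02 AM", "12 AM", "01 AM", "12 PM"),
  tornadoCount = c(76, 4, 3, 5, 40),
  stringsAsFactors = FALSE
)

# Split the label into its numeric part (1-12) and its AM/PM marker
h12   <- as.integer(substr(df$hour, 1, 2))
is_pm <- grepl("PM$", df$hour)

# Convert to a 0-23 hour: 12 AM becomes 0, 12 PM stays 12, other PM hours get +12
df$hour_24 <- (h12 %% 12L) + ifelse(is_pm, 12L, 0L)

# Sort the rows by the 24-hour value
sorted_df <- df[order(df$hour_24), ]
```

With the real data, adding `magnitude` as a second argument to `order()` would keep the rows within each hour sorted by magnitude as well.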

1 Answer

You can convert `hour` to a 24-hour time with the lubridate parser (`%I` is the hour on a 12-hour clock, 01-12, and `%p` is the AM/PM indicator) and then sort on the result. Using dplyr and lubridate:

library(dplyr)
library(lubridate)

# Parse the "01 AM"-style labels into date-times, then sort on the parsed value
ordered_df <- df %>%
  mutate(hour_24 = parse_date_time(hour, '%I %p')) %>%
  arrange(hour_24)
GordonShumway
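
If a helper column is not wanted, one alternative is to turn `hour` into a factor whose levels are listed in chronological order, so that sorting (and later plotting) follows the clock instead of the alphabet. This is a minimal base R sketch, not a drop-in solution: the sample `df`, `hour_levels`, and `sorted_df` are names assumed for illustration, and it assumes every label has exactly the form "hh AM" or "hh PM".

```r
# Hypothetical sample data with the question's label format
df <- data.frame(
  hour = c("01 PM", "01 AM", "12 AM", "01 AM", "11 PM"),
  magnitude = c(0, 1, 2, 0, 3),
  stringsAsFactors = FALSE
)

# All 24 labels in chronological order: 12 AM, 01 AM, ..., 11 AM, 12 PM, ..., 11 PM
hour_levels <- c(
  paste(sprintf("%02d", c(12L, 1:11)), "AM"),
  paste(sprintf("%02d", c(12L, 1:11)), "PM")
)

# A factor sorts by level position, not alphabetically
df$hour <- factor(df$hour, levels = hour_levels)

# Order by hour, breaking ties by magnitude
sorted_df <- df[order(df$hour, df$magnitude), ]
```

Any label that does not match one of the 24 levels becomes `NA`, so it is worth checking `anyNA(df$hour)` after the conversion. To start the day at 01 AM instead of 12 AM, reorder `hour_levels` accordingly.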