0

I am new to R and have been struggling a lot with this problem. Tried to find a solution across places but couldn't.
I have a folder containing multiple csv files (about 158). Each csv has a column with date and time. I found out that the format of the date is not standard across csv files, which messes up my analyzes. Example:

>head(file1) # date format is in MONTH/day/year
        DateTime         Value
    6/2/14 11:00:01 PM  24.111
    6/3/14 1:30:01 AM   21.61
    6/3/14 4:00:01 AM   20.609

>head(file2) # date format is in day/MONTH/year
DateTime    Value
03/06/14 1:30:01 AM 21.588
03/06/14 4:00:01 AM 20.086
03/06/14 6:30:01 AM 18.584

I made the following loop to bind the files.

>files.names<-list.files(getwd(),pattern="*.csv")
>theList <- lapply(files.names,function(x){
>  theData <- read.csv(x,skip=18) })
>theResult <- do.call(rbind,theList)
>head(theResult)
               Date.Time   Value
1 6/2/14 11:00:01 PM     24.111
2  6/3/14 1:30:01 AM     21.610
3  6/3/14 4:00:01 AM     20.609
4  6/3/14 6:30:01 AM     19.107
5  6/3/14 9:00:01 AM     19.608
6 6/3/14 11:30:01 AM     20.609

What I think: I am guessing that there must be a way to standardize the format of the Date.Time column in the loop of each csv before binding them. That is, I think I have to do that before I do.call(rbind,theList), but not sure how (or if it is possible). Formatting each csv file in Excel would be a pain in the ass, so I would appreciate some help :P .

Caesar Gan
  • 25
  • 5
  • 1
    why cant you just use lubridate or intellidate packages on "theResult$Date.Time" ? – Dan Dec 21 '17 at 23:38
  • Yes... before. Use as.POSIXct withformat="%m/%d/%y %I:%M:%S %p" in one case and "%d/%m/%y %I:%M:%S %p" in the other case. Read `?strptime`. Possible that the `anytime` package might be of value. – IRTFM Dec 22 '17 at 06:55
  • 1
    Thanks @Dan, I appreciate it. After digging the `lubridate` package i ended up using the function `parse_date_time` - got some help from this one too: (https://stackoverflow.com/questions/26064292/guess-formats-r-lubridate). I will post my solution. – Caesar Gan Dec 22 '17 at 08:03

2 Answers2

2

Could get it right by adding an extra line of code in after read.csv, using parse_date_time function in lubridate package, as recommended.

>files.names<-list.files(getwd(),pattern="*.csv")
>theList <- lapply(files.names,function(x){
>theData <- read.csv(x,skip=18) 
>theData$Date.Time<-parse_date_time(x = theData$Date.Time, 
           orders = c("mdy HMS %p", "dmy HMS %p"), local = eng") }) ###extra line
>theResult <- do.call(rbind,theList)

My guess is that R could figure out when to it should have been mdy or dmy based on the orders argument in parse_date_time.

Caesar Gan
  • 25
  • 5
1

Need to use proper format strings from the ?strptime help page:

file1$DateTime <- as.POSIXct( file1$DateTime , format="%m/%d/%y %I:%M:%S %p")
file2$DateTime <- as.POSIXct( file1$DateTime , format="%d/%m/%y %I:%M:%S %p")

The defatted format for date-time input is YYYY-MM-DD HH:MM:SS in 24 hour time. If you have an AM/PM indicator, then you need to use the %I 12 hour format spec.

> file1
             DateTime  Value
1 2014-06-02 23:00:01 24.111
2 2014-06-03 01:30:01 21.610
3 2014-06-03 04:00:01 20.609
> file2
             DateTime  Value
1 2014-06-02 23:00:01 21.588
2 2014-06-03 01:30:01 20.086
3 2014-06-03 04:00:01 18.584

Then use do.call(rbind, list(file1,file2)) to get:

             DateTime  Value
1 2014-06-02 23:00:01 24.111
2 2014-06-03 01:30:01 21.610
3 2014-06-03 04:00:01 20.609
4 2014-06-02 23:00:01 21.588
5 2014-06-03 01:30:01 20.086
6 2014-06-03 04:00:01 18.584
IRTFM
  • 240,863
  • 19
  • 328
  • 451
  • Thanks a lot @42- . Solutions is very helpful. Specially the part about 12h format. Problem is that there are several files and i dont which is in the mm/dd or dd/mm format. My solution worked – Caesar Gan Dec 22 '17 at 08:06