0

I am working on a project that involves using R to clean up date entries from an excel file in order to upload them to a database. Within the excel file, some of the dates are already in a date format, (i.e. "8/18/2020"), while others are not (i.e. "8/18/2020 2027"). I want to import the column of dates into R so I can run code to reformat the "8/18/2020 2027"-like cells. I am currently using the read_excel() command from the readxl library to import the excel files, using similar syntax to this:

read_excel("filepath", col_types = c("date"))

The issue that I am running into is that if I define the column as a date column, readexcel() will not recognize the "8/18/2020 2027"-like cells as valid dates and won't import those cells.

On the other hand, if I define the column as a text column, readexcel() will instead simplify the "8/18/2020"-like cells into a different number (44063).

Is there a way to import the entire row into R with its entries untouched?

  • Can you save your Excel file in an flat file (e.g. .txt, .csv) ? I think is less likely that these transformations would happen, with read.csv or read_csv functions. – Pedro Faria Aug 31 '20 at 21:04
  • 1
    Starting with @stefan's answer, consider an approach that tries various formats until all values are either converted successfully or nothing works ... https://stackoverflow.com/a/60748268/3358272 – r2evans Aug 31 '20 at 22:24

1 Answers1

0

Using the answer to this question you can read the column as text and convert the numbers representing the date to a character like so:

# Read xl file. Read as text
# foo <- readxl::read_excel("test.xlsx", col_names = "date")

# dput of test.xlsx
foo <- structure(list(date = c("44061", "8/18/2020 2027")), row.names = c(NA, 
                                                                          -2L), class = c("tbl_df", "tbl", "data.frame"))

foo
#>             date
#> 1          44061
#> 2 8/18/2020 2027

foo$date <- ifelse(grepl("^\\d+$", foo$date), format(as.Date(as.numeric(foo$date), origin = "1899-12-30"), "%m/%d/%Y"), foo$date)
#> Warning in as.Date(as.numeric(foo$date), origin = "1899-12-30"): NAs introduced
#> by coercion

foo
#>             date
#> 1     08/18/2020
#> 2 8/18/2020 2027
stefan
  • 20,714
  • 3
  • 13
  • 26