0

I'm trying to read in a time series dataset spread over multiple years (so the sheet names are the respective years).

I want to read in each sheet and then mutate a new column called "year" that's equal to the sheet name. I'm not sure how to do this all in one fell swoop.

All I have right now is this:

map(excel_sheets(path), read_excel, path = path, skip = 1)
Simon
  • 15
  • 4

1 Answers1

0

Here is one of the possible solution.

Let say you have Excel file "ts.xlsx" with 3 sheets ("2016", "2017", "2018").

Each sheet has 3 values in "A" column:

"2016" - (1, 2, 3);

"2017" - (4, 5, 6);

"2018" - (7, 8, 9).

To read these data into one table with two columns ("data", "year") you can use the following R code:

# 1. Library
library(xlsx)

# 2. Excel file
excel_file <- "ts.xlsx"

# 3. Read Excel file
df <- loadWorkbook(excel_file)

# 4. Names and number of sheets
sheets_names <- names(getSheets(df))
sheets_count <- df$getNumberOfSheets()

# 5. Read Excel file by sheets
for(i in 1:sheets_count) {

  # 5.1. Read one sheet
  df_sheet_year <- read.xlsx(excel_file, i, header = F)
  df_sheet_year$name <- sheets_names[i]

  # 5.2. Merge with result dataset
  if(i == 1) {
    df_sheet <- df_sheet_year
  } else {
    df_sheet <- rbind(df_sheet, df_sheet_year)
  }
}

# 6. Rename features
colnames(df_sheet) <- c("data", "year")

# 7. Check result dataset
df_sheet
Andrii
  • 2,116
  • 19
  • 24