I have a dataframe, df
, that looks like the following (in reality, I have a lot more columns, leading up all the way to 2020_Actual and 2020_Budget):
Company Code 2013_Actual 2013_Budget 2014_Actual 2014_Budget
CompanyX 100 - Salary & Wages 1500 1601 1620 1680
CompanyX 102 - Bonus & Incentives 3000 3500 3150 3300
CompanyX 104 - Overtime 60 70 78 82
CompanyX 110 - Temporary Help 35 55 48 56
CompanyX 112 - Taxes & Benefits 800 840 880 900
I want to collapse all the columns after Code
into Actual
, Budget
, and Year
columns like this:
Company Code Actual Budget Year
CompanyX 100 - Salary & Wages 1500 1601 2013
CompanyX 102 - Bonus & Incentives 3000 3500 2013
CompanyX 104 - Overtime 60 70 2013
CompanyX 110 - Temporary Help 35 55 2013
CompanyX 112 - Taxes & Benefits 800 840 2013
CompanyX 100 - Salary & Wages 1620 1680 2014
CompanyX 102 - Bonus & Incentives 3150 3300 2014
CompanyX 104 - Overtime 78 82 2014
CompanyX 110 - Temporary Help 48 56 2014
CompanyX 112 - Taxes & Benefits 880 900 2014
This has been my approach so far, but it doesn't get me the correct outcome. It would also be nice to not have to specify each of the columns I want to gather by name, since my input table's column names may change:
library(dplyr)
library(tidyr)
# Read in data
df <- read.csv("inputTable.csv", header=T, stringsAsFactors=F)
# Collapse columns
actuals<- gather(df,
key = "Year",
value = "Actuals",
2013_Actual,2014_Actual,2015_Actual,2016_Actual,2017_Actual,2018_Actual,2019_Actual,2020_Actual)
budget <- gather(df,
key = "Year",
value = "Budget",
2013_Budget,2014_Budget,2015_Budget,2016_Budget,2017_Budget,2018_Budget,2019_Budget,2020_Budget)
# grab only relevant columns
df_actuals_final <- select(df_actuals, Company, Code, Year, Actuals)
df_budget_final <- select(df_budget, Company, Code, Year, Budget)
# merge
m <- merge(df_actuals_final, df_budget_final, by=c("Company","Code", "Year"))
write.csv(m, "inputTable.mod.csv", quote=F, row.names=F)