0

I have a dataframe, df, that looks like the following (in reality, I have a lot more columns, leading up all the way to 2020_Actual and 2020_Budget):

Company     Code                        2013_Actual 2013_Budget 2014_Actual 2014_Budget
CompanyX    100 - Salary & Wages        1500        1601        1620        1680
CompanyX    102 - Bonus & Incentives    3000        3500        3150        3300        
CompanyX    104 - Overtime              60          70          78          82
CompanyX    110 - Temporary Help        35          55          48          56  
CompanyX    112 - Taxes & Benefits      800         840         880         900

I want to collapse all the columns after Code into Actual, Budget, and Year columns like this:

Company     Code                        Actual  Budget  Year
CompanyX    100 - Salary & Wages        1500    1601    2013
CompanyX    102 - Bonus & Incentives    3000    3500    2013
CompanyX    104 - Overtime              60      70      2013
CompanyX    110 - Temporary Help        35      55      2013
CompanyX    112 - Taxes & Benefits      800     840     2013
CompanyX    100 - Salary & Wages        1620    1680    2014
CompanyX    102 - Bonus & Incentives    3150    3300    2014
CompanyX    104 - Overtime              78      82      2014
CompanyX    110 - Temporary Help        48      56      2014
CompanyX    112 - Taxes & Benefits      880     900     2014

This has been my approach so far, but it doesn't get me the correct outcome. It would also be nice to not have to specify each of the columns I want to gather by name, since my input table's column names may change:

library(dplyr)
library(tidyr)

# Read in data
df <- read.csv("inputTable.csv", header=T, stringsAsFactors=F)

# Collapse columns
actuals<- gather(df, 
                       key = "Year",
                       value = "Actuals",
                       2013_Actual,2014_Actual,2015_Actual,2016_Actual,2017_Actual,2018_Actual,2019_Actual,2020_Actual)

budget <- gather(df, 
                       key = "Year",
                       value = "Budget",
                       2013_Budget,2014_Budget,2015_Budget,2016_Budget,2017_Budget,2018_Budget,2019_Budget,2020_Budget)

# grab only relevant columns
df_actuals_final <- select(df_actuals, Company, Code, Year, Actuals)
df_budget_final <- select(df_budget, Company, Code, Year, Budget)

# merge 
m <- merge(df_actuals_final, df_budget_final, by=c("Company","Code", "Year"))

write.csv(m, "inputTable.mod.csv", quote=F, row.names=F)
claudiadast
  • 289
  • 5
  • 16

1 Answers1

0

gather has been retired and replaced with pivot_longer which makes such transformations very simple now.

tidyr::pivot_longer(df, cols = -c(Company, Code), 
                        names_to = c('Year', '.value'), 
                        names_sep = '_')


# A tibble: 10 x 5
#   Company  Code                 Year  Actual Budget
#   <chr>    <chr>                <chr>  <int>  <int>
# 1 CompanyX 100-Salary&Wages     2013    1500   1601
# 2 CompanyX 100-Salary&Wages     2014    1620   1680
# 3 CompanyX 102-Bonus&Incentives 2013    3000   3500
# 4 CompanyX 102-Bonus&Incentives 2014    3150   3300
# 5 CompanyX 104-Overtime         2013      60     70
# 6 CompanyX 104-Overtime         2014      78     82
# 7 CompanyX 110-TemporaryHelp    2013      35     55
# 8 CompanyX 110-TemporaryHelp    2014      48     56
# 9 CompanyX 112-Taxes&Benefits   2013     800    840
#10 CompanyX 112-Taxes&Benefits   2014     880    900

data

df <- structure(list(Company = c("CompanyX", "CompanyX", "CompanyX", 
"CompanyX", "CompanyX"), Code = c("100-Salary&Wages", "102-Bonus&Incentives", 
"104-Overtime", "110-TemporaryHelp", "112-Taxes&Benefits"), 
`2013_Actual` = c(1500L, 3000L, 60L, 35L, 800L), `2013_Budget` = c(1601L, 
3500L, 70L, 55L, 840L), `2014_Actual` = c(1620L, 3150L, 78L, 48L, 880L), 
`2014_Budget` = c(1680L, 3300L, 82L, 56L, 900L)), 
class = "data.frame", row.names = c(NA, -5L))
Ronak Shah
  • 286,338
  • 16
  • 97
  • 143
  • Thank you for the feedback! When I apply the above solution, it stacks the similar codes together and then alternates between years. I'm trying to get it so that it stacks all the codes and rows from 2013, followed by 2014, etc etc. (as seen in my expected output). How do I achieve that? – claudiadast Jun 12 '20 at 16:11
  • @claudiadast Isn't that just adding `%>% arrange(Year)` to the above output? – Ronak Shah Jun 13 '20 at 00:36