R Apply Function for Formatting Many Datasets

Question

I would like to loop through nine data sets, perform calculations on each, and write each result to a differently named file.

Existing Code:

    list <- c(corporate_service, finance, its, law, market_services, operations, president, member_services, System_Planning)

    Calc <- function(list){
  
         list %>%  filter(Total_Flag == 1) %>%
                   select(Element, Amount, Total)

     }
  
     lapply(list, Calc)

I would like to loop through each dataset and apply the function above. More specifically, I would like to re-name each processed dataframe something different. Is there a way to do this? I should also note, this code has not worked for me - is there anything noticeably wrong?

Thanks

Are the `corporate_service`, `finance`, etc. variables data.frames? If so, you should just use `list()` rather than `c()` to put them in a collection. Are `wage_allocation` and `Calc` supposed to be the same function? It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Dec 22 '20 at 20:44
corporate_service and finance are the names of the data.frame. Whoops, yes those are the same function. I edited it. — ldan, Dec 22 '20 at 20:53
Note, it's good practice not to use function names, e.g. `list`, as variable names because it's confusing. — SteveM, Dec 22 '20 at 20:55
Note, it's good practice not to save *separate* similarly structured data frames but continue to use a `list` of such elements. See this [canonical answer](https://stackoverflow.com/a/24376207/1422451). — Parfait, Dec 22 '20 at 21:14

score 0 · Answer 1 · answered Dec 22 '20 at 22:01

Avoid flooding your global environment with separate, similarly structured data frames in the first place. Instead, keep them in a list of data frames. See @GregorThomas's best-practices answer for why. In fact, a named list is preferable because its elements can be indexed by name.

    # DEFINE A NAMED LIST OF DATA FRAMES
    df_list <- list(corporate_service = corporate_service,
                    finance = finance,
                    its = its,
                    law = law,
                    market_services = market_services,
                    operations = operations,
                    president = president,
                    member_services = member_services,
                    system_planning = System_Planning)

    # REMOVE ORIGINALS FROM GLOBAL ENVIRONMENT
    rm(corporate_service, finance, its, law, market_services,
       operations, president, member_services, System_Planning)

    # REVIEW STRUCTURE
    str(df_list)
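To see why `list()` matters here and why the original `c()` call could not work, a minimal sketch with two toy data frames may help. The data below are hypothetical stand-ins, not the real division tables. Because a data frame is itself a list of columns, `c()` concatenates those columns into one flat list, whereas `list()` keeps each data frame as a single element.

```r
# Toy data frames (hypothetical stand-ins for the real division tables)
finance <- data.frame(Element = c("a", "b"), Amount = c(10, 20))
law     <- data.frame(Element = c("c", "d"), Amount = c(30, 40))

# c() treats each data frame as a list of columns and concatenates them
flat <- c(finance, law)
length(flat)               # one element per column, not per data frame
is.data.frame(flat[[1]])   # each element is a plain vector

# list() keeps every data frame intact as one named element
kept <- list(finance = finance, law = law)
length(kept)
sapply(kept, is.data.frame)
```

With the flattened version, `lapply()` hands the function one column at a time, so a call such as `filter(Total_Flag == 1)` has no data frame to operate on.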

Next, define a function that takes a single data frame (not the list) and its list name, then call it on each element:

    library(dplyr)

    Calc <- function(df, nm) {
        # Keep only the total rows and the columns of interest
        df <- select(filter(df, Total_Flag == 1), Element, Amount, Total)

        # paste0() avoids the space that paste() would insert before ".csv"
        write.csv(df, file.path("path", "to", "my", "destination", paste0(nm, ".csv")))
        return(df)
    }

    # ASSIGN TO A NEW LIST
    new_df_list <- Map(Calc, df_list, names(df_list))
    # EQUIVALENT: mapply(Calc, df_list, names(df_list), SIMPLIFY = FALSE)
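The pattern above can be tried end to end on toy data. The following is only a sketch under stated assumptions: the two small data frames, the helper name `calc_and_save`, and the use of `tempdir()` as the output folder are all made up for illustration, and the column names are taken from the question.

```r
library(dplyr)

# Hypothetical toy data with the columns the question's function expects
df_list <- list(
  finance = data.frame(Element = c("Salary", "Bonus", "All"),
                       Amount = c(100, 20, 120),
                       Total = c(120, 120, 120),
                       Total_Flag = c(0, 0, 1)),
  law = data.frame(Element = c("Salary", "All"),
                   Amount = c(80, 80),
                   Total = c(80, 80),
                   Total_Flag = c(0, 1))
)

out_dir <- tempdir()  # assumption: a throwaway output folder for the demo

# Hypothetical helper: filter one data frame and save it under its list name
calc_and_save <- function(df, nm) {
  res <- df %>%
    filter(Total_Flag == 1) %>%
    select(Element, Amount, Total)
  write.csv(res, file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
  res
}

# Map() passes each data frame together with its name
processed <- Map(calc_and_save, df_list, names(df_list))

names(processed)
file.exists(file.path(out_dir, "finance.csv"))
```

Each element of `processed` keeps the name of its source data frame, and the same name is reused for the CSV file, which covers the "different file name per dataset" part of the question.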

To be clear, you lose no functionality of a data frame if it is stored in a larger container.

    head(new_df_list$corporate_service)
    tail(new_df_list$finance)
    summary(new_df_list$its)

Such containers even make it easy to run the same operation on every data frame:

    lapply(new_df_list, summary)

You can even concatenate all the data frames into one, with a column holding the corresponding list name:

    final_df <- dplyr::bind_rows(new_df_list, .id = "division")
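As a small illustration of what the `.id` argument does, here is a sketch on two hypothetical one-row data frames (the values are made up). It also shows that the combined data frame can be split back into a named list with base R's `split()`.

```r
library(dplyr)

# Hypothetical processed results, one data frame per division
new_df_list <- list(
  finance = data.frame(Element = "All", Amount = 120),
  law     = data.frame(Element = "All", Amount = 80)
)

# .id adds a leading column filled with each element's list name
final_df <- bind_rows(new_df_list, .id = "division")
final_df$division
names(final_df)

# The combined data frame can be split back into a named list if needed
by_division <- split(final_df, final_df$division)
names(by_division)
```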

Overall, your organization and data management improve, since you work with a single, indexed object rather than many separate ones that require `ls`, `mget`, `get`, `eval`, or `assign` for dynamic operations.
