I have a function to perform actions on a variable list of dataframes depending on user selections. The function mostly performs generic actions but there are a few actions that are dataframe specific.

My code runs fine if all dataframes are selected but I am unable to get it to work if not all dataframes are selected.

The following provides a minimal reproducible example:

# User switches.
df1Switch <- TRUE
df2Switch <- TRUE
df3Switch <- TRUE

# DF creation.
set.seed(1)
df <- data.frame(X=sample(1:10), Y=sample(11:20))
if (df1Switch) df1 <- df
if (df2Switch) df2 <- df
if (df3Switch) df3 <- df

# Function to do something.
fn_something <- function(file_list, file_names) {
  df <- file_list
  # Do lots of generic things.
  df$Z <- df$X + df$Y
  # Do a few specific things.
  if (file_names == "Name1") df$X <- df$X + 1 
  else if (file_names == "Name2") df$X <- df$Z - 1
  else if (file_names == "Name3") df$Y <- df$X + df$Y 
  return(df)
}

# Call function to do something.
file_list <- list(Name1=df1, Name2=df2, Name3=df3)
file_names <- names(file_list)
all_df <- do.call(rbind, mapply(fn_something, file_list, file_names,
                                SIMPLIFY=FALSE))

In this case the code runs fine as the user has selected to create all three dataframes. I use a named list so that the specific actions can be performed against the correct dataframes.

The output looks something like this (the actual numbers aren't important):

           X     Y    Z
Name1.1    4    13   16
Name1.2    5    12   16 
Name1.3    6    16   21 
   :       :     :    :
Name2.1   15    13   16
   :       :     :    : 

The problem arises if the user selects not to create some dataframes, e.g.:

# User switches.
df1Switch <- TRUE
df2Switch <- FALSE
df3Switch <- TRUE

Not surprisingly, in this case an object not found error results:

> # Call function to do something.
> file_list <- list(Name1=df1, Name2=df2, Name3=df3)
Error: object 'df2' not found

What I would like to do is conditionally specify the contents of file_list along the lines of this pseudo code:

file_list <- list(if (df1Switch) {Name1=df1}, if (df2Switch) {Name2=df2}, if (df3Switch) {Name3=df3})

I have come across list.foldLeft (in the question "Conditionally merge list elements"), but I don't know whether it is suitable here.

Andrew Eaves
  • Is there a particular reason you need to keep the frames in separate variables? I often prefer a [list-of-frames](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207) for things like this. Regardless, does `c(if (df1Switch) list(Name1=df1), if (df2Switch) list(Name2=df2), if (df3Switch) list(Name3=df3))` work for you? – r2evans May 22 '19 at 21:35
  • Thanks @r2evans. No particular reason why I have frames in separate variables; it's more the case that if I can get something to work then I'll stick with it. Your straightforward solution works for my setup, so if you make it an answer then I'll accept it. – Andrew Eaves May 22 '19 at 22:47
  • It's obvious you spent enough time on the question to ensure it is reproducible, complex-enough to see what might be going on, but simple-enough to follow without bifocals and a beer. Nice effort. – r2evans May 22 '19 at 22:57
  • Seems I've spent enough time looking at Q&A's to know what is friendly to the reader. Thanks for your detailed response, it's always good to learn new ways of doing things. – Andrew Eaves May 22 '19 at 23:04

1 Answer

(I'll re-hash my comment:)

In general, I would encourage you to consider use of a list-of-dataframes instead of individual frames. My rationale for this:

  • assuming that each frame is structured (nearly) identically; and
  • assuming that what you do to one frame you will (or at least can) do to all frames; then
  • it is easier to write list_of_frames <- lapply(list_of_frames, some_func) than it is to do something like:

    for (nm in c("df1", "df2", "df3")) {
      d <- get(nm)
      d <- some_func(d)
      assign(nm, d)
    }
    

    especially when dealing with non-global environments (i.e., doing this within a function).

To be clear, "easier" is subjective: although the lapply version does win at code golf, what matters more to me is that it is much easier to read as "I am running some_func on each element of list_of_frames and saving the result". (You can even save the result to a new list of frames, leaving the original frames untouched.)
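As a minimal sketch of that pattern, using the question's data: here some_func is a hypothetical stand-in that only adds a Z column, and processed is an illustrative name for the new list.

```r
# Hypothetical stand-in for some_func: adds a column Z.
some_func <- function(d) {
  d$Z <- d$X + d$Y
  d
}

set.seed(1)
df <- data.frame(X = sample(1:10), Y = sample(11:20))
list_of_frames <- list(Name1 = df, Name2 = df, Name3 = df)

# Save to a new list so the original frames stay untouched.
processed <- lapply(list_of_frames, some_func)

names(processed)            # lapply keeps the element names
ncol(list_of_frames$Name1)  # original frame still has X and Y only
ncol(processed$Name1)       # processed frame has X, Y and Z
```

Because lapply keeps the names, the result can go straight into do.call(rbind, ...) just like the named list in the question.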

You may also do things conditionally, as in

needs_work <- sapply(list_of_frames, some_checker_func) # returns logical
# or
needs_work <- c("df1", "df2") # names of elements of list_of_frames
list_of_frames[needs_work] <- lapply(list_of_frames[needs_work], some_func)
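
A runnable sketch of the logical-index variant follows. Both helper functions are hypothetical (the checker simply flags frames that lack a Z column), and I use vapply rather than sapply only so that the result is guaranteed to be a logical vector.

```r
# Hypothetical checker: TRUE when a frame has no Z column yet.
some_checker_func <- function(d) !("Z" %in% names(d))

# Hypothetical worker: adds the missing Z column.
some_func <- function(d) {
  d$Z <- d$X + d$Y
  d
}

list_of_frames <- list(
  df1 = data.frame(X = 1:3, Y = 4:6),
  df2 = data.frame(X = 1:3, Y = 4:6, Z = 0L),  # already has Z
  df3 = data.frame(X = 1:3, Y = 4:6)
)

# One TRUE/FALSE per frame, named after the list elements.
needs_work <- vapply(list_of_frames, some_checker_func, logical(1))

# Only the flagged frames are replaced; df2 is left as it was.
list_of_frames[needs_work] <- lapply(list_of_frames[needs_work], some_func)
```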

Having said that ... the direct answer to your one liner:

c(if (df1Switch) list(Name1=df1), if (df2Switch) list(Name2=df2), if (df3Switch) list(Name3=df3))

This relies on two facts: an if with no else returns NULL when its condition is FALSE, and c() drops NULL arguments. You can see it in action with:

c(if (TRUE) list(a=1), if (TRUE) list(b=2), if (TRUE) list(d=4))
# $a
# [1] 1
# $b
# [1] 2
# $d
# [1] 4

c(if (TRUE) list(a=1), if (FALSE) list(b=2), if (TRUE) list(d=4))
# $a
# [1] 1
# $d
# [1] 4
r2evans