
I would like to loop over 3 data frames, create subsets of each, and assign each new subset a new name. How can I loop over these three data frames while keeping the original names as part of the new ones?

For example, I have 3 data frames: apples, berries, and grapes. When writing a loop, is there a way to give each new subset data frame a name derived from its original data frame?

Written out without a loop, this is what the code would look like.

apples <- data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3))
berries <- data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3))
grapes <- data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))

apples_large <- subset(apples, number > 2)
apples_small <- subset(apples, number < 2)

berries_large <- subset(berries, number > 2)
berries_small <- subset(berries, number < 2)

grapes_large <- subset(grapes, number > 2)
grapes_small <- subset(grapes, number < 2) 
  • It is better not to create multiple objects in the global env – akrun Jul 30 '18 at 16:22
  • How did you create these data.frames in the first place? It seems like if the values are related and you want to perform similar actions on them, they should all be in the same list. Then it's much easier to work with. See: https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames and this related answer: https://stackoverflow.com/a/51560385/2372064 – MrFlick Jul 30 '18 at 16:22
  • Perhaps helpful: https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames#24376207 – r2evans Jul 30 '18 at 16:23
  • Look at the assign function. – JonMinton Jul 30 '18 at 16:24

3 Answers


Place the data frames in a list and split each one by whether its 'number' column exceeds 2. This gives a nested list of data frames whose inner elements are named FALSE and TRUE:

lapply(list(apples, berries, grapes), function(x) split(x, x$number>2)) 

If we create a named list and label the groups, it becomes easier to identify or extract the individual components:

out <- lapply(mget(c("apples", "berries", "grapes")),
  function(x) split(x, c("small", "large")[(x$number > 2) + 1]))
out$apples$small
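If flat names in the style of the question are wanted while still keeping everything in a list, the nested result can be flattened one level. The following is a minimal sketch, not part of the original answer: it builds the input with an explicit named list instead of mget, and the name flat is chosen here for illustration only.

```r
# Data from the question
apples  <- data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3))
berries <- data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3))
grapes  <- data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))

# Same split as above, on an explicit named list
out <- lapply(list(apples = apples, berries = berries, grapes = grapes),
  function(x) split(x, c("small", "large")[(x$number > 2) + 1]))

# Flatten one level; outer and inner names are joined with "."
flat <- unlist(out, recursive = FALSE)

# Optional: switch to the underscore style used in the question
names(flat) <- sub(".", "_", names(flat), fixed = TRUE)
names(flat)
flat$apples_large
```

Note that at this stage the "small" groups still contain the rows where number equals 2, because the split key only tests number > 2.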

As @JonMinton mentioned, split keeps every row, so the rows where 'number' equals 2 end up in "small". If we need to drop those rows, subset first and then split:

lapply(mget(c("apples", "berries", "grapes")), function(x) {
  x1 <- subset(x, number != 2)
  split(x1, c("small", "large")[(x1$number > 2) + 1])
})
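To see why the filtering step matters, here is a small sketch on the apples data alone. The variable names key and res are introduced here for illustration and are not part of the original answer.

```r
apples <- data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3))

# Without filtering, every row is assigned to a group,
# so the number == 2 row lands in "small"
key <- c("small", "large")[(apples$number > 2) + 1]
unfiltered <- split(apples, key)
unfiltered$small$number

# Filtering first removes the number == 2 row before the split
x1 <- subset(apples, number != 2)
res <- split(x1, c("small", "large")[(x1$number > 2) + 1])
res$small$number
res$large$number
```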
akrun
  • Split only works if the splits are mutually exclusive and exhaustive. In the example the subsets were < 2 and > 2, so rows with number == 2 were dropped. – JonMinton Jul 30 '18 at 16:36

It's a bad idea to create many objects in the global environment, rather than keeping them in a list, but this would do it:

tmp <- c("apples", "berries", "grapes")

for (i in 1:length(tmp)){
  assign(paste0("big_", tmp[i]), subset(get(tmp[i]), number > 2))
  assign(paste0("small_", tmp[i]), subset(get(tmp[i]), number < 2))
}

(or use seq_along(tmp) instead of 1:length(tmp))

Notice the use of assign for the outputs and get for the inputs.
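As a variant, the loop can iterate over the names directly and build the names requested in the question (apples_large and so on). This is only a sketch of the same assign/get idea; the loop variables nm and df are illustrative names.

```r
apples  <- data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3))
berries <- data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3))
grapes  <- data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))

tmp <- c("apples", "berries", "grapes")

# Looping over the names themselves avoids index bookkeeping
# and does nothing if tmp is empty
for (nm in tmp) {
  df <- get(nm)                                          # look up the input by name
  assign(paste0(nm, "_large"), subset(df, number > 2))   # e.g. apples_large
  assign(paste0(nm, "_small"), subset(df, number < 2))   # e.g. apples_small
}

apples_large
grapes_small
```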

JonMinton

First, put your data.frames into a list, then define a function that classifies the rows. Now you can split each element of the list according to your classifier in an lapply.

fruits <- list(
    apples=data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3)),
    berries=data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3)),
    grapes=data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))
)

clsfy <- function(num) {
    if (num>2) {
        ret <- "Large"
    } else if (num<2) {
        ret <- "Small"
    } else {
        ret <- NA ## if no condition is met, discard this row
    }
    return(ret)
}

fruits2 <- lapply(fruits, function(fr) {
    split(fr, sapply(fr$number, clsfy))
})

At this point, fruits2 looks like this:

> fruits2
$apples
$apples$Large
   type number
3 green      3

$apples$Small
  type number
1  red      1


$berries
$berries$Large
      type number
3 mulberry      3

$berries$Small
       type number
1 blueberry      1


$grapes
$grapes$Large
  type number
3 sour      3

$grapes$Small
  type number
1  red      1

To generalize classifications using more than one column per row, you can use apply instead of sapply and re-define your clsfy function so that it takes the whole row: split(fr, apply(fr, 1, clsfy)). Note that apply coerces the data frame to a matrix first, so with mixed column types clsfy receives each row as a character vector. On the other hand, if your condition is really a simple binary, then the vectorized ifelse is better than sapply(x$number, clsfy).
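A vectorized classifier along those lines could look like the sketch below. Because the condition in this example has three outcomes (large, small, discard), it nests two ifelse calls; the names clsfy_vec and fruits3 are made up for this illustration.

```r
fruits <- list(
    apples = data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3)),
    berries = data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3)),
    grapes = data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))
)

# Vectorized classifier: works on the whole column at once.
# Rows matching neither condition get NA, which split() discards.
clsfy_vec <- function(num) {
    ifelse(num > 2, "Large", ifelse(num < 2, "Small", NA))
}

fruits3 <- lapply(fruits, function(fr) {
    split(fr, clsfy_vec(fr$number))
})

names(fruits3$apples)
```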

flies