I'm preparing data frames for analysis in R. I can prepare each one correctly on its own, but I want to put the preparation in a for loop (or apply/lapply?) so I don't have to repeat the same code for every data frame.

The initial code, which works for a single data frame, looks like this (indHab is a data frame):

indHabO<-indHab[complete.cases(indHab),]
row.names(indHabO) <- indHabO$Location
indHabO[1] <- NULL
indHabOK <- indHabO[,colSums(indHabO) > 0.1]

I tried a for loop but got stuck. The only thing I know is that it makes sense to put all the data frames in a list before attempting a loop of some sort, like this:

dataSets <- list(indHab, indLoc, famHab, famLoc, indicatorHab_2012, 
indicatorHab_2018, indicatorLoc_2012, indicatorLoc_2018)

How do I loop the operations over all data frames in the list?

  • 2
    I often refer to https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207 for lists-of-frames. It talks a bit about creating this list, not just processing it, but is still appropriate I think. Bottom line, something like `dataSets – r2evans May 03 '19 at 09:26
  • 1
    You refer to fields by name (`$Location`) as well as by number (`indHabO[1]`), which is inconsistent. I'd suggest always going by name if you are confident in the layout. Referencing by column index can get you in trouble if you can imagine a situation where the columns might not be exactly as you expect. (I always assume the data provider and/or the user are evil, trying hard to mess up my code, and then program defensively against this assumption.) – r2evans May 03 '19 at 09:28
  • @r2evans your lapply suggestion works, thanks! The manipulations run over all data frames in the list, and I could get them into a new list of data frames called "dataSetsOK". Now I just need to get them out. I'll try to figure that out myself, or would you say there is an easy way to get the manipulated data frames out of the new list? –  May 03 '19 at 10:14
  • 1
    @r2evans the way the data is set up, the first column will always have to become the row names. So in this particular case I guess an index is the better choice. –  May 03 '19 at 11:09
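On the follow-up about getting the processed data frames back out of the list, here is a minimal sketch. It assumes the list elements are named when the list is built; the toy data frames and the "OK" suffix are made up for illustration, and the per-frame step is only a stand-in for the real preparation.

```r
# Toy stand-ins for the real data frames (made up for illustration)
indHab <- data.frame(Location = c("A", "B"), sp1 = c(1, 2))
indLoc <- data.frame(Location = c("C", "D"), sp1 = c(3, 4))

# Naming the list elements makes them retrievable by name later
dataSets <- list(indHab = indHab, indLoc = indLoc)

# Stand-in for the real per-frame preparation; lapply keeps the names
dataSetsOK <- lapply(dataSets, function(x) x[complete.cases(x), ])

# Pull a single processed frame out of the list by name
indHabOK <- dataSetsOK[["indHab"]]

# Or copy every element into an environment as "<name>OK"
# (passing envir = .GlobalEnv instead would create top-level objects)
outEnv <- list2env(
  setNames(dataSetsOK, paste0(names(dataSetsOK), "OK")),
  envir = new.env()
)
ls(outEnv)
```

Keeping the frames in the list is often the more convenient option, since later steps can be applied with lapply in the same way.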

1 Answer

Thanks to r2evans, this is what I used.

It puts all the manipulated data frames in a new list.

dataSetsOK <- lapply(dataSets, function(x) { x <- x[complete.cases(x),]; row.names(x) <- x$Location;
  x[1] <- NULL; x <- x[,colSums(x) > 0.1] })
  • 2
    Note: you can break lines and avoid the semicolon. @r2evans wrote a one-liner to fit the comments. – Parfait May 03 '19 at 12:37
  • @Parfait, I see. Always good to know these finer details. Thanks! –  May 03 '19 at 18:00
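Following the note about breaking lines, the one-liner from the answer could be written as a named function, roughly as sketched below. The name prepFrame and the toy data are hypothetical. The sketch refers to the first column by index throughout, as the asker prefers, and adds drop = FALSE (not in the original) so that the result stays a data frame even if only one column passes the filter.

```r
# Hypothetical helper name; the steps mirror the one-liner in the answer
prepFrame <- function(x) {
  x <- x[complete.cases(x), ]          # drop rows containing any NA
  row.names(x) <- x[[1]]               # first column becomes the row names
  x[[1]] <- NULL                       # then remove that column
  x[, colSums(x) > 0.1, drop = FALSE]  # keep columns whose sum exceeds 0.1
}

# Toy data, made up for illustration only
toy <- data.frame(
  Location = c("A", "B", "C"),
  sp1 = c(1, NA, 2),
  sp2 = c(0, 0, 0),
  sp3 = c(0.5, 1, 0.5)
)

# Same call pattern as in the answer, here on a one-element list
toyOK <- lapply(list(toy), prepFrame)[[1]]
```

With the real data, the call would be dataSetsOK <- lapply(dataSets, prepFrame).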