
I'm trying to apply a very complex function to a list of more than 50 data frames. For clarity, the example below uses a very simple function that lowercases column names and just 3 data frames, but it follows my general approach.

[EDITED NAMES]
# Data sample. Every column name is different across data frames


quality <- data.frame(FIRST=c(1,5,3,3,2), SECOND=c(3,6,1,5,5))
thickness <- data.frame(THIRD=c(6,0,9,1,2), FOURTH=c(2,7,2,2,1))
distance <- data.frame(ONEMORE=c(0,0,1,5,1), ANOTHER=c(4,1,9,2,3))


# list of dataframes

dfs <- list(quality, thickness, distance)


# a very simple function (just for testing)
# actually a very complex one is used on real data

BetterNames <- function(x) {
  names(x) <- tolower(names(x))
  x
}


# apply function to data frame list

dfs <- lapply(dfs, BetterNames)

# I know the expected R behaviour is to modify a copy of the object,
# instead of the original object itself. So if you get the names
# you get the original version, not the needed one

names(quality)

[1] "FIRST"  "SECOND"

Is there any way to apply a function in place, inside a loop or an "apply" call, to a huge number of data frames?
As a result, every original data frame in the (big) list should be replaced by its modified version.

I know there's a trick using data.table, but I wonder whether this is possible in base R.

Expected Results:

 names(quality)

    [1] "first"  "second"

[EDITED] I was pointed to this answer: Rename columns in multiple dataframes, R

But it does not work here. I can't use a vector of name strings in my case, because my new names are produced by a function rather than taken from a fixed list of strings.

[EDITED DATA]

for(df in dfs) {
  df.tmp <- get(df)
  names(df.tmp) <- BetterNames(df)
  assign(df, df.tmp)
}

> names(quality)
[1] "quality" NA  

Thanks

Forge
  • Did you check names(dfs[[1]])? You are referring to quality, which is the data frame as it was before BetterNames was applied. – M. Siwik Nov 02 '16 at 21:52
  • M. Siwik, the issue is to get all data frame names modified in place, if possible. Imagine we must use names(dataframe) – Forge Nov 02 '16 at 21:59
  • You mean to get all data frames in a single list without typing listOfDf? – M. Siwik Nov 02 '16 at 22:00
  • Hi! I think the answer you are looking for is here: http://stackoverflow.com/questions/18375969/rename-columns-in-multiple-dataframes-r (see the second answer!) – User2321 Nov 02 '16 at 22:01
  • Check these answers: http://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames You have to hold your data frames in a list to handle them with a single command and get nice vectorised speed of computing. – M. Siwik Nov 02 '16 at 22:05
  • The simple answer is to not work on 50 separate objects in the global environment. You have a `list` already and you should work with it instead. If you need a named list to make things easier, try `mget(c("quality","thickness","distance"))` to create it. – thelatemail Nov 02 '16 at 22:08
  • User2321, if you use a vector of name strings you are unable to assign new names using a function. Take into account that the answer you link to uses a fixed list of column names (A, B, C) – Forge Nov 02 '16 at 22:27
  • Holding your data frames in a list **doesn't** add any "nice vectorised speed of computing"... but it **does** make your code cleaner to read and write. Having your data frames in a list is a Good Thing, and you should be using the list of data frames rather than the originals floating around in your global environment. As I say [in this answer](http://stackoverflow.com/a/24376207/903061), you can usually avoid having the data frames in your global environment at all. – Gregor Thomas Nov 02 '16 at 22:51
  • My bad. I edited the data to give a proper sample. Every column name is different across data frames – Forge Nov 02 '16 at 23:21

3 Answers


You already have the best case scenario:

Let's add some names to your list:

names(dfs) <- c("quality", "thickness", "distance")
dfs <- lapply(dfs, BetterNames)

dfs[["quality"]]
#   first second
# 1     1      3
# 2     5      6
# 3     3      1
# 4     3      5
# 5     2      5

This works great. And all your data is in a list, so if there are other things you want to do to all your data frames it is very easy.

If you are done treating these data frames similarly and really want them back in the global environment to work with individually, you can do it with

list2env(dfs, envir = .GlobalEnv)

I would recommend keeping them in a list, though. In most cases, if you are working with 50 data frames, a list makes it easy to process them with lapply or for loops, whereas with individual objects you will be copy/pasting code and making mistakes.


I would consider even starting with 50 data frames in your workspace a problem - see How do I make a list of data frames? for recommendations on finding an upstream fix: going straight to a list from the start.
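As a rough sketch of the whole round trip, the example below collects every data frame in an environment by name with `mget()`, applies the function with `lapply()`, and writes the results back with `list2env()`. The scratch environment `env` and the extra object `not_a_df` are assumptions made for illustration; with real data you would point `envir` at wherever the data frames actually live.

```r
# Scratch environment standing in for the workspace (assumption for illustration)
env <- new.env()
env$quality   <- data.frame(FIRST = c(1, 5, 3, 3, 2), SECOND = c(3, 6, 1, 5, 5))
env$thickness <- data.frame(THIRD = c(6, 0, 9, 1, 2), FOURTH = c(2, 7, 2, 2, 1))
env$not_a_df  <- 1:3  # a non-data-frame object that should be left untouched

BetterNames <- function(x) {
  names(x) <- tolower(names(x))
  x
}

# Fetch every object by name, then keep only the data frames.
# mget() returns a named list, so no manual names(dfs) <- ... is needed.
dfs <- Filter(is.data.frame, mget(ls(env), envir = env))

# lapply() preserves the list names, so each result still maps to its variable
dfs <- lapply(dfs, BetterNames)

# Overwrite the originals with the modified copies
list2env(dfs, envir = env)

names(env$quality)
# [1] "first"  "second"
```

Filtering with `is.data.frame` avoids typing 50 names by hand, but it also picks up every data frame in the environment, so it is only appropriate when all of them should be processed.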

Gregor Thomas

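I'd use a simple yet effective parse & eval approach.

Let's use a for loop over the data frame names (as strings) to compose a command that suits your needs:

for (df in c("quality", "thickness", "distance")) {
  # builds e.g. "quality <- BetterNames(quality)"
  # (BetterNames returns the whole data frame, so assign it to the object, not to names())
  command <- paste0(df, " <- BetterNames(", df, ")")
  # print(command)
  eval(parse(text = command))
}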

names(quality)
[1] "first"  "second"

names(thickness)
[1] "third"  "fourth"

names(distance)
[1] "onemore"  "another"
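The same effect can be had without building code strings, by looking each object up with `get()` and writing it back with `assign()`. This is only a sketch: the vector `df_names` is a name introduced here for illustration, and the loop is assumed to run at top level so that `get()` and `assign()` both act on the current workspace.

```r
quality   <- data.frame(FIRST = c(1, 5, 3, 3, 2), SECOND = c(3, 6, 1, 5, 5))
thickness <- data.frame(THIRD = c(6, 0, 9, 1, 2), FOURTH = c(2, 7, 2, 2, 1))

BetterNames <- function(x) {
  names(x) <- tolower(names(x))
  x
}

# Names of the objects to modify, as strings (hypothetical helper vector)
df_names <- c("quality", "thickness")

for (nm in df_names) {
  # get() fetches the data frame by name, BetterNames() returns a modified
  # copy, and assign() stores that copy back under the same name
  assign(nm, BetterNames(get(nm)))
}

names(quality)
# [1] "first"  "second"
```

Note that the whole modified data frame is assigned back, not just its names, so this pattern also works when the real function changes more than the column names.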
useRj

This is for sure not optimal and I hope something better comes up but here it goes:

BetterNames <- function(x, y) {

    names(x) <- tolower(names(x))
    assign(y, x, envir = .GlobalEnv)

}

dfs <- list(quality, thickness, distance)
dfs2 <- c("quality", "thickness", "distance")
mapply(BetterNames, dfs, dfs2)

> names(quality)
[1] "first"  "second"
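A possible variation, sketched under the assumption that only the object names are known up front: the helper below (a hypothetical `rename_in_env`, not part of the original answer) takes a name and a target environment, so the data frames do not have to be listed twice and the destination is not hard-coded to `.GlobalEnv`.

```r
# Hypothetical helper: modifies the data frame called `name` inside `envir`
rename_in_env <- function(name, envir) {
  x <- get(name, envir = envir)
  names(x) <- tolower(names(x))
  assign(name, x, envir = envir)
  invisible(NULL)  # called for its side effect only
}

# Scratch environment used for illustration; pass .GlobalEnv to mimic the answer
env <- new.env()
env$quality  <- data.frame(FIRST = c(1, 5, 3, 3, 2), SECOND = c(3, 6, 1, 5, 5))
env$distance <- data.frame(ONEMORE = c(0, 0, 1, 5, 1), ANOTHER = c(4, 1, 9, 2, 3))

# Extra arguments to lapply() are forwarded to the helper
invisible(lapply(c("quality", "distance"), rename_in_env, envir = env))

names(env$distance)
# [1] "onemore" "another"
```

This still relies on assignment as a side effect, so it shares the drawback of the approach above; it mainly makes the target environment explicit.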
User2321
  • I think a `list2env` on the result is bad, but still better than making the function do global assignment. Using OP's `BetterNames`, `dfs = lapply(dfs, BetterNames); names(dfs) = c("quality", "thickness", "distance"); list2env(dfs, envir = .GlobalEnv)` – Gregor Thomas Nov 02 '16 at 23:34
  • Yes, you are right, I really did not manage to find a "nice" solution for this problem. (By the way, I did not know the list2env function, thank you for that!) – User2321 Nov 02 '16 at 23:38
  • Your answer is much nicer! Please either create a new answer or add it so the OP can see it! – User2321 Nov 02 '16 at 23:43