Remove duplicate rows for multiple dataframes

Question

I have over 100 data frames (df1, df2, df3, ...), each containing the same variables. I want to loop through all of them and remove duplicate rows by id. For df1, I can do:

df1 <- df1[!duplicated(df1$id), ]

How can I do this in an efficient way?

Put them in a list and then use `lapply`. [This post](http://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames) will be helpful. My answer there gives an easy way to get a named list, and Gregor's excellent answer illustrates methods for working with lists of data.frames. — lmo, Feb 10 '17 at 16:24
If they are all data frames with object names starting with "df", you might want to create a list using `ls(pattern = "df[0-9]")` and then cycle over them. — Matteo Castagna, Feb 10 '17 at 16:26
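Note that `ls(pattern = ...)` treats the pattern as a regular expression matched anywhere in the name. The unanchored `"df[0-9]"` will therefore also pick up names such as `mydf3` or `df1_old`, which is why the anchored `"^df[0-9]+$"` is safer. Here is a small check on a hypothetical vector of object names, using `grep()` directly:

```r
# Hypothetical object names, as ls() might return them
obj_names <- c("df1", "df22", "df1_old", "mydf3", "dflist")

# Unanchored: matches "df" followed by a digit anywhere in the name
loose <- grep("df[0-9]", obj_names, value = TRUE)

# Anchored: the whole name must be "df" followed only by digits
strict <- grep("^df[0-9]+$", obj_names, value = TRUE)

loose   # c("df1", "df22", "df1_old", "mydf3")
strict  # c("df1", "df22")
```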

r2evans · Accepted Answer · 2017-02-10T20:03:46.560

If you're dealing with 100 similarly structured data.frames, I suggest putting them in a list instead of giving each one its own name.

Assuming they are all named `df` followed by a number, you can easily collect their names with something like:

df_varnames <- ls()[ grep("^df[0-9]+$", ls()) ]

or, as @MatteoCastagna suggested in a comment:

df_varnames <- ls(pattern = "^df[0-9]+$")

(which is both faster and cleaner). Then:

dflist <- sapply(df_varnames, get, simplify = FALSE)
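The two steps together can be sketched on three small made-up data frames standing in for the real ones. The sketch also shows base R's `mget()`, which looks up several objects by name in one call and returns a named list, as an alternative to `sapply(..., get, simplify = FALSE)`. Passing `envir` explicitly is an addition, not part of the original answer. It keeps the lookup tied to the environment where the data frames were created rather than depending on where the code happens to run.

```r
# Toy stand-ins for the real df1, df2, ... (hypothetical data)
df1 <- data.frame(id = c(1, 1, 2), x = c("a", "b", "c"))
df2 <- data.frame(id = c(3, 4, 4), x = c("d", "e", "f"))
df3 <- data.frame(id = c(5, 6, 7), x = c("g", "h", "i"))

env <- environment()  # where the data frames above live

# Names of every object called "df" followed only by digits
df_varnames <- ls(envir = env, pattern = "^df[0-9]+$")

# Named list of the data frames themselves (the answer's approach)
dflist <- sapply(df_varnames, get, envir = env, simplify = FALSE)

# Equivalent: fetch all objects at once with mget()
dflist_m <- mget(df_varnames, envir = env)

names(dflist)  # c("df1", "df2", "df3")
```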

And from here, your question is simply:

dflist2 <- lapply(dflist, function(z) z[!duplicated(z$id), ])
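`duplicated()` flags every repeat of a value after its first occurrence, so `z[!duplicated(z$id), ]` keeps the first row for each `id`. The short check below uses a made-up list. The `fromLast = TRUE` variant is included only to show how to keep the last row per `id` instead, in case that is what you need.

```r
# Hypothetical list of data frames with repeated ids
dflist <- list(
  df1 = data.frame(id = c(1, 1, 2), x = c("a", "b", "c")),
  df2 = data.frame(id = c(3, 4, 4), x = c("d", "e", "f"))
)

# Keep the first row seen for each id (as in the answer)
dflist2 <- lapply(dflist, function(z) z[!duplicated(z$id), ])

# Variant: keep the last row for each id instead
dflist_last <- lapply(dflist, function(z) z[!duplicated(z$id, fromLast = TRUE), ])

dflist2$df1$x      # x values "a", "c"
dflist_last$df1$x  # x values "b", "c"
```

`lapply()` preserves the list names, so `dflist2$df1` still refers to the cleaned `df1`.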

If you must deal with them as individual data.frames (again, this is discouraged: it almost always slows down processing without adding any functionality), you can try a hack like this (using df_varnames from above):

for (dfname in df_varnames) {
  df <- get(dfname)                         # fetch the data frame by its name
  assign(dfname, df[!duplicated(df$id), ])  # overwrite it with the deduplicated rows
}

I cringe when I consider using this, but I admit I may not understand your workflow.
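If the cleaned data frames really must end up as separate objects again, there is an alternative to the `get()`/`assign()` loop that is not part of the original answer. Clean them as a list, then write the whole list back with base R's `list2env()`, which creates one object per named list element. The sketch below uses hypothetical toy data and assumes the objects live in the current environment.

```r
# Toy stand-ins for df1, df2 (hypothetical data)
df1 <- data.frame(id = c(1, 1, 2), x = c("a", "b", "c"))
df2 <- data.frame(id = c(3, 4, 4), x = c("d", "e", "f"))

env <- environment()
df_varnames <- ls(envir = env, pattern = "^df[0-9]+$")

# Clean every data frame while it is held in a list ...
cleaned <- lapply(mget(df_varnames, envir = env),
                  function(z) z[!duplicated(z$id), ])

# ... then overwrite df1, df2, ... in a single step
invisible(list2env(cleaned, envir = env))

nrow(df1)  # 2
```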
