0

Currently, using a for loop, I spit out multiple data frames all with the name

res_[Insert some date] 

For example in my environment I have:

res_2018-04-01

res_2018-05-01

res_2018-06-01

Each of those data frames has 1 row and 25 columns. The first column is the date the file was uploaded (so for the first data frame the column would have a value of 2018-04-11) and the subsequent columns are the 24 months before the date the file was uploaded.

After the for loop runs, I want to merge all of the data frames that were created without manually typing in the names of those data frames. So for example, I could merge all data frames with a name like res_%

So my final data frame would have 27 columns and (in this case) 3 rows.

Any suggestions?

EDIT: Here is the output for dput(res_2018-04-11)

dput(`res_2018-04-11`)
structure(list(UploadDate = "2018-04-11", `2016-03-01` = 4822598.18735351, 
`2016-04-01` = 4022970.75519652, `2016-05-01` = 4471569.0873137, 
`2016-06-01` = 4762693.19167908, `2016-07-01` = 3799649.58966077, 
`2016-08-01` = 4667486.94228869, `2016-09-01` = 4748252.38636671, 
`2016-10-01` = 3953585.1499195, `2016-11-01` = 5258338.05699641, 
`2016-12-01` = 4529140.27998058, `2017-01-01` = 3334021.87730489, 
`2017-02-01` = 4842435.58785495, `2017-03-01` = 5430798.18389053, 
`2017-04-01` = 4045150.70691188, `2017-05-01` = 4660901.62683975, 
`2017-06-01` = 4684489.25953388, `2017-07-01` = 3680375.93521103, 
`2017-08-01` = 5231564.19023014, `2017-09-01` = 4073906.09821191, 
`2017-10-01` = 5440655.92109229, `2017-11-01` = 4996844.57817061, 
`2017-12-01` = 5087355.28846096, `2018-01-01` = 2895616.00464724, 
`2018-02-01` = 3766770.55063743), .Names = c("UploadDate", 
"2016-03-01", "2016-04-01", "2016-05-01", "2016-06-01", "2016-07-01", 
"2016-08-01", "2016-09-01", "2016-10-01", "2016-11-01", "2016-12-01", 
"2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01", "2017-05-01", 
"2017-06-01", "2017-07-01", "2017-08-01", "2017-09-01", "2017-10-01", 
"2017-11-01", "2017-12-01", "2018-01-01", "2018-02-01"), row.names = c(NA, 
-1L), class = "data.frame")
  • It will be useful if you share the output from `dput(res_2018-04-01)`, `dput(res_2018-05-01)`, and `dput(res_2018-06-01)` in your question. – Weihuang Wong Jun 11 '18 at 15:22
  • 1
    First, put them in a list: `my_list = mget(ls("res_.*"))`, then [use your favorite of these answers](https://stackoverflow.com/q/8091303/903061). Note that you `for` loop to make these data frames would probably be simplified if it just put them in a `list` from the start. See [my answer here for details](https://stackoverflow.com/a/24376207/903061). – Gregor Thomas Jun 11 '18 at 15:32
  • Though it seems by "merge" you mean "stack" - `merge` is the base R function for doing joins, you probably just want `rbind`. `do.call(rbind, my_list)` (this is example is used in the second link of my previous comment). – Gregor Thomas Jun 11 '18 at 15:34
  • That makes sense. But when I run the `do.call(rbind, my_list)` it gives me 1 column with 75 rows. I need 27 columns with 3 rows – Cameron Kormylo Jun 11 '18 at 15:38
  • Once you have your list, you can use `plyr::rbind.fill`, as mentioned in this question: https://stackoverflow.com/questions/3402371/combine-two-data-frames-by-rows-rbind-when-they-have-different-sets-of-columns – Weihuang Wong Jun 11 '18 at 15:42
  • Also when running the code you gave to put the dfs into a list I get the error: `Error in as.environment(pos) : no item called "res_.*" on the search list` – Cameron Kormylo Jun 11 '18 at 15:43
  • @WeihuangWong perfect. `rbind.fill` is exactly what I needed. @Gregor the list code that you gave me still is not working. – Cameron Kormylo Jun 11 '18 at 15:46
  • The list code I posted assumed your columns had the same names - hence you get an error "names do not match previous names". Only after you added an example could we see that was not the case. – Gregor Thomas Jun 11 '18 at 17:23

1 Answers1

3

It is often useful to provide a minimal working example in your question. Here is an example of one. Here, instead of 24 columns of data, I use just 3 columns, but it shows the same point:

`res_2018-04-11` <- structure(list(UploadDate = "2018-04-11", 
`2017-12-01` = 5087355.28846096, `2018-01-01` = 2895616.00464724, 
`2018-02-01` = 3766770.55063743), .Names = c("UploadDate", "2017-12-01", 
"2018-01-01", "2018-02-01"), row.names = c(NA, -1L), class = "data.frame")

`res_2018-03-09` <- structure(list(UploadDate = "2018-03-09", 
`2017-11-01` = 4996844.57817061, `2017-12-01` = 5087355.28846096, 
`2018-01-01` = 2895616.00464724), .Names = c("UploadDate", "2017-11-01", 
"2017-12-01", "2018-01-01"), row.names = c(NA, -1L), class = "data.frame")

`res_2018-02-12` <- structure(list(UploadDate = "2018-02-12", 
`2017-10-01` = 5440655.92109229, `2017-11-01` = 4996844.57817061, 
`2017-12-01` = 5087355.28846096), .Names = c("UploadDate", "2017-10-01", 
"2017-11-01", "2017-12-01"), row.names = c(NA, -1L), class = "data.frame")

Then, as @Gregor mentioned in the comments, create a list of dataframes, specifying the pattern argument:

df_list <- mget(ls(pattern = "^res_.*"))

Now use rbind.fill from the plyr package:

plyr::rbind.fill(df_list)
#   UploadDate 2017-10-01 2017-11-01 2017-12-01 2018-01-01 2018-02-01
# 1 2018-02-12    5440656    4996845    5087355         NA         NA
# 2 2018-03-09         NA    4996845    5087355    2895616         NA
# 3 2018-04-11         NA         NA    5087355    2895616    3766771

Alternatively, with the dplyr package:

dplyr::bind_rows(df_list)
#   UploadDate 2017-10-01 2017-11-01 2017-12-01 2018-01-01 2018-02-01
# 1 2018-02-12    5440656    4996845    5087355         NA         NA
# 2 2018-03-09         NA    4996845    5087355    2895616         NA
# 3 2018-04-11         NA         NA    5087355    2895616    3766771
Larry Cai
  • 674
  • 7
  • 21
Weihuang Wong
  • 11,980
  • 2
  • 22
  • 45