I have the output from a data submission which is in the form of multiple vector list objects in rda files. Each list object is in a separate rda file and i have nearly 2000 files. I want to merge all the objects into a single object in a single rda file in the fastest way (partly because i may need to repeat this several times). All the rda files are fairly small (~10mb though this will be a compressed size), but it all adds up with the number of files.
Memory isn't a huge problem as am running it on a server with >700GB RAM, My first approach to incrementally load them one by one concatenate with the merged list object and remove the object that was appended went badly due to the time it was going to take (something like 40 days at a best guess). My revised approach is below, but wondering if there is a quicker way to do this given that i may need to repeat the process:
load("data_1.rda")
load("data_2.rda")
load("data_3.rda") ...
load("data_2000.rda")
my.list <- list()
my.list <- c(my.list, data.1, data.2, data.3, ... , data.2000)
save(my.list, file="my_list.rda")
And just to add to things i'm getting an error when doing this:
Error: attempt to set index 18446744071562067968/2877912830 in SET_STRING_ELT
It's not a very helpful error message All the rdas load as objects into the environment fine, but when i try and concatenate them that is when I get the error message, and it seems like it is when it gets to a particular point as it doesn't fail immediately. Wasn't sure if it is some sort of limit in the number of concatenations you can do or rogue data, but troubleshooting it it appears to be syntax rather than data related. Have chunked it up into 5 batches and then doing a final concatenation before saving the rda. Have seen other answers for this sort of thing suggesting using rbind, mget, and do.Call or list function - would using any of these functions make it faster and achieve the same thing? Something like this:
my.list <- do.call(rbind, mget(ls(pattern="^data_")))
Thanks