I have a dataframe (dataframexml) with three columns (Name, Path and URL) and multiple rows. For each URL, I parse the XML in R and create a dataframe using the getdataframe() function, so one dataframe is generated per URL. (All the dataframes have the same columns.)

Now I need to add a new column to each dataframe that holds the dataframe's name in every row, and then append the dataframes one over another dynamically to create a master dataframe. This is the part where I am stuck. Looking for some guidance.

Code:

library(httr)  # provides GET() and authenticate()

for (i in seq_len(nrow(dataframexml))) {
  dataURL <- dataframexml[i, 3]
  dataURL.response <- GET(dataURL, authenticate("string", "xxxxx"))

  # getdataframe() = a function to create a dataframe from the URL
  dfname <- paste("df", substr(dataframexml[i, 3], 85, 100), sep = "")
  assign(dfname, getdataframe(dataURL.response))

  # Parts where I am stuck:
  # 1st: create a new column which will have the dataframe name in all rows
  # 2nd: append one dataframe over another and create a master dataframe

  print(dfname)  # for testing
}
string
  • In the long run, it's probably most useful to use a version of `lapply`/`Map` and keep the resulting data.frames in a list. – alistaire Apr 21 '17 at 15:39
  • Will using lapply/Map improve performance compared to loops? Sometimes I need to run thousands of URLs. – string Apr 21 '17 at 16:18
  • It does better than an unpreallocated `for` loop, but given you won't really know how much memory you need in this scenario, there's not really a way to avoid the slowness. The advantage is really afterwards, when you have a list of thousands of objects which you can easily iterate across instead of thousands of unconnected objects in your global environment. [Here's a longer explanation.](http://stackoverflow.com/a/24376207/4497050) – alistaire Apr 21 '17 at 16:24
  • Also, `purrr::map2_df` might solve all your problems here. – alistaire Apr 21 '17 at 16:26

1 Answer

Here's one way to do that in a loop. Basically, you create an empty object before your loop, which will be used to store the result. Then, in the loop, use cbind to add the name and finally rbind to stack each result on top of the previous ones.

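# Example input: one row per URL
dataframexml <- data.frame(name=c("a","b"),url=c("http1","http2"))

res <- NULL # create empty object to accumulate the result
for (i in seq_len(nrow(dataframexml))) {
  name_loop <- dataframexml[i,"name"]
  # Random data stands in for the real getdataframe() call
  getdataframe_result <- data.frame(col1=runif(2),col2=runif(2))
  # Add the name as the first column, repeated on every row
  getdataframe_result_with_name <- cbind(name_loop,getdataframe_result)
  # Stack this result under the rows collected so far
  res <- rbind(res,getdataframe_result_with_name)
}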

> res
  name_loop     col1      col2
1         a 0.267059 0.4765398
2         a 0.730072 0.4079391
3         b 0.131630 0.7102743
4         b 0.678059 0.0624137
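
As a variation on the loop above, and along the lines of the `lapply`/`Map` suggestion in the comments, the pieces can be collected in a named list and bound once at the end, so the result is not regrown on every iteration. This is only a minimal sketch: `getdataframe_stub()`, `source_name`, `pieces` and `master` are hypothetical names introduced here, and the stub returns fixed values in place of the real `getdataframe()` call on a `GET` response.

```r
# Hypothetical stand-in for getdataframe(); returns fixed values so the
# sketch runs without network access. Replace with the real call.
getdataframe_stub <- function(url) {
  data.frame(col1 = c(1, 2), col2 = c(3, 4))
}

dataframexml <- data.frame(
  name = c("a", "b"),
  url = c("http1", "http2"),
  stringsAsFactors = FALSE
)

# Build one data frame per row and tag each with its source name.
# Map() names the list elements after the first argument ("a", "b").
pieces <- Map(
  function(name, url) {
    df <- getdataframe_stub(url)
    data.frame(source_name = name, df, stringsAsFactors = FALSE)
  },
  dataframexml$name,
  dataframexml$url
)

# Bind all pieces in a single call, then drop the generated row names.
master <- do.call(rbind, pieces)
rownames(master) <- NULL
```

Keeping `pieces` around also leaves each individual dataframe reachable by name (for example `pieces[["a"]]`) without creating separate objects in the global environment via `assign()`.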
Pierre Lapointe