22

I'm using rvest in R to do some scraping. I know some HTML and CSS.

I want to get the prices of every product at this URL:

http://www.linio.com.co/tecnologia/celulares-telefonia-gps/

New items load as you go down the page (as you scroll).

What I've done so far:

library(rvest)

Linio_Celulares <- read_html("http://www.linio.com.co/tecnologia/celulares-telefonia-gps/")

Linio_Celulares %>%
  html_nodes(".product-itm-price-new") %>%
  html_text()

And I get what I need, but only for the first 25 elements (the ones that load by default).

 [1] "$ 1.999.900" "$ 1.999.900" "$ 1.999.900" "$ 2.299.900" "$ 2.279.900"
 [6] "$ 2.279.900" "$ 1.159.900" "$ 1.749.900" "$ 1.879.900" "$ 189.900"  
[11] "$ 2.299.900" "$ 2.499.900" "$ 2.499.900" "$ 2.799.000" "$ 529.900"  
[16] "$ 2.699.900" "$ 2.149.900" "$ 189.900"   "$ 2.549.900" "$ 1.395.900"
[21] "$ 249.900"   "$ 41.900"    "$ 319.900"   "$ 149.900" 

Question: How can I get all the elements of this dynamic section?

I guess I could scroll the page until all elements are loaded and then use read_html(URL), but that seems like a lot of work (I'm planning on doing this for several sections). There should be a programmatic workaround.

divibisan
Omar Gonzales
  • You would need to use XPath (in R or outside of R) -- have a look at the `XML` package. – Hack-R Apr 25 '15 at 18:41
  • Can't it be done with rvest? I've seen that rvest imports XML, and I've read some stuff about XML, but in the URL in my example I don't see those XML meta tags. Could you help me out? – Omar Gonzales Apr 26 '15 at 04:06
  • Here, I think maybe this will help you do it in `rvest`: http://stackoverflow.com/questions/27812259/following-next-link-with-relative-paths-using-rvest – Hack-R Apr 26 '15 at 14:37
  • @Hack-R I've seen your example, but what I have is a little different. In my example there isn't a "Next" button or a "Page 2" link. However, I do see a "Página 4" label (the number changes from 2 to X) that updates as I scroll. It would be nice if you have any other tip. – Omar Gonzales Apr 26 '15 at 16:49
  • @OmarGonzales You may have to look into `RSelenium` to achieve this - see [this related post](http://stackoverflow.com/questions/26692227/web-scraping-with-r). – nrussell Apr 28 '15 at 20:11
  • I have followed many links, but people eventually redirect to Selenium. How on earth is it not possible in rvest or any other R package to drive an infinite-scroll page and scrape everything loaded after the final scroll? Could we invoke @hadley to help here? – Lazarus Thurston Jan 11 '20 at 11:39

2 Answers

25

As @nrussell suggested, you can use RSelenium to programmatically scroll down the page before getting the source code.

You could for example do:

library(RSelenium)
library(rvest)
#start RSelenium
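#(note: checkForServer() and startServer() come from older RSelenium releases;
# newer versions manage the Selenium server with rsDriver() or Docker instead)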
checkForServer()
startServer()
remDr <- remoteDriver()
remDr$open()

#navigate to your page
remDr$navigate("http://www.linio.com.co/tecnologia/celulares-telefonia-gps/")

#scroll down 5 times, waiting for the page to load at each time
for(i in 1:5){
  remDr$executeScript(paste("scroll(0,", i*10000, ");"))
  Sys.sleep(3)
}

#get the page html
page_source <- remDr$getPageSource()

#parse it
read_html(page_source[[1]]) %>%
  html_nodes(".product-itm-price-new") %>%
  html_text()
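
If you don't want to hard-code five scrolls (the `scroll(0, i*10000)` snippet built by `paste()` simply scrolls the window to a y-offset of i*10000 pixels on each iteration), a rough, untested sketch like this should also work with the same `remDr` session, scrolling until document.body.scrollHeight stops growing:

#alternative sketch: keep scrolling until the page height stops changing
last_height <- 0
repeat {
  remDr$executeScript("window.scrollTo(0, document.body.scrollHeight);")
  Sys.sleep(3)  #give the next batch of products time to load
  new_height <- remDr$executeScript("return document.body.scrollHeight;")[[1]]
  if (new_height == last_height) break  #nothing new was appended, stop scrolling
  last_height <- new_height
}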
NicE
  • I've been learning some JavaScript, but I don't get the for loop you have used. Could you point me to a document on this, please? – Omar Gonzales Sep 04 '15 at 14:49
  • this is an R `for` loop rather than a JavaScript one; some info [here](http://paleocave.sciencesortof.com/2013/03/writing-a-for-loop-in-r/) – NicE Sep 06 '15 at 13:18
  • Thanks, but I was talking about `scroll(0, i*10000)`. I've heard that the `scroll` command is used in JavaScript (like click, hover, etc.). 2. Why `i*10000`? Is it: for every loop, scroll another 10,000 pixels? – Omar Gonzales Sep 06 '15 at 16:04
  • I tried the same code as above, but it gives me `character(0)`. Why is that? – deepesh Jun 23 '17 at 08:26
  • this is now outdated; RSelenium appears to use Docker instead – Laurence_jj Feb 15 '21 at 16:12
-1
library(rvest)
url <- "https://www.linio.com.co/c/celulares-y-tablets?page=1"
page <- html_session(url)

html_nodes(page, css = ".price-secondary") %>% html_text()

Loop through the pages https://www.linio.com.co/c/celulares-y-tablets?page=2, ?page=3 and so on, and it will be easy for you to scrape the data.
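
For example, a rough sketch like this (using the .price-secondary selector from the snippet above; the edit below notes that the selector changed later, and the page range is only illustrative):

library(rvest)

pages <- 1:5  #adjust to however many pages actually exist
all_prices <- unlist(lapply(pages, function(i) {
  url <- paste0("https://www.linio.com.co/c/celulares-y-tablets?page=", i)
  page <- html_session(url)
  html_nodes(page, css = ".price-secondary") %>% html_text()
}))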

EDIT dated 07/05/2019

The website's elements have changed, hence the new code:

library(rvest)
url <- "https://www.linio.com.co/c/celulares-y-tablets?page=1"
page <- html_session(url)

html_nodes(page, css = ".price-main") %>% html_text()
Bharath
  • Linio changed its URL structure, so it is not, as you say, easy to scrape their products. It isn't 2015 anymore. – Omar Gonzales Jun 26 '19 at 17:19
  • Yeah, they have only changed the CSS element. It still works with the code in the edit above, @OmarGonzales. – Bharath Jul 05 '19 at 17:28