Rvest: not all rows are scraped

Asked May 15 '18 at 07:32

Active Feb 02 '21 at 11:30

I want to scrape an HTML table in R using the rvest package. It works, but I have one problem: not all rows are scraped. For this example, I am using data from Yahoo! Finance. Here is my code:

library("rvest")

# I use AAPL as an example
# Time period: Jan 1, 2012 - May 14, 2018

url = 'https://finance.yahoo.com/quote/AAPL/history?period1=1325350800&period2=1526230800&interval=1d&filter=history&frequency=1d'

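# Fetch the page, select every <table> node, and parse each into a data frame
df = url %>%
  read_html() %>%
  html_nodes("table") %>%
  html_table()
# html_table() returns a list of tables; keep the first one
df = data.frame(df[[1]])
nrow(df)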

The problem shows up when I check the total number of rows: there are only 101 (Dec 20, 2017 - May 11, 2018), not the full period I requested. What am I missing?

Thank you.

edited Feb 02 '21 at 11:30

barny

asked May 15 '18 at 07:32

lukmanedwindra

So how many rows do you want to scrape? It looks like there are many rows, which keep populating as you scroll. – Ronak Shah May 15 '18 at 07:50
You must dynamically scroll through the page to be able to scrape more. You can do this with `RSelenium`. Check [this answer](https://stackoverflow.com/a/29965233/9446220) – Shique May 15 '18 at 08:02
Hi, thanks for the answers! I didn't find the same questions with the keywords that I used. Should've used another. – lukmanedwindra May 15 '18 at 08:11
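
For reference, a minimal sketch of the scrolling approach suggested in the comments above. It assumes the `RSelenium` and `rvest` packages are installed and that a local Firefox is available. The helper names (`parse_first_table`, `scrape_after_scrolling`) and the `n_scrolls` / `pause_sec` values are hypothetical, and the number of scrolls needed to load the whole period is a guess that would have to be tuned. It is not a tested solution for Yahoo! Finance specifically.

```r
# Sketch only: helper names and default values are illustrative assumptions.

# Parse the first <table> found in an HTML string into a data frame.
parse_first_table <- function(html) {
  page <- rvest::read_html(html)
  tables <- rvest::html_table(rvest::html_nodes(page, "table"))
  as.data.frame(tables[[1]])
}

# Open the page in a real browser, scroll down repeatedly so that the
# lazily loaded rows are added to the DOM, then parse the rendered HTML.
scrape_after_scrolling <- function(url, n_scrolls = 20, pause_sec = 1) {
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  remDr <- driver$client
  # Always shut down the browser and the Selenium server on exit.
  on.exit({
    remDr$close()
    driver$server$stop()
  }, add = TRUE)

  remDr$navigate(url)
  for (i in seq_len(n_scrolls)) {
    remDr$executeScript("window.scrollTo(0, document.body.scrollHeight);")
    Sys.sleep(pause_sec)  # give the page time to append more rows
  }
  parse_first_table(remDr$getPageSource()[[1]])
}

# Only runs in an interactive session, since it launches a browser.
if (interactive()) {
  url <- "https://finance.yahoo.com/quote/AAPL/history?period1=1325350800&period2=1526230800&interval=1d&filter=history&frequency=1d"
  df <- scrape_after_scrolling(url)
  nrow(df)
}
```

The parsing step is the same `html_nodes("table")` / `html_table()` pipeline as in the question. The only difference is that it runs on the page source after the browser has scrolled, rather than on the initial HTML that `read_html(url)` receives.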
