
I want to scrape an HTML table in R using the rvest package. It works, but with one problem: not all rows are scraped. For this example I am using data from Yahoo! Finance. Here is my code:

library("rvest")

# I use AAPL as an example
# Time period: Jan 1, 2012 - May 14, 2018

url = 'https://finance.yahoo.com/quote/AAPL/history?period1=1325350800&period2=1526230800&interval=1d&filter=history&frequency=1d'

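# Download the page, select every <table> node and parse each into a data frame
df = url %>%
  read_html() %>%
  html_nodes("table") %>%
  html_table()
# html_table() returns a list of tables; keep the first one
df = data.frame(df[[1]])
nrow(df)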

The problem appears when I check the total number of rows: there are only 101 (Dec 20, 2017 - May 11, 2018). What am I missing?

Thank you.
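The `period1` and `period2` query parameters in the URL are Unix timestamps (seconds since 1970-01-01 UTC). The question's values work out to midnight on Jan 1, 2012 and May 14, 2018 at UTC+7. That offset is inferred from the numbers and is not stated in the question. A minimal sketch for building the same URL from readable dates follows. `to_epoch()` and `history_url()` are hypothetical helpers, and you should set `utc_offset_hours` to whatever offset you actually want.

```r
# Hypothetical helper: seconds since 1970-01-01 UTC for midnight on `date`
# at a fixed UTC offset (assumed +7 here to reproduce the question's URL).
to_epoch <- function(date, utc_offset_hours = 0) {
  midnight_utc <- as.POSIXct(date, tz = "UTC")
  as.numeric(midnight_utc) - utc_offset_hours * 3600
}

# Hypothetical helper: rebuild the Yahoo! Finance history URL used above
history_url <- function(symbol, from, to, utc_offset_hours = 0) {
  sprintf(
    paste0("https://finance.yahoo.com/quote/%s/history",
           "?period1=%.0f&period2=%.0f",
           "&interval=1d&filter=history&frequency=1d"),
    symbol,
    to_epoch(from, utc_offset_hours),
    to_epoch(to, utc_offset_hours)
  )
}

url <- history_url("AAPL", "2012-01-01", "2018-05-14", utc_offset_hours = 7)
```

This only controls the date range requested. It does not by itself change how many rows the initially served HTML contains, which is what the comments below address.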

  • So how many rows do you want to scrape? It looks like there are many rows, which keep populating as you scroll. – Ronak Shah May 15 '18 at 07:50
  • You must dynamically scroll through the page to be able to scrape more. You can do this with `RSelenium`. Check [this answer](https://stackoverflow.com/a/29965233/9446220) – Shique May 15 '18 at 08:02
  • Hi, thanks for the answers! I didn't find the same question with the keywords I used. I should have used different ones. – lukmanedwindra May 15 '18 at 08:11
