I'm trying to download the full, dynamically expanded holdings table using rvest, but I'm getting an "Unknown field names" error.

library(rvest)
library(xml2)  # provides xml_find_all()

s <- html_session("http://innovatoretfs.com/etf/?ticker=ffty")
f <- html_form(s)[[1]]
# The following line fails with "Unknown field names":
f.new <- set_values(f, `__EVENTTARGET` = "ctl00$BodyPlaceHolder$ViewHoldingsLinkButton")

## Subsequent lines are not tested ##
doc <- submit_form(s, f.new)
tabs <- xml_find_all(doc, "//table")
holdings <- html_table(tabs, fill = TRUE, trim = TRUE)[[5]]

I'm not great with HTML/HTTP, but from what I can trace through, expanding the table seems to require a postback of the form with this new field value set.
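For context, ASP.NET WebForms pages usually trigger this kind of postback through a JavaScript helper, __doPostBack, which writes its two arguments into the hidden fields __EVENTTARGET and __EVENTARGUMENT and then submits the form. Below is a minimal base-R sketch of what the resulting request body might look like. The other hidden field names and values (`__VIEWSTATE`, `__EVENTVALIDATION`) are placeholders I made up for illustration; the real ones would have to be read from the page's form.

```r
# Hypothetical hidden fields as they might be parsed from the page's form;
# the values are placeholders, not the site's real ones.
hidden_fields <- list(
  `__VIEWSTATE` = "PLACEHOLDER_VIEWSTATE",
  `__EVENTVALIDATION` = "PLACEHOLDER_VALIDATION"
)

# Fields the postback helper would set before submitting.
postback_fields <- list(
  `__EVENTTARGET` = "ctl00$BodyPlaceHolder$ViewHoldingsLinkButton",
  `__EVENTARGUMENT` = ""
)

# modifyList() adds new names and overwrites existing ones.
body_fields <- modifyList(hidden_fields, postback_fields)

# Encode as application/x-www-form-urlencoded: name=value pairs joined by "&".
encode_pair <- function(name, value) {
  paste0(utils::URLencode(name, reserved = TRUE), "=",
         utils::URLencode(value, reserved = TRUE))
}
body <- paste(mapply(encode_pair, names(body_fields), unlist(body_fields)),
              collapse = "&")
print(body)
```

This only builds the string; actually sending it would still need the session cookies and the page's real hidden field values.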

After inspecting the set_values function, it seems that it only allows values to be assigned to fields that already exist in the form.
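To illustrate the behaviour I mean, here is a simplified base-R stand-in for that kind of name check. It is not rvest's actual source; `toy_form` and `set_existing_only` are hypothetical names, and the form is just a plain list.

```r
# A plain list standing in for a parsed form; field names are illustrative.
toy_form <- list(fields = list(
  `__VIEWSTATE` = list(name = "__VIEWSTATE", type = "hidden", value = "abc")
))

# Simplified stand-in for the kind of name check described above;
# this is not rvest's actual source.
set_existing_only <- function(form, ...) {
  new_values <- list(...)
  unknown <- setdiff(names(new_values), names(form$fields))
  if (length(unknown) > 0) {
    stop("Unknown field names: ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  for (field in names(new_values)) {
    form$fields[[field]]$value <- new_values[[field]]
  }
  form
}

# Updating an existing field succeeds.
updated <- set_existing_only(toy_form, `__VIEWSTATE` = "xyz")

# Setting a field the form does not contain raises an error,
# which is captured here as a message string.
result <- tryCatch(
  set_existing_only(toy_form, `__EVENTTARGET` = "some$control"),
  error = function(e) conditionMessage(e)
)
print(result)
```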

Is there any way to add a new field to a form in rvest? If not, is anyone aware of another package I could use to get this functionality?

[Edited] to make it explicit that I need the full version of the dynamically expanded table, and to add the expected subsequent table-extraction code.

Ethan
  • Are you trying to scrape the ETF table over time? (The Growth of $10,000 table) – papelr Jul 15 '18 at 22:44
  • You could also try RSelenium to scrape the table(s) – papelr Jul 15 '18 at 22:48
  • @papelr No, I'm trying to scrape the FULL holdings table – Ethan Jul 15 '18 at 22:56
  • @papelr Thanks for the pointer to [RSelenium](https://cran.r-project.org/web/packages/RSelenium/). Unfortunately, it looks like it's on an unmaintained path – Ethan Jul 15 '18 at 23:06
  • I would take the solution below and ask another SO question on how to get the full table... But I'm also like 80% certain someone is gonna tell you to go the RSelenium route – papelr Jul 15 '18 at 23:56
  • The fact that RSelenium did not work ticked me off so much I posted another question: https://stackoverflow.com/questions/51353272/rselenium-scraping-a-full-expandable-table @Ethan – papelr Jul 16 '18 at 00:30

2 Answers

1

DISGUSTING, BUT WORKS. It could probably be cleaned up; I will submit an issue to the project for a proper fix that provides add_values-type functionality.

library(rvest)
library(xml2)  # provides xml_find_all()

getInnovatorHoldings <- function() {
    s <- html_session("http://innovatoretfs.com/etf/?ticker=ffty")
    f <- html_form(s)[[1]]
    # Add the postback fields, plus a dummy submit button for submit_form()
    f.new <- add_values(f,
                        `__EVENTTARGET` = "ctl00$BodyPlaceHolder$ViewHoldingsLinkButton",
                        `__EVENTARGUMENT` = "",
                        `submit` = NULL)

    s <- submit_form(s, f.new, "submit")
    doc <- read_html(s)
    tabs <- xml_find_all(doc, "//table")
    holdings <- html_table(tabs, fill = TRUE, trim = TRUE)[[5]]
    return(holdings)
}

# Like set_values(), but adds fields that the form does not already contain.
# Names that already exist in the form are left untouched.
add_values <- function(form, ...) {
    new_values <- list(...)
    no_match <- which(!names(new_values) %in% names(form$fields))
    for (n in no_match) {
        field <- names(new_values)[n]
        if (field == "submit") {
            form$fields[[field]] <- new_input(name = field, type = "submit", value = NULL)
        } else {
            form$fields[[field]] <- new_input(name = field, type = "hidden", value = new_values[[n]])
        }
    }
    return(form)
}

# Build a form field with class "input", which is what add_values() stores in form$fields.
new_input <- function(name, type, value, checked = NULL, disabled = NULL, readonly = NULL, required = FALSE) {
    return(
        structure(
            list(name = name,
                type = type,
                value = value,
                checked = checked,
                disabled = disabled,
                readonly = readonly,
                required = required
                ),
            class = "input"
        )
    )
}
Ethan
  • I'm hoping for a wayyy simpler RSelenium fix...will update my answer when that comes around – papelr Jul 16 '18 at 02:38
  • borrowed submit button solution from https://stackoverflow.com/questions/33885629/submit-form-with-no-submit-button-in-rvest?rq=1 – Ethan Jul 16 '18 at 03:04
0

Answer: rvest

This solution works, but only returns the first 10 rows of the table:

library(tidyverse)
library(rvest)

ffty_url <- "http://innovatoretfs.com/etf/?ticker=ffty"

ffty_table <- ffty_url %>%
  read_html() %>%
  html_table(fill = TRUE) %>%
  .[[5]]

I'm still working on getting the full table, but that may not be possible with a plain read_html() call, because the remaining rows only appear once the table is expanded. Honestly not sure.


Answer: RSelenium

You're going to have to install RSelenium and Docker; there are multiple tutorials on that. BUT the following code also returns only the first ten rows, which has me livid.

library(RSelenium)
library(rvest)

remDr <- remoteDriver(port = 4445L, remoteServerAddr = "localhost",
                  browserName = "chrome")
remDr$open()
remDr$navigate("http://innovatoretfs.com/etf/?ticker=ffty")
page <- read_html(remDr$getPageSource()[[1]])
table <- html_table(page, fill = TRUE, header = TRUE)
table[[5]]

If anyone wants to expand on either set of code, please do.

papelr