
I am attempting to use rvest to spider a webpage that requires an email/password login on a form.

rm(list=ls())
library(rvest)

### Trying to sign into a form using email/password 

url       <-"http://www.perfectgame.org/"   ## page to spider
pgsession <-html_session(url)               ## create session
pgform    <-html_form(pgsession)[[1]]       ## pull form from session

set_values(pgform, `ctl00$Header2$HeaderTop1$tbUsername` = "myemail@gmail.com") 
set_values(pgform, `ctl00$Header2$HeaderTop1$tbPassword` = "mypassword")

submit_form(pgsession,pgform,submit=`ctl00$Header2$HeaderTop1$Button1`)

This gives me the following error message:

Error in submit_request(form, submit) : 

object 'ctl00$Header2$HeaderTop1$Button1' not found

If I submit the form without specifying the submit parameter, I get this:

Submitting with 'ctl00$Header2$HeaderTop1$Button1'
Error in function (type, msg, asError = TRUE)  : <url> malformed

I also tried passing the parameters directly to httr as mentioned in this question: How can I POST a simple HTML form in R?, but the "submit" parameter did not accept the submit button either with backwards quotes (``), quotation marks, or without any quotes:

library(httr)

url <- "http://www.perfectgame.org/Rankings/Players/Default.aspx?gyear=2015&num=500"

fd <- list(
    submit = `ctl00$Header2$HeaderTop1$Button1`,
    `ctl00$Header2$HeaderTop1$tbUsername`  = "myemail@gmail.com",
    `ctl00$Header2$HeaderTop1$tbPassword`  = "mypassword")

resp<-POST(url, body=fd, encode="form")
content(resp) 

Any ideas for how I can log in from an R session and spider the data that's behind the login wall?

gbostock

1 Answer


Your rvest code isn't storing the modified form, so in your example you're just submitting the original pgform without the values filled in. Try:

library(rvest)

url       <-"http://www.perfectgame.org/"   ## page to spider
pgsession <-html_session(url)               ## create session
pgform    <-html_form(pgsession)[[1]]       ## pull form from session

# Note the new variable assignment 

filled_form <- set_values(pgform,
  `ctl00$Header2$HeaderTop1$tbUsername` = "myemail@gmail.com", 
  `ctl00$Header2$HeaderTop1$tbPassword` = "mypassword")

submit_form(pgsession,filled_form)

And I now see a nice 200 status code response instead of an error. Note that because the desired submit button appears to be the first submit button on the form, we don't need to pass it as an argument. If we did need to, we would pass its name as a string (straight quotes, not backticks).
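To illustrate the two points above without touching the website, here is a minimal base R sketch. The `form` list and the `fill()` helper are hypothetical stand-ins, not part of rvest. They only mimic the relevant behaviour: a function like set_values() returns a modified copy, so the result has to be assigned. The last part shows why the backticked button name raised "object ... not found": backticks refer to an object by name, while straight quotes create a string.

```r
# Base R only: no rvest or network access needed.

# Hypothetical stand-in for a form object: a named list of field values.
form <- list(user = "", password = "")

# Hypothetical helper that mimics set_values(): it returns a modified
# copy and leaves its argument untouched.
fill <- function(form, ...) {
  values <- list(...)
  form[names(values)] <- values
  form
}

# The result is printed and then discarded; `form` itself is not changed.
fill(form, user = "me@example.com")
form_unchanged <- identical(form$user, "")

# Assigning the result keeps the filled-in copy.
filled <- fill(form, user = "me@example.com")
filled_has_value <- identical(filled$user, "me@example.com")

# Straight quotes create a character string that can be passed around.
button <- "ctl00$Header2$HeaderTop1$Button1"
is_string <- is.character(button)

# Backticks quote a *name*, so R looks for an object called that and
# fails because no such object exists in this session.
backtick_fails <- inherits(
  try(`ctl00$Header2$HeaderTop1$Button1`, silent = TRUE),
  "try-error"
)
```

All four flags (`form_unchanged`, `filled_has_value`, `is_string`, `backtick_fails`) should come out TRUE, which matches the behaviour described above: the unassigned call leaves the original untouched, and only the quoted button name is usable as an argument.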

cboettig
  • Hi, thanks very much for your response. From your solution, I understand now that you have to save the filled form as an object in R rather than just pass it to the session. However, I can't seem to replicate your 200 status code: I run the code above and get the same error listed above, `"Submitting with 'ctl00$Header2$HeaderTop1$Button1' Error in function (type, msg, asError = TRUE) : <url> malformed"`. Any idea what the difference might be? – gbostock Mar 25 '15 at 20:17
  • EDIT: I updated to the newest version of R (3.1.3) and receive the same message. Thanks! I will take a look and report back. – gbostock Mar 25 '15 at 20:30
  • Great answer! How would you navigate in this session with `rvest`? – Carol.Kar Dec 29 '15 at 12:22