I would like to scrape data from the following webpage:

https://swgoh.gg/u/zozo/collection/180/emperor-palpatine/

When I want to access it, the website requires my login.

Here is my code:

library(rvest)

url <- 'https://swgoh.gg/u/zozo/collection/180/emperor-palpatine/'
session <- html_session(url)

<session> https://swgoh.gg/accounts/login/?next=/u/zozo/collection/180/emperor-palpatine/

Status: 200

Type: text/html; charset=utf-8

Size: 2081

form <- html_form(read_html(url))[[1]]

<form> '<unnamed>' (POST .)

<input hidden> 'csrfmiddlewaretoken': aFuZy6Pxjg10MqdZjis9vjgojDCxa3QT

<input text> 'username':

<input password> 'password':

<button> '<unnamed>'

filled_form <- set_values(form,
                          username = "myusername",
                          password = "mypassword")
(result<-submit_form(session, filled_form))

Although my username and password work when I browse normally, I get the following error after running the last line:

Error: Could not find possible submission target.

I have already searched the web for a solution without success.

EDIT: The solution proposed by @MrFlick did the trick. Unfortunately, I now get the following warning message:

Submitting with '<unnamed>'

Warning message:

In request_POST(session, url = url, body = request$values, encode = request$encode, : Forbidden (HTTP 403).

result gives:

<session> https://swgoh.gg/accounts/login/

Status: 403

Type: text/html; charset=utf-8

Size: 989

user124563
1 Answer

The code that rvest uses to determine how to submit the form is getting tripped up: it doesn't recognize the generic "button" element as the submit button. You can fool it in this case with

form$fields[[4]]$type <- "button"
filled_form <- set_values(form,
                          username = "myusername",
                          password = "mypassword")
submit_form(session, filled_form)
MrFlick
  • Thanks for your helpful solution ! But I am now stuck at the next step with a 403 error (see the edit). – user124563 Jun 15 '18 at 20:54
  • Well, a 403 error means they have received the request but denied it. Many applications try to block scrapers. You can try hacking the user-agent string to see if they are looking at that to deny you. Also check your username and password. But the only people who can really say what's going on in that case are the application owners. You should contact that website directly for support. – MrFlick Jun 15 '18 at 21:12
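Following up on the user-agent suggestion in the comment above, here is a minimal sketch using the same legacy rvest API as the question. `html_session()` forwards extra arguments to httr, so a browser-like `user_agent()` config can be attached to the session; the UA string below is an illustrative assumption, not one verified to satisfy swgoh.gg.

```r
library(rvest)
library(httr)

# Attach a browser-like user agent to the session; some sites
# reject requests with the default libcurl/httr user agent.
ua <- user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
session <- html_session("https://swgoh.gg/accounts/login/", ua)

# Same workaround as in the answer: mark the 4th field (the
# unnamed <button>) as a submittable button.
form <- html_form(session)[[1]]
form$fields[[4]]$type <- "button"

filled_form <- set_values(form,
                          username = "myusername",
                          password = "mypassword")

# The user-agent config set on the session is reused here.
result <- submit_form(session, filled_form)
```

If the site also validates the CSRF cookie or Referer header, a spoofed user agent alone may not be enough; only the site operators can say for sure why a request is denied.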