
I am using rvest for web scraping and practicing on TripAdvisor. I can't manage to set a radio button to the value needed to display all reviews:

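```r
# Uses the pre-1.0 rvest API (html_session(), set_values(), submit_form())
library(rvest)
url <- "https://www.tripadvisor.com/Restaurant_Review-g187438-d12699400-Reviews-Trattoria_Mamma_Franca-Malaga_Costa_del_Sol_Province_of_Malaga_Andalucia.html"
session <- html_session(url)
# The third form on the page is the review filter form
pgform <- html_form(session)[[3]]
```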

which gives the following form:

```
<form> 'taplc_location_review_filter_controls_0_form' (POST /SetReviewFilter#REVIEWS)
  <input checkbox> 'filterRating': 5
  <input checkbox> 'filterRating': 4
  <input checkbox> 'filterRating': 3
  <input checkbox> 'filterRating': 2
  <input checkbox> 'filterRating': 1
  <input hidden> 'filterRating': 
  <input checkbox> 'filterSegment': 3
  <input checkbox> 'filterSegment': 2
  <input checkbox> 'filterSegment': 5
  <input checkbox> 'filterSegment': 1
  <input checkbox> 'filterSegment': 4
  <input hidden> 'filterSegment': 
  <input checkbox> 'filterSeasons': 1
  <input checkbox> 'filterSeasons': 2
  <input checkbox> 'filterSeasons': 3
  <input checkbox> 'filterSeasons': 4
  <input hidden> 'filterSeasons': 
  <input radio> 'filterLang': ALL
  <input radio> 'filterLang': en
  <input radio> 'filterLang': es
  <input radio> 'filterLang': it
  <input radio> 'filterLang': fr
  <input radio> 'filterLang': nl
  <input radio> 'filterLang': ru
  <input radio> 'filterLang': sv
  <input radio> 'filterLang': da
  <input radio> 'filterLang': de
  <input radio> 'filterLang': no
  <input radio> 'filterLang': pl
  <input radio> 'filterLang': pt
  <input hidden> 'returnTo': #REVIEWS
```

I would like to set `filterLang` to `ALL`:

```r
filledform <- set_values(pgform,
                         filterLang = "ALL")
submit_form(session, filledform)
```

which gives me the error:

```
Error: Could not find possible submission target.
```

Which submission target should I use? Can I do this with rvest, or should I try something like this?

denis

1 Answer


The error is not related to the radio buttons. It occurs because the form you are trying to submit has no submit button, and rvest's `submit_form()` requires one to determine the submission target.

As a workaround for your example, you can change the type of the `returnTo` field to `submit` and set its value to the URL of the page itself:

```r
# Turn the hidden returnTo field into a submit field so submit_form() has a target
pgform$fields[['returnTo']]$type = 'submit'
pgform$fields[['returnTo']]$value = url
```
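To see why this helps, the sketch below parses a small, hypothetical stand-in for the TripAdvisor filter form from a string (a few hand-written fields, not the real markup), so it runs without network access. It assumes rvest >= 1.0, where `html_form()` accepts a `base_url` argument; the helper `field_types()` is made up for this illustration. It checks that no parsed field has type `submit` until the `returnTo` change above is applied.

```r
library(rvest)  # assumes rvest >= 1.0 (html_form() with a base_url argument)

# Hypothetical, simplified stand-in for the TripAdvisor filter form:
# radio buttons and a hidden returnTo field, but no submit button
page <- xml2::read_html('
<form id="filter_form" method="POST" action="/SetReviewFilter#REVIEWS">
  <input type="radio" name="filterLang" value="ALL">
  <input type="radio" name="filterLang" value="en">
  <input type="radio" name="filterLang" value="it">
  <input type="hidden" name="returnTo" value="#REVIEWS">
</form>')
pgform <- html_form(page, base_url = "https://www.tripadvisor.com")[[1]]

# Hypothetical helper: the type of every parsed field
field_types <- function(form) vapply(form$fields, function(f) f$type, character(1))

# Before the workaround: no field of type "submit"
has_submit_before <- any(field_types(pgform) == "submit")
print(has_submit_before)

# The workaround from the answer: turn returnTo into a submit field
pgform$fields[["returnTo"]]$type <- "submit"
pgform$fields[["returnTo"]]$value <- "https://www.tripadvisor.com/"

# After the workaround: a submit field now exists
has_submit_after <- any(field_types(pgform) == "submit")
print(has_submit_after)
```

On the real page, `pgform` would come from `html_form(session)[[3]]` instead of the hand-written HTML.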

Then you can set the language option as expected, so either

```r
filledform <- set_values(pgform, filterLang = 'it')
```

or

```r
filledform <- set_values(pgform, filterLang = 'ALL')
```

should work to set the language filter to Italian, or all languages, respectively.

As described here, when you run something like this

```r
url <- 'https://www.tripadvisor.com/Restaurant_Review-g187438-d12699400-Reviews-Trattoria_Mamma_Franca-Malaga_Costa_del_Sol_Province_of_Malaga_Andalucia.html'
session <- html_session(url)
pgform <- html_form(session)[[3]]
pgform$fields[['returnTo']]$type = 'submit'
pgform$fields[['returnTo']]$value = url
filledform <- set_values(pgform, filterLang = 'ALL')
result <- submit_form(session, filledform)
```

you get the whole page, whereas the following code, which additionally sends an `X-Requested-With: XMLHttpRequest` header, returns only the content:

```r
url <- 'https://www.tripadvisor.com/Restaurant_Review-g187438-d12699400-Reviews-Trattoria_Mamma_Franca-Malaga_Costa_del_Sol_Province_of_Malaga_Andalucia.html'
session <- html_session(url)
pgform <- html_form(session)[[3]]
pgform$fields[['returnTo']]$type = 'submit'
pgform$fields[['returnTo']]$value = url
filledform <- set_values(pgform, filterLang = 'ALL')
# Extra arguments are passed on to httr; this header marks the request as AJAX
result <- submit_form(session, filledform, submit = NULL, httr::add_headers('x-requested-with' = 'XMLHttpRequest'))
```
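The only difference between the two snippets is the `httr::add_headers()` argument, an ordinary httr request configuration that rvest forwards to the HTTP request. The sketch below needs no network access and only inspects which header that configuration adds. `X-Requested-With: XMLHttpRequest` is a common convention servers use to recognise AJAX requests; that TripAdvisor then returns only a page fragment is the server's behaviour described above, not something httr does itself.

```r
library(httr)

# Build the same request configuration used in the submit_form() call above
ajax_header <- add_headers('x-requested-with' = 'XMLHttpRequest')

# Inspect the header that would be attached to the outgoing request
print(ajax_header$headers)
```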

Since you are trying to interact with a rather complex website that makes heavy use of JavaScript and XMLHttpRequest, you might be better off switching from rvest to a tool with better support for these technologies, such as RSelenium.

martin_joerg
  • I will have a look at `RSelenium`. `rvest` just looked easier to start with. Do you have a good starting point or documentation to recommend? – denis Jan 14 '19 at 09:04
  • An additional question: why doesn't the main "find restaurant" form appear in `html_form()` for "https://www.tripadvisor.com/Restaurants"? How do you access it with `rvest`? – denis Jan 14 '19 at 09:37
  • As far as I can see, the page you mentioned does not contain any form for searching and submitting, so I presume these actions are implemented solely in JavaScript. Selenium also spares you from having to deal with such implementation details. – martin_joerg Jan 14 '19 at 14:51
  • The official documentation of RSelenium provides a good introduction to its [basic usage](https://cran.r-project.org/web/packages/RSelenium/vignettes/basics.html). Apart from that, I would also suggest the documentation of Selenium to get a grasp of its [concepts](https://www.seleniumhq.org/docs/01_introducing_selenium.jsp) and understand what makes it different from the approach that tools like `rvest` use. – martin_joerg Jan 14 '19 at 15:00