I am setting up an R script to scrape data from homedepot.com. It is going fine, except that I would also like to scrape the stock levels for products, which requires setting the local store. I have tried a few ways to do this using rvest, without success. How can I set the local store on homedepot.com? I have found these related questions, but they have not led me to a solution:

(R language) How to make a click on webpage using rvest or rcurl

Submit form with no submit button in rvest

How to properly set cookies to get URL content using httr

More info: the store location code seems to be stored in a cookie called THD-LOC-STORE, with a 4-digit store ID. I have been unsuccessful in setting this cookie:

library("rvest")
library("httr")
# try to set cookie in site with store ID:
session <- html_session("http://www.homedepot.com", set_cookies('THD-LOC-STORE'='2679'))
# if this worked, it would show the store name instead of "Select a Store":
storefinder <- session %>%
  read_html() %>%
  html_nodes(".headerStoreFinder") %>%
  html_text() %>%
  gsub("\\t", "", .)
storefinder
cookies(session)

I also thought about using submit_form() in rvest, but the buttons to select a store are driven by JavaScript, and the form has no submit button to pass to it.

Scott
  • Scraping Home Depot is a violation of their T&Cs, and they have an extensive robots.txt file which (thanks to LinkedIn and a few more cases in 2016/7) is nearly an official technical control, so bypassing it violates the CFAA. – hrbrmstr Dec 23 '17 at 03:35

1 Answer

Concerning your possible option "I also thought about using submit_form() in rvest, but the buttons to select a store are run by javascript and there are no SUBMIT buttons to choose": I posted an answer to the question "Submit form with no submit button in rvest" which might provide a solution for you.

In brief, you can inject a submit button into your version of the code and then submit that. Details of how to do that are in the linked post.

Tripartio
  • Thanks for the reply. My problem really ended up being that rvest downloads the page before any JavaScript runs, so I switched to using CasperJS to download the fully loaded page and then scrape that. Your method may prove useful in the future, though. – Scott Jul 18 '16 at 13:51
  • Could you please post the solution that worked for you as an answer to your own question, and then accept it? This is not only perfectly OK on Stack Overflow, it is recommended so that you can share your knowledge with others. – Tripartio Jul 18 '16 at 14:38
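
The point in Scott's comment, that rvest only sees the page as downloaded and never runs its JavaScript, can be illustrated offline with a minimal sketch. The HTML below is a made-up assumption for illustration, not Home Depot's actual markup. It reuses the .headerStoreFinder class from the question and adds a hypothetical script that would fill in a store name in a real browser.

```r
library(rvest)

# Hypothetical markup (an assumption, NOT the real homedepot.com page):
# a placeholder label plus a script that a browser would run to replace it.
html <- '<html><body>
  <div class="headerStoreFinder">Select a Store</div>
  <script>
    document.querySelector(".headerStoreFinder").textContent = "Store 2679";
  </script>
</body></html>'

# read_html() parses the markup only; the <script> body is never executed.
page <- read_html(html)

# So the selector still returns the placeholder text, not the store name.
label <- page %>% html_nodes(".headerStoreFinder") %>% html_text()
label
```

Here label should still be "Select a Store". This would explain the symptom in the question if the store name is filled in by a script, which is why a tool that executes the scripts first (CasperJS in Scott's case) was needed before scraping.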