I am trying to scrape some data off zappos.com. I want to scrape data every day at the same time and keep all this data in one data frame or file. How do I go about it?

My current data frame, as of today, is the `shoes` data frame built in the code below. I just want the code to keep running every day at the same time and keep adding rows to this data frame.

This is part of my code:

library(rvest)  # provides read_html(), html_nodes(), html_text()

webpage2 <- read_html("https://www.zappos.com/adidas-shoes/CK_XAVoBAeABAeICAwELHA.zso")
adidas_sale_count_html <- html_nodes(webpage2, '.selectedFacet ._2TdLt')
adidas_sale_count <- html_text(adidas_sale_count_html)
adidas_sale_count <- as.character(adidas_sale_count)
adidas_sale_count <- gsub("[()]", "", adidas_sale_count)  # strip the parentheses around the count
head(adidas_sale_count)

# adiori_sale_count, adiori_total_count and adidas_total_count come from the part of the code not shown here
shoes <- data.frame(Date = Sys.Date(),
                    Originals_on_Sale = adiori_sale_count,
                    Originals_Total = adiori_total_count,
                    Adidas_on_Sale = adidas_sale_count,
                    Adidas_Total = adidas_total_count)

Any help will be appreciated. Thank you

Rushabh
  • You can write an R script that scrapes the data and writes it to an Excel file in a folder, and, depending on your OS, schedule this script to run at a specific time every day. – sm925 Sep 19 '18 at 21:20
  • Likely a duplicate question. To see how to schedule a script, go to https://stackoverflow.com/questions/2793389/scheduling-r-script for Windows or https://stackoverflow.com/questions/30905934/how-to-schedule-an-r-script-cronjob-in-a-linux-server for Linux. The only thing is that you would have to save the output somewhere in your script instead of leaving it in memory (a data frame is in memory). – Adam Sampson Sep 19 '18 at 21:24
  • You either use a scheduler to run the script (and save the results) or you run an infinite loop in your script that checks the time and then uses Sys.sleep() to sleep until the next day. But if your computer crashes (like a power outage), the script would not auto-restart itself after the computer rebooted. – Adam Sampson Sep 19 '18 at 21:26
  • Clarification: the program would restart after a reboot if you use a scheduler. It would not restart if you were using an infinite loop with a sleep. – Adam Sampson Sep 19 '18 at 21:27
  • @AdamSampson If I write an Excel file, can I simply keep adding data to that Excel file every day? If yes, how do I go about it? – Rushabh Sep 19 '18 at 21:40
  • Use `write_csv(data, ..., append = TRUE)` from the `readr` library at the end of your script. You can also use `fwrite` from `data.table`. – Suhas Hegde Sep 20 '18 at 00:03
  • What you're actually trying to do is violate Zappos' terms of service, and you are encouraging others to do so and risk legal action. – hrbrmstr Sep 20 '18 at 09:44
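
The two suggestions in the comments above (append each day's row to a file, and sleep until the next run if no scheduler is used) could be sketched in base R roughly as follows. The helper names `append_daily_row` and `seconds_until_next_run`, the 09:00 run hour, the temporary file path, and the placeholder counts are all assumptions made for illustration. `write.table` stands in for `readr::write_csv` here only so that the sketch needs no extra packages.

```r
# Hypothetical helper: append one row to a CSV file, writing the header
# only when the file does not exist yet.
append_daily_row <- function(row, path) {
  new_file <- !file.exists(path)
  write.table(row, file = path, sep = ",", row.names = FALSE,
              col.names = new_file, append = !new_file)
}

# Hypothetical helper: seconds from `now` until the next occurrence of
# `run_hour`:00. Adding a fixed 24 hours ignores daylight-saving shifts.
seconds_until_next_run <- function(now = Sys.time(), run_hour = 9) {
  lt <- as.POSIXlt(now)
  lt$hour <- run_hour
  lt$min <- 0
  lt$sec <- 0
  target <- as.POSIXct(lt)
  if (target <= now) target <- target + 24 * 60 * 60
  as.numeric(difftime(target, now, units = "secs"))
}

# Placeholder values stand in for the scraped counts.
today <- data.frame(Date = Sys.Date(),
                    Adidas_on_Sale = "120",
                    Adidas_Total = "900")

csv_path <- tempfile(fileext = ".csv")  # assumed location; use a permanent path in practice
append_daily_row(today, csv_path)       # first run: creates the file with a header
append_daily_row(today, csv_path)       # later run: appends a row without a header
history <- read.csv(csv_path)           # all rows collected so far

# Loop variant (left commented out because it never returns):
# repeat {
#   Sys.sleep(seconds_until_next_run())
#   source("scrape_zappos.R")  # hypothetical script that scrapes and appends
# }
```

With a scheduler (cron or Windows Task Scheduler), only the append step is needed in the script; the sleep helper matters only for the loop variant, which, as noted above, does not survive a reboot.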
