I'm trying to scrape data from this review site. It first go through first page, check if there's a 2nd page then go to it too. Problem is when getting to 2nd page. Page takes time to update and I still get the first page's data instead of 2nd
For example, if you go here, you will see how it takes time to load page 2 data
I tried to put a timeout or sleep but didn't work. Prefer a solution with minimal package/browser dependency (like webdriver.PhantomJS()
) as I need to run this code on my employer's environment and not sure if I can use it. Thank you!!
from urllib.request import Request, urlopen
from time import sleep
from socket import timeout
req = Request(softwareadvice, headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req, timeout=10).read()
webpage = web_byte.decode('utf-8')
parsed_html = BeautifulSoup(webpage, features="lxml")
true=parsed_html.find('div', {'class':['Grid-cell--1of12 pagination-arrows pagination-arrows-right']})
while(true):
true = parsed_html.find('div', {'class':['Grid-cell--1of12 pagination-arrows pagination-arrows-right']})
if(not True):
true=False
else:
req = Request(softwareadvice+'?review.page=2', headers=hdr)
sleep(10)
webpage = urlopen(req, timeout=10)
sleep(10)
webpage = webpage.read().decode('utf-8')
parsed_html = BeautifulSoup(webpage, features="lxml")