My goal is to get a list of the names of all the new items that have been posted on https://www.prusaprinters.org/prints during the full 24 hours of a given day.
Through a bit of reading I've learned that I should be using Selenium because the site I'm scraping is dynamic (loads more objects as the user scrolls).
Trouble is, I can't get anything but an empty list from `driver.find_elements_by_*` with any of the locator strategies listed at https://selenium-python.readthedocs.io/locating-elements.html. When I inspect the element whose title I want (see screenshot), I see `class="name"` and `class="clamp-two-lines"`, but I can't return a list of all the elements on the page with either the `name` class or the `clamp-two-lines` class.
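To sanity-check my selectors, I tried the same class lookups against a static snippet that mimics what I see in the inspector (this markup is my guess at the structure, not copied from the live page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking what the inspector shows for one print card.
html = """
<div id="printListOuter">
  <article>
    <h3 class="name clamp-two-lines"><a href="/prints/123">Benchy</a></h3>
  </article>
  <article>
    <h3 class="name clamp-two-lines"><a href="/prints/456">Calibration Cube</a></h3>
  </article>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# An element with class="name clamp-two-lines" carries BOTH classes,
# so selecting by either single class matches it:
by_name = soup.select("h3.name")
by_clamp = soup.select(".clamp-two-lines")
titles = [el.get_text(strip=True) for el in by_name]
print(titles)
```

Both selectors match here, which makes me suspect the live page's elements simply aren't in the DOM yet when I call `find_elements` (the JavaScript hasn't rendered them).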
Here's the code I have so far (the lines commented out are failed attempts):
from timeit import default_timer as timer

start_time = timer()
print("Script Started")

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(r'D:\PortableApps\Python Peripherals\chromedriver.exe')
url = 'https://www.prusaprinters.org/prints'
driver.get(url)

# Failed attempts -- I uncommented and ran these one at a time;
# every one of them returned an empty list:
# foo = driver.find_elements_by_name('name')
# foo = driver.find_elements_by_xpath('name')
# foo = driver.find_elements_by_class_name('name')
# foo = driver.find_elements_by_tag_name('name')
# foo = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[id*=name]')]
# foo = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[class*=name]')]
# foo = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[id*=clamp-two-lines]')]
# foo = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//*[@id="printListOuter"]//ul[@class="clamp-two-lines"]/li')))

print(foo)  # run with one of the attempts above uncommented
driver.quit()
print("Time to run: " + str(round(timer() - start_time, 4)) + "s")
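Assuming I eventually get elements back, the infinite scroll means each pass will return everything currently rendered, so I'll need to accumulate names across scroll passes and de-duplicate while keeping first-seen order. Here's a sketch of that bookkeeping, with made-up per-scroll batches standing in for `find_elements` results:

```python
def collect_unique(batches):
    """Merge per-scroll batches of names, keeping first-seen order."""
    seen = set()
    ordered = []
    for batch in batches:
        for name in batch:
            if name not in seen:
                seen.add(name)
                ordered.append(name)
    return ordered

# Fake data: each inner list is what one scroll pass might return.
# Passes overlap because a query returns everything rendered so far.
scroll_batches = [
    ["Benchy", "Vase"],
    ["Vase", "Phone Stand"],
    ["Phone Stand", "Cable Clip"],
]
print(collect_unique(scroll_batches))
# → ['Benchy', 'Vase', 'Phone Stand', 'Cable Clip']
```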
My research:
- Selenium only returns an empty list
- Selenium find_elements_by_css_selector returns an empty list
- Web Scraping Python (BeautifulSoup,Requests)
- Get HTML Source of WebElement in Selenium WebDriver using Python
- How to get Inspect Element code in Selenium WebDriver
- https://chrisalbon.com/python/web_scraping/monitor_a_website/
- https://www.codementor.io/@gergelykovcs/how-and-why-i-built-a-simple-web-scrapig-script-to-notify-us-about-our-favourite-food-fcrhuhn45
- https://www.tutorialspoint.com/python_web_scraping/python_web_scraping_dynamic_websites.htm
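Beyond the scraping itself, for the "full 24 hours of a given day" part of my goal: once I have (name, timestamp) pairs, I plan to filter by calendar date. A small sketch with made-up upload timestamps (the real site's timestamp format is an assumption here):

```python
from datetime import datetime, date

def names_posted_on(items, day):
    """Return names whose upload timestamp falls on the given calendar date."""
    return [name for name, ts in items if ts.date() == day]

# Hypothetical (name, upload time) pairs.
items = [
    ("Benchy", datetime(2020, 5, 3, 23, 59)),
    ("Vase", datetime(2020, 5, 4, 0, 1)),
    ("Cable Clip", datetime(2020, 5, 4, 18, 30)),
]
print(names_posted_on(items, date(2020, 5, 4)))
# → ['Vase', 'Cable Clip']
```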