0

I'm trying to spider a page for links with a specific CSS class using Selenium for Python 3. For some reason it just stops when it should loop through again.

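from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def spider_me_links(driver, max_pages, links):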

    page = 1  # NOTE: Change this to start with a different page.
    while page <= max_pages:
        url = "https://www.example.com/home/?sort=title&p=" + str(page)
        driver.get(url)

        # Timeout after 2 seconds, and duration 5 seconds between polls.
        wait = WebDriverWait(driver, 120, 5000)

        wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'card-details')))

        # Obtain source text
        source_code = driver.page_source

        soup = BeautifulSoup(source_code, 'lxml')

        print("findAll:", len(soup.findAll('a', {'class' : 'card-details'}))) # returns 12 at every loop iteration.
        links += soup.findAll('a', {'class' : 'card-details'})

        page += 1

I think the problem is in the following two lines:

# Timeout after 2 seconds, and duration 5 seconds between polls.
wait = WebDriverWait(driver, 120, 5000)

wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'card-details')))

At that point I'm waiting for content to be loaded dynamically with Ajax, and the content itself loads fine. If I run the code outside the function and without the two lines above, I'm able to grab the <a> tags, but inside the loop it just gets stuck.

I looked at the documentation for the selenium.webdriver.support.expected_conditions module (the EC object in my code above), and I'm unsure which condition I should use to make sure the content has been loaded before scraping it with BS4.

Peter Mortensen
leeand00

3 Answers

1

Usually, credit card fields such as the cardholder name and the card number reside within a <frame> or <iframe>.

To interact with those elements, you have to:

  • Use WebDriverWait to wait until the desired frame is available and switch to it.

  • You can use any of the following locator strategies:

    • Using ID:

      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"iframe_id")))
      
    • Using NAME:

      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.NAME,"iframe_name")))
      
    • Using CLASS_NAME:

      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CLASS_NAME,"iframe_classname")))
      
    • Using CSS_SELECTOR:

      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe_css")))
      
    • Using XPATH:

      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"iframe_xpath")))
      
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    


Peter Mortensen
DebanjanB
  • Thank you for your effort in answering this question; very often it is the fact that an ` – leeand00 Dec 14 '20 at 19:04
0

Your selector, 'card-details', matches elements whose tag name is card-details, whereas you need to match elements whose class attribute is card-details (@class='card-details').

Try either

(By.CSS_SELECTOR, '.card-details')

or

(By.CLASS_NAME, 'card-details')
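
To see the difference without starting a browser, here is a minimal sketch using only Python's standard library. It is not Selenium's CSS engine; the `SelectorCounter` class and the sample markup are made up for illustration. It counts how many elements a tag-name match and a class match would find for the same name:

```python
from html.parser import HTMLParser


class SelectorCounter(HTMLParser):
    """Counts matches for a tag-name selector and for a class selector."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.by_tag = 0    # roughly what the CSS selector 'card-details' matches
        self.by_class = 0  # roughly what the CSS selector '.card-details' matches

    def handle_starttag(self, tag, attrs):
        # A bare name in CSS is a type selector: it compares against the tag name.
        if tag == self.name:
            self.by_tag += 1
        # A leading dot compares against the whitespace-separated class list.
        classes = (dict(attrs).get("class") or "").split()
        if self.name in classes:
            self.by_class += 1


# Hypothetical markup standing in for the page in the question.
SAMPLE_HTML = """
<div class="cards">
  <a class="card-details" href="/item/1">One</a>
  <a class="card-details featured" href="/item/2">Two</a>
  <a class="other" href="/item/3">Three</a>
</div>
"""

counter = SelectorCounter("card-details")
counter.feed(SAMPLE_HTML)
print("tag name matches:", counter.by_tag)   # 0
print("class matches:", counter.by_class)    # 2
```

With markup like this, the tag-name match finds nothing while the class match finds both links, which is why a wait on the bare `card-details` selector never succeeds.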
Parolla
0

I ended up using:

    wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'card-details')))

And it appears to have worked.
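
The wait arguments are also worth a second look. WebDriverWait takes `timeout` and then `poll_frequency`, both in seconds, so `WebDriverWait(driver, 120, 5000)` means a 120 second timeout with 5000 seconds between polls, not 2 and 5 seconds. The sketch below is a simplified model of a polling wait, not Selenium's actual code; `wait_until`, `WaitTimeout`, `FakeClock` and `elapsed_before_timeout` are hypothetical names, and the clock is simulated so the example runs instantly. It shows how long a condition that never succeeds keeps the caller waiting:

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy."""


def wait_until(condition, timeout, poll_frequency=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it is truthy or `timeout` seconds have passed.

    Both `timeout` and `poll_frequency` are in seconds.
    """
    end_time = clock() + timeout
    while True:
        value = condition()
        if value:
            return value
        sleep(poll_frequency)  # a failed check always sleeps one full interval
        if clock() > end_time:
            raise WaitTimeout(f"condition not met after {timeout} seconds")


class FakeClock:
    """Simulated clock: sleeping only advances a counter."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def elapsed_before_timeout(timeout, poll_frequency):
    """Simulated seconds spent waiting on a condition that never succeeds."""
    fake = FakeClock()
    try:
        wait_until(lambda: False, timeout, poll_frequency,
                   clock=fake.time, sleep=fake.sleep)
    except WaitTimeout:
        pass
    return fake.now


print(elapsed_before_timeout(120, 5000))  # 5000.0
print(elapsed_before_timeout(10, 0.5))    # 10.5
```

Under this model a single failed check with a 5000 second interval blocks for well over an hour, which would look like the loop being stuck. Something like `WebDriverWait(driver, 10, 0.5)` is a more typical choice, though the right values depend on the page.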

leeand00