2

i need to extract the breadcrumbs of this website site: https://www.woolworths.com.au/Shop/Browse/drinks/cordials-juices-iced-teas/iced-teas

I tried to inspect the element and copy the xpath but it doesn't extract it

from selenium import webdriver
driver = webdriver.Firefox()
driver.get('https://www.woolworths.com.au/Shop/Browse/drinks/cordials-juices-iced-teas/iced-teas')
driver.find_elements_by_xpath('//*[@id="center-panel"]/div/wow-tile-list-with-content/ng-transclude/wow-browse-tile-list/wow-tile-list/div/div[1]/div[1]/wow-breadcrumbs/div/ul/li[4]/span/span')

driver.find_element_by_css_selector('#center-panel > div > wow-tile-list-with-content > ng-transclude > wow-browse-tile-list > wow-tile-list > div > div.tileList > div.tileList-headerContainer > wow-breadcrumbs > div > ul > li:nth-child(4) > span > span')

How can I proceed?

DebanjanB
  • 118,661
  • 30
  • 168
  • 217
Melissa A
  • 39
  • 3

3 Answers3

1

The page you are trying to scrape is written in Angular, meaning that most of the DOM elements are loaded dynamically by JavaScript AJAX code and they are not present once the page is loaded. (driver.get function returns)

You should use waits until function to locate such elements.

Here is the working example using the XPATH you provided:

driver.get('https://www.woolworths.com.au/Shop/Browse/drinks/cordials-juices-iced-teas/iced-teas')
try:
    element = WebDriverWait(driver, 1).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="center-panel"]/div/wow-tile-list-with-content/ng-transclude/wow-browse-tile-list/wow-tile-list/div/div[1]/div[1]/wow-breadcrumbs/div/ul/li[4]/span/span'))
    )
    print(element.text) ' this outputs Iced Teas
except TimeoutException:
    print("Timeout")
EylM
  • 5,196
  • 2
  • 13
  • 24
1

To print the breadcrumbs of the website site: https://www.woolworths.com.au/Shop/Browse/drinks/cordials-juices-iced-teas/iced-teas you have to induce WebDriverWait for the desired visibility_of_element_located() and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR and get_attribute() method:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.breadcrumbs-linkList li:nth-child(4) span span"))).get_attribute("innerHTML"))
    
  • Using XPATH and text property:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='breadcrumbs-linkList']//following-sibling::li[4]//span//span"))).text)
    
  • Note : You have to add the following imports :

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

Outro

As per the documentation:

DebanjanB
  • 118,661
  • 30
  • 168
  • 217
0

Below one works for my validation

//*[span='first text' and span='Search results for "second text"']

  • Hi Sukumar and Welcome to StackOverflow! Could you please format your code? You can read here on [how to format your code](https://meta.stackoverflow.com/questions/251361/how-do-i-format-my-code-blocks) and here on [how to write a good answer](https://stackoverflow.com/help/how-to-answer), thanks! – BiOS Mar 05 '21 at 13:42