Extract the breadcrumbs of a website using selenium

Question

I need to extract the breadcrumbs from this page: https://www.woolworths.com.au/Shop/Browse/drinks/cordials-juices-iced-teas/iced-teas

I inspected the element and copied its XPath and CSS selector, but neither extracts anything:

from selenium import webdriver
driver = webdriver.Firefox()
driver.get('https://www.woolworths.com.au/Shop/Browse/drinks/cordials-juices-iced-teas/iced-teas')
driver.find_elements_by_xpath('//*[@id="center-panel"]/div/wow-tile-list-with-content/ng-transclude/wow-browse-tile-list/wow-tile-list/div/div[1]/div[1]/wow-breadcrumbs/div/ul/li[4]/span/span')

driver.find_element_by_css_selector('#center-panel > div > wow-tile-list-with-content > ng-transclude > wow-browse-tile-list > wow-tile-list > div > div.tileList > div.tileList-headerContainer > wow-breadcrumbs > div > ul > li:nth-child(4) > span > span')

How can I proceed?

The XPath call returns an empty list, and the CSS selector call raises: Message: Unable to locate element. – Melissa A, Sep 08 '19 at 14:34

score 1 · Answer 1 · answered Sep 08 '19 at 15:40

The page you are trying to scrape is built with Angular, so most of its DOM elements are created dynamically by JavaScript (AJAX) after the initial load. They are not yet present when driver.get() returns.

Use an explicit wait, WebDriverWait(...).until(...), to locate such elements.

Here is a working example using the XPath you provided:

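from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
driver.get('https://www.woolworths.com.au/Shop/Browse/drinks/cordials-juices-iced-teas/iced-teas')
try:
    element = WebDriverWait(driver, 1).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="center-panel"]/div/wow-tile-list-with-content/ng-transclude/wow-browse-tile-list/wow-tile-list/div/div[1]/div[1]/wow-breadcrumbs/div/ul/li[4]/span/span'))
    )
    print(element.text)  # this outputs Iced Teas
except TimeoutException:
    print("Timeout")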
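
To see why the wait helps, here is a rough sketch of the polling idea behind an explicit wait. It is plain Python, not Selenium's actual implementation: wait_until, WaitTimeout and fake_lookup are hypothetical names made up for illustration, and the element lookup is simulated so the snippet runs without a browser.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (stand-in for TimeoutException)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Simulate an element that only "appears" on the third lookup, the way the
# breadcrumbs are rendered some time after driver.get() has returned.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    return "Iced Teas" if attempts["count"] >= 3 else None


found = wait_until(fake_lookup, timeout=2.0, poll_interval=0.01)
print(found)
```

A single lookup right after the page load corresponds to the first call here, which finds nothing. Retrying until a deadline is what lets the real wait succeed once the element has been rendered.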

score 1 · Accepted Answer · answered Sep 08 '19 at 20:52

To print the breadcrumbs of the page https://www.woolworths.com.au/Shop/Browse/drinks/cordials-juices-iced-teas/iced-teas you have to use WebDriverWait with the visibility_of_element_located() expected condition, and you can use either of the following locator strategies:

Using CSS_SELECTOR and get_attribute() method:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.breadcrumbs-linkList li:nth-child(4) span span"))).get_attribute("innerHTML"))

Using XPATH and text property:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='breadcrumbs-linkList']//following-sibling::li[4]//span//span"))).text)

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Outro

As per the documentation:

The get_attribute() method gets the given attribute or property of the element.
The text attribute returns the text of the element.
See also: Difference between text and innerHTML using Selenium
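
As a loose illustration of that difference, the sketch below mimics it without Selenium, using only the standard library. The markup is a hypothetical fragment, not copied from the real page, and TextOnly is a made-up helper. Selenium's text also takes visibility into account, which this sketch does not model.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data and drops the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


# Hypothetical inner markup of one breadcrumb <li> (an assumption for illustration).
inner_html = "<span><span>Iced Teas</span></span>"

parser = TextOnly()
parser.feed(inner_html)
parser.close()
text = "".join(parser.parts).strip()

print(inner_html)  # what get_attribute("innerHTML") would resemble: markup included
print(text)        # what .text would resemble: tags stripped
```

For the innermost span, which contains only text, both approaches give the same string. They diverge as soon as the element has child tags.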

score 0 · Answer 3 · answered Mar 05 '21 at 11:49

The XPath below works for my validation:

//*[span='first text' and span='Search results for "second text"']

Sukumar k

Hi Sukumar and Welcome to StackOverflow! Could you please format your code? You can read here on [how to format your code](https://meta.stackoverflow.com/questions/251361/how-do-i-format-my-code-blocks) and here on [how to write a good answer](https://stackoverflow.com/help/how-to-answer), thanks! – BiOS Mar 05 '21 at 13:42
