# importing packages

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# setting the path (raw string, so the backslashes are not treated as escapes)

PATH = r"C:\Program Files (x86)\chromedriver.exe"

# the options must be created before the driver and passed to it
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(service=Service(PATH), options=options)

driver.get("https://www.craispesaonline.it/provincia/treviso")

# XPath for Address and Postal Code

x = '//address//p[@class="text-lowercase m-0 ng-binding"]'
search = driver.find_elements(By.XPATH, x)

# writing the output to a text file

with open("Italy_Scrape.txt", "a") as f:
    for i in search:
        print("PostalCode :" + i.text, file=f)
        print("----------------------------------------------------------------------------", file=f)

driver.quit()

The code above gets the postal address. I am using Selenium with headless Chrome. I need code that gets the postal code only for the stores where delivery is available.

DebanjanB

3 Answers


The page takes some time to load completely, which is why you are unable to get the values you are after.

To get all the postal codes, use WebDriverWait() and wait for visibility_of_all_elements_located().
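Conceptually, `WebDriverWait(driver, 20).until(condition)` calls the condition repeatedly (every 0.5 s by default) until it returns a truthy value, which `until()` then returns, and raises `TimeoutException` once the 20 s timeout has passed. The pure-Python sketch below only illustrates that polling idea; `wait_until` and `fake_elements_visible` are hypothetical helpers (not Selenium's implementation), and the fake condition stands in for `visibility_of_all_elements_located()` on a page that is still loading.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 20.0, poll: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # like until(), hand back the condition's own result
        if time.monotonic() > deadline:
            # Selenium raises TimeoutException here; the builtin is used for the sketch
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Fake "page": the element list is empty for the first two polls, as if still loading.
calls = {"n": 0}


def fake_elements_visible():
    calls["n"] += 1
    return ["elem1", "elem2"] if calls["n"] >= 3 else []


print(wait_until(fake_elements_visible, timeout=5, poll=0.01))  # ['elem1', 'elem2']
```

Looking the elements up once, as the question's code does, corresponds to calling the condition a single time before the page has filled them in.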

To get only the last child of each element, you can use the JavaScript executor or splitlines().

driver.get("https://www.craispesaonline.it/provincia/treviso")
search = WebDriverWait(driver, 20).until(
    EC.visibility_of_all_elements_located((By.XPATH, '//address//p[@class="text-lowercase m-0 ng-binding"]')))
for postcode in search:
    print(driver.execute_script('return arguments[0].lastChild.textContent;', postcode))

You need the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

Console Output:

0422/710092
 0422 452388
 0422/958833
 0423/689003
 0422/853881
 0422/969047
 0423/564126
 0423/650073
 0423/723434
 0423/942150
 0438/500484
 0423/868496
 0438/898282
 0483801679
 0422/832603
 0423/470063
 0423/755164-23
 0438/492409
 0438/893369
 0422/791529
 0423/302959
 0423/301381
 0423-603754
 0423/609936
 0423/609151
 0423480340
 0438/781107
 0423/670593
 0423/81743
 0423/81534
 0423/972091
 0423/451941
 0422/912384
 0423/620803
 0423/621383

The same output using splitlines():

driver.get("https://www.craispesaonline.it/provincia/treviso")
search = WebDriverWait(driver, 20).until(
    EC.visibility_of_all_elements_located((By.XPATH, '//address//p[@class="text-lowercase m-0 ng-binding"]')))
for postcode in search:
    print(postcode.text.splitlines()[-1].split("|")[-1].strip())  # text after the last "|" on the last line
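The splitlines() expression can be tried without a browser. The sample string below is a hypothetical stand-in for an element's `.text` (an address line followed by a line with a `|`-separated last segment); the live page's layout may differ. With that layout the expression returns whatever follows the final `|`, which in the console output above is a phone-style number rather than a five-digit postal code.

```python
def last_segment(text: str) -> str:
    """Return the text after the last '|' on the element's last line."""
    last_line = text.splitlines()[-1]        # keep only the final line of the element text
    return last_line.split("|")[-1].strip()  # drop everything up to the final '|'


# Hypothetical element text, not copied from the live page.
sample = "via roma 1\n31038 paese (tv) | 0422/710092"
print(last_segment(sample))  # 0422/710092
```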
KunduK

To complete the preceding answers, you can get the postcodes of only the stores where delivery is possible with a single XPath expression:

//div[@class="row province-cms-content-store-row ng-scope"][./div[@ng-if="store.shippingEnabled == true"]]//meta[@itemprop="postalCode"]/@content
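The filtering logic of that expression can be mirrored offline with the standard library's `xml.etree.ElementTree`. Its limited XPath support handles `[@attr='value']` tests but not the nested `[./div[@ng-if=...]]` predicate, so that check is done in Python instead. The markup is a hypothetical, heavily simplified stand-in for the store rows (the `99999` store is made up); it only shows which rows the expression keeps.

```python
import xml.etree.ElementTree as ET
from typing import List

ROW_CLASS = "row province-cms-content-store-row ng-scope"
SHIPPING_IF = "store.shippingEnabled == true"

# Hypothetical, simplified markup; the live page's structure may differ.
SAMPLE = """<div>
  <div class="row province-cms-content-store-row ng-scope">
    <div ng-if="store.shippingEnabled == true">delivery available</div>
    <address><meta itemprop="postalCode" content="31038"/></address>
  </div>
  <div class="row province-cms-content-store-row ng-scope">
    <address><meta itemprop="postalCode" content="99999"/></address>
  </div>
</div>"""


def delivery_postcodes(markup: str) -> List[str]:
    """Collect postalCode meta contents from rows that have a shipping-enabled child div."""
    root = ET.fromstring(markup)
    codes = []
    for row in root.iterfind(f".//div[@class='{ROW_CLASS}']"):
        # Equivalent of the [./div[@ng-if="..."]] predicate: a matching direct child div
        if row.find(f"div[@ng-if='{SHIPPING_IF}']") is None:
            continue
        # Equivalent of //meta[@itemprop="postalCode"]/@content
        for meta in row.iterfind(".//meta[@itemprop='postalCode']"):
            codes.append(meta.get("content"))
    return codes


print(delivery_postcodes(SAMPLE))  # ['31038']
```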

Selenium code:

driver.get("https://www.craispesaonline.it/provincia/treviso")
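# <meta> elements are never rendered, so wait for their presence rather than visibility,
# then read the content attribute of each element in the returned list
metas = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, '//div[@class="row province-cms-content-store-row ng-scope"][./div[@ng-if="store.shippingEnabled == true"]]//meta[@itemprop="postalCode"]')))
postcodes = [meta.get_attribute("content") for meta in metas]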

Output: 29 postcodes

['31038']
['31038']
['31047']
['31050']
['31030']
...
E.Wiest

To extract the postal codes only for the stores where delivery is available, induce WebDriverWait for visibility_of_all_elements_located() and use the following Locator Strategy:

  • Using XPATH:

    driver.get("https://www.craispesaonline.it/provincia/treviso")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@class='cl-accept']"))).click()
    driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h2[contains(., 'Potrai scegliere di ricevere la tua spesa in due modi:')]"))))
    addresses = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//input[@value='Consegna']//preceding::address[1]//p[@class='text-lowercase m-0 ng-binding']")))]
    for address in addresses:
        print(re.findall(r"\b\d{5}\b", address))
    
  • Console Output:

    ['31038']
    ['31038']
    ['31047']
    ['31050']
    ['31030']
    ['31031']
    ['31034']
    ['31014']
    ['31035']
    ['31010']
    ['31010']
    ['31036']
    ['31037']
    ['31037']
    ['31050']
    ['31050']
    ['31044']
    ['31044']
    ['31044']
    ['31044']
    ['31044']
    ['31023']
    ['31058']
    ['31040', '81743']
    ['31049']
    ['31050']
    ['31020']
    ['31040']
    ['31040']
    
  • Note: You have to add the following imports:

    import re
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
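The regex `\b\d{5}\b` also accepts `81743`, which apparently comes from the phone number `0423/81743` that appears in the first answer's output; that is why one line above contains two values. Below is a minimal offline sketch of a stricter alternative. It pairs each `Consegna` input with the nearest `address` that comes before it in document order (roughly what `preceding::address[1]` does, assuming the input is not nested inside the address) and only accepts five-digit runs that do not touch another digit or a `/`. The markup, the `Ritiro` value and the street text are hypothetical; the live page's structure and text format may differ.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Five digits not touching another digit or '/', so "0423/81743" is skipped.
POSTCODE_RE = re.compile(r"(?<![\d/])\d{5}(?![\d/])")

# Hypothetical, simplified markup; the live page's structure may differ.
SAMPLE = """<div>
  <div class="store">
    <address><p>via roma 1, 31040 paese | 0423/81743</p></address>
    <input value="Consegna"/>
  </div>
  <div class="store">
    <address><p>via verdi 2, 99999 altrove</p></address>
    <input value="Ritiro"/>
  </div>
</div>"""


def postcodes_for_delivery_inputs(root: ET.Element) -> List[List[str]]:
    """For each 'Consegna' input, return the postcodes in the nearest address before it."""
    last_address = None
    results = []
    for elem in root.iter():  # iter() walks the tree in document order
        if elem.tag == "address":
            last_address = elem
        elif elem.tag == "input" and elem.get("value") == "Consegna" and last_address is not None:
            text = "".join(last_address.itertext())
            results.append(POSTCODE_RE.findall(text))
    return results


root = ET.fromstring(SAMPLE)
print(postcodes_for_delivery_inputs(root))  # [['31040']]
print(re.findall(r"\b\d{5}\b", "via roma 1, 31040 paese | 0423/81743"))  # ['31040', '81743']
```

On the live page, the same `POSTCODE_RE.findall(address)` call could replace `re.findall(r"\b\d{5}\b", address)` in the loop above, provided the phone numbers there are written with a `/` or run longer than five digits.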
    
DebanjanB