
I am trying to extract text from this element:

<div class="_pac" data-bt="{&quot;ct&quot;:&quot;sub_headers&quot;}"><a href="https://www.facebook.com/pages/%EB%B6%81%EC%9D%BC%EC%97%AC%EC%9E%90%EA%B3%A0%EB%93%B1%ED%95%99%EA%B5%90/110634532291267">북일여자고등학교</a><div class="_1my"></div></div>

I want to extract the link text, i.e. the text of the <a> element that follows the href attribute:

'북일여자고등학교'

So far I tried:

content = driver.find_element_by_css_selector('div._pac')

for i in content:
 i.get_attribute('text')

However, it is not returning anything. How can I extract the text?

DebanjanB
song0089

4 Answers


The desired text 북일여자고등학교 is within a child <a> node within the parent <div> node.

To print the text 북일여자고등학교, wait for the element to become visible using WebDriverWait with visibility_of_element_located(), then use either of the following locator strategies:

  • Using CSS_SELECTOR and .get_attribute("innerHTML"):

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div._pac>a[href^='https://www.facebook.com/pages']"))).get_attribute("innerHTML"))
    
  • Using XPATH and the .text attribute:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='_pac']/a[starts-with(@href, 'https://www.facebook.com/pages')]"))).text)
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console output of two back-to-back executions:

    북일여자고등학교
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
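
As a quick offline check of the structure described in this answer, the sketch below parses the question's HTML snippet with Python's standard-library html.parser instead of Selenium. This is a swapped-in technique used only to illustrate where the text lives. The class name PacLinkText and the variable link_texts are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# The element from the question, split across lines for readability.
SNIPPET = (
    '<div class="_pac" data-bt="{&quot;ct&quot;:&quot;sub_headers&quot;}">'
    '<a href="https://www.facebook.com/pages/'
    '%EB%B6%81%EC%9D%BC%EC%97%AC%EC%9E%90%EA%B3%A0%EB%93%B1%ED%95%99%EA%B5%90/'
    '110634532291267">북일여자고등학교</a>'
    '<div class="_1my"></div></div>'
)


class PacLinkText(HTMLParser):
    """Collects the text of <a> elements nested inside <div class="_pac">."""

    def __init__(self):
        super().__init__()
        self.pac_depth = 0    # > 0 while inside a div._pac (counts nested divs too)
        self.in_link = False  # True while inside an <a> within div._pac
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.pac_depth or "_pac" in classes:
                self.pac_depth += 1
        elif tag == "a" and self.pac_depth:
            self.in_link = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.pac_depth:
            self.pac_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Data may arrive in several chunks, so append to the current entry.
        if self.in_link:
            self.texts[-1] += data


parser = PacLinkText()
parser.feed(SNIPPET)
parser.close()
link_texts = parser.texts
print(link_texts)
```

The parent div itself carries no text of its own here; the string sits in the child <a>, which is why both locators in this answer target the anchor rather than div._pac.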


DebanjanB

It seems there are multiple div elements with the class _pac on your page. In your case, the first one is being located, and it has no text.

The find_element methods return only the very first matching element when several elements match the locator.

Try to make your locator more specific so that it matches a single element. Refer to the code below:

content = driver.find_element_by_xpath("//div[contains(@data-bt,'sub_headers')]/a")
print(content.text)

It also seems you are looping over content, but you used find_element instead of find_elements. find_element returns a single element, which cannot be iterated, so replace it with the find_elements method.

To loop through the text of all elements of the same kind, use the code below:

content = driver.find_elements_by_css_selector('div._pac')
for element in content:
   print(element.text)
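
The first-match behaviour described above can be tried without a browser. The sketch below is only an analogy: it uses the standard library's xml.etree.ElementTree, whose find() returns the first match and findall() returns every match, much like find_element and find_elements. The two-div page and the example.com URL are made-up placeholders, not the asker's real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the first div._pac is empty, the second holds the link.
page = ET.fromstring(
    '<body>'
    '<div class="_pac"></div>'
    '<div class="_pac"><a href="https://example.com/pages/1">북일여자고등학교</a></div>'
    '</body>'
)

# find() stops at the first match, like find_element.
first = page.find(".//div[@class='_pac']")
first_text = "".join(first.itertext())

# findall() returns every match, like find_elements.
every = page.findall(".//div[@class='_pac']")
all_texts = ["".join(div.itertext()) for div in every]

print(repr(first_text))
print(all_texts)
```

If the real page looks like this, a single-element lookup yields an empty string, while iterating over all matches reaches the div that actually contains the link.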
NarendraR

To extract the text from the <a> tag, use the CSS selector div._pac > a. Please try this solution:

content = driver.find_element_by_css_selector('div._pac > a')
print(content.text)

# or use .get_attribute()
print(content.get_attribute("innerHTML"))

If there are multiple elements with the same class on the page, you can use .find_elements_*, which returns a list of WebElement objects, and extract the text from each one in a loop:

content = driver.find_elements_by_css_selector('div._pac > a')
for el in content:
    print(el.text)

    # or use .get_attribute()
    print(el.get_attribute("innerHTML"))
frianH

Just remember to do the extraction BEFORE closing the driver!
I had this issue because my loop ran after driver.close(), even though my element variable had already been filled with data.

So loop BEFORE .close().

Example:

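from selenium import webdriver

driver = webdriver.Chrome()
# ... locate the elements ...
# for element in elements:
#     ... read element.text here, while the driver is still open ...
driver.close()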
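
One way to follow this advice is to copy the text into plain strings while the session is open, and only then close the driver. The sketch below assumes Selenium 4's find_elements(by, value) call form and passes "css selector", which is the string value of By.CSS_SELECTOR. FakeDriver and FakeElement are hypothetical stand-ins so the example runs without a browser; they are not part of Selenium, and the RuntimeError only simulates a closed session.

```python
CSS_SELECTOR = "css selector"  # string value of selenium's By.CSS_SELECTOR


def collect_texts(driver, selector="div._pac > a"):
    """Read .text from every match while the session is open; return plain strings."""
    return [element.text for element in driver.find_elements(CSS_SELECTOR, selector)]


class FakeElement:
    """Stand-in for a WebElement: its text is only readable while the driver is open."""

    def __init__(self, driver, text):
        self._driver = driver
        self._text = text

    @property
    def text(self):
        if self._driver.closed:
            raise RuntimeError("session is closed")  # simulated failure
        return self._text


class FakeDriver:
    """Stand-in for a WebDriver so the sketch runs without a browser."""

    def __init__(self, texts):
        self.closed = False
        self._elements = [FakeElement(self, t) for t in texts]

    def find_elements(self, by, value):
        return list(self._elements)

    def close(self):
        self.closed = True


driver = FakeDriver(["북일여자고등학교"])
try:
    texts = collect_texts(driver)  # extract BEFORE closing
finally:
    driver.close()

for text in texts:  # safe: these are ordinary str objects
    print(text)
```

With a real browser, the same collect_texts function could be handed an actual driver instance; the try/finally simply ensures close() runs after the extraction, even if the extraction fails.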
codemonkey
Noone