Extract text under div class using Selenium and Python

Question

I am trying to extract text from this element:

<div class="_pac" data-bt="{&quot;ct&quot;:&quot;sub_headers&quot;}"><a href="https://www.facebook.com/pages/%EB%B6%81%EC%9D%BC%EC%97%AC%EC%9E%90%EA%B3%A0%EB%93%B1%ED%95%99%EA%B5%90/110634532291267">북일여자고등학교</a><div class="_1my"></div></div>

I am trying to extract the text after href -

'북일여자고등학교'

So far I tried:

content = driver.find_element_by_css_selector('div._pac')

for i in content:
 i.get_attribute('text')

However, it is not returning anything. How can I extract the text?

score 0 · Answer 1 · answered Jun 11 '20 at 03:57

The desired text 북일여자고등학교 is within a child <a> node within the parent <div> node.

To print the text 북일여자고등학교, wait for the element to become visible using WebDriverWait with visibility_of_element_located(), and then use either of the following locator strategies:

Using CSS_SELECTOR and .get_attribute("innerHTML"):

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div._pac>a[href^='https://www.facebook.com/pages']"))).get_attribute("innerHTML"))

Using XPATH and text attribute:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='_pac']/a[starts-with(@href, 'https://www.facebook.com/pages')]"))).text)

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Console Output of two back to back execution:
```
북일여자고등학교
```

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

Outro

Links to useful documentation:

get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium

score 0 · Accepted Answer · answered Jun 11 '20 at 06:12

It seems there are multiple div elements with the class _pac on your page. In your case, the locator matches the first of them, and that one has no text.

The find_element methods return the very first matching element when the same locator matches multiple elements.

Try to make your locator more specific so that it matches only a single element. Refer to the code below:

content = driver.find_element_by_xpath("//div[contains(@data-bt,'sub_headers')]/a")
content.text

It also seems you are looping over content, but you used find_element instead of find_elements, so replace it with the find_elements method.

To loop through the text of all elements of the same kind, use the code below:

content = driver.find_elements_by_css_selector('div._pac')
for element in content:
   print(element.text)
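
Newer Selenium 4 releases no longer provide the find_elements_by_* helpers, so the same loop can be written with the generic find_elements call. The following is only a minimal sketch under assumptions: texts_for is a hypothetical helper name, driver is assumed to be a live WebDriver session, and skipping empty matches is an illustrative choice, not something the original answer does.

```python
# Sketch only: `driver` is assumed to be a live Selenium 4 WebDriver.
# "css selector" is the string value of By.CSS_SELECTOR
# (from selenium.webdriver.common.by import By).
CSS_SELECTOR = "css selector"


def texts_for(driver, selector="div._pac"):
    """Return the non-empty visible text of every element matching `selector`.

    `texts_for` is a hypothetical helper name, not part of Selenium.
    """
    texts = []
    for element in driver.find_elements(CSS_SELECTOR, selector):
        text = element.text.strip()
        if text:  # skip matches that have no visible text
            texts.append(text)
    return texts
```

With a real session this would be called as texts_for(driver), or as texts_for(driver, "div._pac > a") to target only the links.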

This is what I did! Thank you!! – song0089, Jun 11 '20 at 18:42

score 0 · Answer 3 · answered Jun 11 '20 at 09:02

To extract the text from the <a> tag, use the CSS selector div._pac > a. Please try this solution:

content = driver.find_element_by_css_selector('div._pac > a')
print(content.text)

#or use '.get_attribute'
print(content.get_attribute("innerHTML"))

If there are multiple elements with the same class on the page, you can use .find_elements_*, which returns a list of WebElement objects, and extract the text from each in a loop:

content = driver.find_elements_by_css_selector('div._pac > a')
for el in content:
    print(el.text)

    #or use '.get_attribute'
    print(el.get_attribute("innerHTML"))

score 0 · Answer 4 · edited Dec 05 '20 at 16:10

Just remember to do the extraction BEFORE closing the driver!
I had this issue because my loop came after driver.close(), even though my variable element was already filled with data!

So loop BEFORE .close().

Example:

from selenium import webdriver

driver = webdriver.Chrome()
...
# loop over the elements and read their text here, while the driver is still open
...
driver.close()
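
One way to make that ordering hard to get wrong is to copy the text into plain strings inside a try block and shut the browser down in finally. This is a sketch under assumptions, not the answerer's code: collect_texts is a hypothetical helper, driver is assumed to be a Selenium 4 WebDriver created by the caller, and quit() is used instead of close() so that the whole session ends.

```python
def collect_texts(driver, url, selector="div._pac > a"):
    """Open `url`, return the matched elements' text, then close the browser.

    `collect_texts` is a hypothetical helper; `driver` is assumed to be a
    Selenium 4 WebDriver that the caller has already created.
    """
    try:
        driver.get(url)
        # "css selector" is the string value of By.CSS_SELECTOR.
        elements = driver.find_elements("css selector", selector)
        # Copy the text into plain strings while the session is still open.
        return [element.text for element in elements]
    finally:
        # Runs after the list above is built, even if an error occurs.
        driver.quit()
```

The returned list holds ordinary strings, so it can be looped over safely after the browser is gone.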

answered Dec 05 '20 at 14:41 by Noone · edited Dec 05 '20 at 16:10 by codemonkey
