
I've created a script in Python together with Selenium to parse three fields (franking credit, gross dividend, and further information) from a table available on a website. The last two fields are revealed only when the browser is made to click a circular yellow button with a plus sign in it.

However, when the buttons are clicked, they turn red, which indicates that the information has been displayed.

My script can click all the buttons, but it can't scrape the three fields from that table.

I've attached an image to show how it looks.

I know that if I send a POST request with the appropriate payload to https://www.sharedividends.com.au/wp-content/custom/ajaxfile.php?code=MLT, I can get all the tabular fields as JSON, but that is not how I want to solve this.

Website link

I've tried with:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.sharedividends.com.au/mlt-dividend-history/"

driver = webdriver.Chrome()

driver.get(url)

table = driver.find_element_by_css_selector("#divTable")
driver.execute_script("arguments[0].scrollIntoView();",table)

for items in driver.find_elements_by_css_selector("td.sorting_1"):
    driver.execute_script("arguments[0].scrollIntoView();",items)
    items.click()

for elems in driver.find_elements_by_css_selector("#divTable tbody tr"):
    franking_credit = elems.find_elements_by_css_selector("td")[5].text
    gross_divident = elems.find_elements_by_css_selector("td")[6].text
    further_info = elems.find_elements_by_css_selector("td")[7].text
    print(franking_credit,gross_divident,further_info)

driver.quit()

When I run the above script, it throws IndexError: list index out of range, pointing at the `franking_credit = ...` line.
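As far as I can tell, the IndexError happens because the table (a DataTables widget) also contains hidden child rows holding the expanded details, and those rows have far fewer `td` cells than the visible data rows. A minimal pure-Python sketch of the failure mode (the cell values are made up for illustration, not scraped data):

```python
# A visible data row has 8 cells; a DataTables "child" row that carries
# the expanded details collapses into a single cell.
main_row = ["04/12/2024", "$0.10", "100%", "Interim", "$0.04", "$0.0446", "$0.1486", "info"]
child_row = ["expanded details"]  # one cell spanning the whole row

for row in [main_row, child_row]:
    if len(row) > 7:  # guard before indexing cells 5..7
        print(row[5], row[6], row[7])
    # indexing child_row[5] directly would raise IndexError
```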

This is how the table looks. In the image below I've marked the three fields I'm interested in.

Image link

How can I parse the three fields from that table?

MITHU

3 Answers


This should do the trick!

from selenium import webdriver

driver = webdriver.Chrome('chromedriver/chromedriver.exe')

driver.get("https://www.sharedividends.com.au/mlt-dividend-history/")

for button in driver.find_elements_by_class_name("sorting_1"):
    button.click()

# Returns first part of the info
for item in driver.find_elements_by_xpath("//tr[@role='row']/td"):
    print(item.text)

# Returns second part of info
for a in driver.find_elements_by_xpath("//ul[@class='dtr-details']/li"):
    print(a.text)

This outputs the data shown in the linked image.

Noah64
  • I could not understand what you meant by that. Could you show me a little bit more of that approach? – MITHU Aug 09 '19 at 09:31
  • Selenium has a `driver.find_elements_by_xpath` method. Use it like this - `driver.find_elements_by_xpath("//tr[@role='row']/td")`. This will return a list, just search and/or iterate through that list to find the info you need. – Noah64 Aug 09 '19 at 09:37
  • @MITHU I've edited my answer, should help you a lot now – Noah64 Aug 09 '19 at 09:57
  • You found everything accurately, but to keep my current implementation intact I needed `.get_attribute('textContent')`. – MITHU Aug 09 '19 at 10:57
  • Yeah `get_attribute("textContent")` can replace the `.text`. I just tested it. Hope this is the answer that worked for you! – Noah64 Aug 09 '19 at 11:01

You are getting this error because, when the script runs, the table contains 20 rows (the extra ones carry different attributes) instead of 10. Filtering on `tr[role='row']` skips the extra rows. Try the following code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.sharedividends.com.au/mlt-dividend-history/"

driver = webdriver.Chrome()

driver.get(url)

table = driver.find_element_by_css_selector("#divTable")
driver.execute_script("arguments[0].scrollIntoView();",table)

for items in driver.find_elements_by_css_selector("td.sorting_1"):
    driver.execute_script("arguments[0].scrollIntoView();",items)
    items.click()

for elems in driver.find_elements_by_css_selector("#divTable tbody tr[role='row']"):
    franking_credit = elems.find_elements_by_css_selector("td")[5].text
    gross_divident = elems.find_elements_by_css_selector("td")[6].get_attribute('textContent')
    further_info = elems.find_elements_by_css_selector("td")[7].get_attribute('textContent')
    print(franking_credit, gross_divident,further_info)

Output on console:

$ 0.0446 $ 0.1486 10.4C FRANKED @ 30%; DRP NIL DISCOUNT

$ 0.0107 $ 0.0357 2.5C FRANKED@30%; SP ECIAL; DRP SUSP

$ 0.0386 $ 0.1286 9C FRANKED @ 30%; DR P NIL DISCOUNT

$ 0.0437 $ 0.1457 10.2C FRANKED @ 30%; DRP NIL DISCOUNT

$ 0.0377 $ 0.1257 8.8C FRANKED @ 30%; DRP NIL DISCOUNT

$ 0.0429 $ 0.1429 10C FRANKED @ 30%; D RP NIL DISCOUNT

$ 0.0373 $ 0.1243 8.7C FRANKED @ 30%; DRP NIL DISCOUNT

$ 0.0424 $ 0.1414 9.9C FRANKED @ 30%; DRP NIL DISCOUNT

$ 0.0373 $ 0.1243 8.7C FRANKED @ 30%; DRP

$ 0.0441 $ 0.1471 10.3C FR@30%;0.4C SP ECIAL;DRP;NIL DIS
KunduK
  • The first `for` loop, which is supposed to reveal the content, is completely redundant, as the expected elements are already there. – SIM Aug 09 '19 at 10:24
  • @SIM: When you run the automation code you will see extra redundant rows with a different class name; that is why the OP is facing the indexing issue. – KunduK Aug 09 '19 at 10:31

To extract the data from the three fields Franking Credit, Gross Dividend, and Further Information, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategies:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    chrome_options = webdriver.ChromeOptions() 
    chrome_options.add_argument("start-maximized")
    chrome_options.add_argument('disable-infobars')
    driver = webdriver.Chrome(options=chrome_options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("https://www.sharedividends.com.au/mlt-dividend-history/")
    driver.execute_script("arguments[0].scrollIntoView();", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#divTable"))))
    for elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@aria-describedby='divTable_info']//tbody//tr/td[@class='sorting_1']"))):
        elem.click()
    all_fc = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@aria-describedby='divTable_info']//tbody//tr//td[position()=6]")))]
    all_gd = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@aria-describedby='divTable_info']//tbody//tr//td[position()=7]")))]
    all_fi = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@aria-describedby='divTable_info']//tbody//tr[@class='child']//li//span[@class='dtr-data']")))]
    for x,y,z in zip(all_fc, all_gd, all_fi):
        print(x,y,z)
    
  • Console Output:

    $ 0.0446 $ 0.1486 10.4C FRANKED @ 30%; DRP NIL DISCOUNT
    
    $ 0.0107 $ 0.0357 2.5C FRANKED@30%; SP ECIAL; DRP SUSP
    
    $ 0.0386 $ 0.1286 9C FRANKED @ 30%; DR P NIL DISCOUNT
    
    $ 0.0437 $ 0.1457 10.2C FRANKED @ 30%; DRP NIL DISCOUNT
    
    $ 0.0377 $ 0.1257 8.8C FRANKED @ 30%; DRP NIL DISCOUNT
    
    $ 0.0429 $ 0.1429 10C FRANKED @ 30%; D RP NIL DISCOUNT
    
    $ 0.0373 $ 0.1243 8.7C FRANKED @ 30%; DRP NIL DISCOUNT
    
    $ 0.0424 $ 0.1414 9.9C FRANKED @ 30%; DRP NIL DISCOUNT
    
    $ 0.0373 $ 0.1243 8.7C FRANKED @ 30%; DRP
    
    $ 0.0441 $ 0.1471 10.3C FR@30%;0.4C SP ECIAL;DRP;NIL DIS
    
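One caveat worth noting about the final `zip(...)`: `zip` stops at the shortest of its inputs, so if one of the three waits matches fewer elements (for instance because a button click failed and its child row never appeared), the surplus rows are dropped silently rather than raising an error. A sketch with made-up values, not output from the site:

```python
# zip truncates to its shortest argument, so mismatched lists lose rows
# silently. Values here are illustrative, not scraped data.
all_fc = ["$ 0.0446", "$ 0.0107", "$ 0.0386"]
all_gd = ["$ 0.1486", "$ 0.0357", "$ 0.1286"]
all_fi = ["10.4C FRANKED @ 30%; DRP NIL DISCOUNT"]  # suppose only one child row rendered

paired = list(zip(all_fc, all_gd, all_fi))
print(len(paired))  # 1 -- the other two rows were dropped without warning
```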
DebanjanB