I'm trying to help someone who works for a nonprofit pull info from the St. Louis County Boards/Commissions website (https://boards.stlouisco.com/).

I'm having trouble for a few reasons:

I was going to use BeautifulSoup, but the actual data isn't shown until you choose a Board/Commission from the dropdown at the top of the page, so I have switched to Selenium, which I am new to.

Is this task possible? When I look at the HTML source for the site, I see that the info isn't stored in the page itself; it is fetched from another location and displayed based on the option chosen from the dropdown menu:

function ShowMemberList(selectedBoard) {
    ClearMeetingsAndMembers();
    var htmlString = "";
    var boardsList = [{"id":407,"name":"Aging Ahead","isActive":true,"description":"... ...1.","totalSeats":14}];
    var totalMembers = boardsList[$("select[name='BoardsList'] option:selected").index() - 1].totalSeats;
    $.get("/api/boards/" + selectedBoard + "/members", function (data) {
        if (data.length > 0) {
            htmlString += "<table id=\"MemberTable\" class=\"table table-hover\">";
            htmlString += "<thead><th>Member Name</th><th>Title</th><th>Position</th><th>Expiration Date</th></thead><tbody>";
            for (var i = 0; i < totalMembers; i++) {
                if (i < data.length) {
                    htmlString += "<tr><td>" + FormatString(data[i].firstName) + " " + FormatString(data[i].lastName) + "</td><td>" + FormatString(data[i].title) + "</td><td>" + FormatString(data[i].position) + "</td><td>" + FormatString(data[i].expirationDate) + "</td></tr>";
                } else {
                    htmlString += "<tr><td colspan=\"4\">---Vacant Seat---</td></tr>"
                }
            }
            htmlString += "</tbody></table>";
        } else {
            htmlString = "<span id=\"MemberTable\">There was no data found for this board.</span>";
        }
        $("#Results").append(htmlString);
    });
}
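
For readers more comfortable with Python, the function above can be paraphrased roughly as follows. This is only a sketch of the same logic, not code from the site: `members_endpoint`, `format_string` and `member_rows` are made-up names, and the behaviour of the page's `FormatString` helper (not shown in the source) is assumed to be "turn null into an empty string". The sample member is invented for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://boards.stlouisco.com/"


def members_endpoint(board_id):
    """Absolute URL of the endpoint that ShowMemberList requests with $.get."""
    return urljoin(BASE_URL, f"/api/boards/{board_id}/members")


def format_string(value):
    # Assumption: the page's FormatString helper blanks out null values.
    return "" if value is None else str(value)


def member_rows(data, total_seats):
    """Mirror the JavaScript loop: one row per seat, padded with vacant seats."""
    if not data:
        # The page shows "There was no data found for this board." instead.
        return []
    rows = []
    for i in range(total_seats):
        if i < len(data):
            m = data[i]
            name = format_string(m.get("firstName")) + " " + format_string(m.get("lastName"))
            rows.append([
                name,
                format_string(m.get("title")),
                format_string(m.get("position")),
                format_string(m.get("expirationDate")),
            ])
        else:
            rows.append(["---Vacant Seat---"])
    return rows


if __name__ == "__main__":
    # Invented sample record using the field names seen in the JavaScript.
    sample = [{"firstName": "Jane", "lastName": "Doe", "title": None,
               "position": "Appointee", "expirationDate": "10/1/2020"}]
    print(members_endpoint(407))
    print(member_rows(sample, total_seats=2))
```

Note that, like the JavaScript, this drops any members beyond `total_seats`, because the loop runs over seats rather than over the returned data.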

So far, I have this (not a lot), which goes to the page and selects every board from the list:

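from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

driver = webdriver.Chrome()
driver.get("https://boards.stlouisco.com/")
# Wait for the dropdown to be present, then wrap it in a Select helper.
select = Select(WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'BoardsList'))))
options = select.options

for board in options:
    select.select_by_visible_text(board.text)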

From here I would like to scrape the info from the MemberTable, but I don't know how to move forward, whether it is within the scope of my abilities, or even whether it is possible with Selenium.

I've tried using find_by with a few different elements to click on the members table, but am met with errors. I have also tried looking up the MemberTable after my select, but it is not able to find that element. Any tips/pointers/advice is appreciated!

DebanjanB

2 Answers


To select each Board / Commission from the dropdown and scrape the page, you need to use WebDriverWait with element_to_be_clickable() on the dropdown. You can use the following locator strategy:

Code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# Selenium 4 passes the driver path through a Service object (executable_path is deprecated).
driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'), options=options)
driver.get("https://boards.stlouisco.com/")
# Wait until the dropdown is clickable, then wrap it in a Select helper.
select = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'BoardsList'))))
for option in select.options:
    option.click()
    print("Scraping: " + option.text)

Console Output:

Scraping: ---Choose a Board---
Scraping: Aging Ahead
Scraping: Aging Ahead Advisory Council
Scraping: Air Pollution & Noise Control Appeal Board
Scraping: Animal Care & Control Advisory Board
Scraping: Bi-State Development Agency (Metro)
Scraping: Board Of Examiners For Mechanical Licensing
Scraping: Board of Freeholders
Scraping: Boundary Commission
Scraping: Building Code Review Committee
Scraping: Building Commission & Board Of Building Appeals
Scraping: Business Advisory Council
Scraping: Center for Educational Media
Scraping: Civil Service Commission
Scraping: Commission On Disabilities
Scraping: County Health Advisory Board
Scraping: Domestic And Family Violence Council
Scraping: East-West Gateway Council of Governments Board of Directors
Scraping: Economic Development Collaborative Advisory Board
Scraping: Economic Rescue Team
Scraping: Electrical Code Review Committee
Scraping: Electrical Examiners, Board Of
Scraping: Emergency Communications System Commission
Scraping: Equalization, Board Of
Scraping: Fire Standards Commission
Scraping: Friends of the Kathy J. Weinman Shelter for Battered Women, Inc.
Scraping: Fund Investment Advisory Committee
Scraping: Historic Building Commission
Scraping: Housing Authority
Scraping: Housing Resources Commission
Scraping: Human Relations Commission
Scraping: Industrial Development Authority Board
Scraping: Justice Services Advisory Board
Scraping: Lambert Airport Eastern Perimeter Joint Development Commission
Scraping: Land Clearance For Redevelopment Authority
Scraping: Lemay Community Improvement District
Scraping: Library Board
Scraping: Local Emergency Planning Committee
Scraping: Mechanical Code Review Committee
Scraping: Metropolitan Park And Recreation District Board Of Directors (Great Rivers Greenway)
Scraping: Metropolitan St. Louis Sewer District
Scraping: Metropolitan Taxicab Commission
Scraping: Metropolitan Zoological Park and Museum District Board
Scraping: Municipal Court Judges
Scraping: Older Adult Commission
Scraping: Parks And Recreation Advisory Board
Scraping: Planning Commission
Scraping: Plumbing Code Review Committee
Scraping: Plumbing Examiners, Board Of
Scraping: Police Commissioners, Board Of
Scraping: Port Authority Board Of Commissioners
Scraping: Private Security Advisory Committee
Scraping: Productive Living Board
Scraping: Public Transportation Commission of St. Louis County
Scraping: Regional Arts Commission
Scraping: Regional Convention & Sports Complex Authority
Scraping: Regional Convention & Visitors Commission
Scraping: REJIS Commission
Scraping: Restaurant Commission
Scraping: Retirement Board Of Trustees
Scraping: St. Louis Airport Commission
Scraping: St. Louis County Children's Service Fund Board
Scraping: St. Louis County Clean Energy Development Board (PACE)
Scraping: St. Louis County Workforce Development Board
Scraping: St. Louis Economic Development Partnership
Scraping: St. Louis Regional Health Commission
Scraping: St. Louis-Jefferson Solid Waste Management District
Scraping: Tax Increment Financing Commission of St. Louis County
Scraping: Transportation Board
Scraping: Waste Management Commission
Scraping: World Trade Center - St. Louis
Scraping: Zoning Adjustment,  Board of
Scraping: Zoo-Museum District - Art Museum Subdistrict Board of Commissioners
Scraping: Zoo-Museum District - Botanical Garden Subdistrict Board of Commissioners
Scraping: Zoo-Museum District - Missouri History Museum Subdistrict Board of Commissioners
Scraping: Zoo-Museum District - St. Louis Science Center Subdistrict Board of Commissioners
Scraping: Zoo-Museum District - Zoological Park Subdistrict Board of Commissioners
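
The loop above only selects each option and prints its name. To actually read the members, one possible next step is to wait for the element with id MemberTable (the id used in the question's ShowMemberList code for both the table and the "no data" span) after each selection and collect its rows. The sketch below is untested against the live site. It assumes that selecting an option triggers ShowMemberList and that the old table is cleared before the new one is appended. `rows_to_records` and `scrape_all_boards` are hypothetical helper names.

```python
def rows_to_records(board, cell_texts):
    """Turn the text of each table row's cells into one dict per seat."""
    records = []
    for cells in cell_texts:
        if len(cells) == 4:
            name, title, position, expiration = cells
            records.append({"board": board, "name": name, "title": title,
                            "position": position, "expirationDate": expiration,
                            "vacant": False})
        elif len(cells) == 1:
            # Single cell with colspan=4: the "---Vacant Seat---" row.
            records.append({"board": board, "name": "", "title": "",
                            "position": "", "expirationDate": "",
                            "vacant": True})
    return records


def scrape_all_boards(url="https://boards.stlouisco.com/", timeout=10):
    # Imported here so rows_to_records can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import Select, WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        select = Select(wait.until(EC.element_to_be_clickable((By.ID, "BoardsList"))))
        records = []
        # Index 0 is the "---Choose a Board---" placeholder, so start at 1.
        for index in range(1, len(select.options)):
            select.select_by_index(index)
            board = select.first_selected_option.text
            # Assumption: the previous table is already gone at this point.
            wait.until(EC.presence_of_element_located((By.ID, "MemberTable")))
            rows = driver.find_elements(By.CSS_SELECTOR, "#MemberTable tbody tr")
            cell_texts = [[td.text for td in row.find_elements(By.TAG_NAME, "td")]
                          for row in rows]
            records.extend(rows_to_records(board, cell_texts))
        return records
    finally:
        driver.quit()
```

A board with no data produces a span rather than a table, so the row selector finds nothing and that board simply contributes no records.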

DebanjanB

You can use this script to save all members from all boards to a CSV file:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://boards.stlouisco.com/'
members_url = 'https://boards.stlouisco.com/api/boards/{}/members'

# The dropdown options are in the static HTML; each value is a board id.
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
for o in soup.select('#BoardsList option[value]'):
    print(o['value'], o.text)
    # Call the same JSON endpoint the page's JavaScript uses.
    data = requests.get(members_url.format(o['value'])).json()
    for d in data:
        # Tag each member record with the board name.
        all_data.append(dict(board=o.text, **d))

df = pd.DataFrame(all_data)
print(df)
df.to_csv('data.csv')

Prints:

                                                 board  boardMemberId  memberId boardName  ...   lastName                                  title                                           position expirationDate
0                                          Aging Ahead          39003     27007      None  ...   Anderson                                   None               ST. LOUIS COUNTY EXECUTIVE APPOINTEE      10/1/2020
1                                          Aging Ahead          38963     27797      None  ...     Bauers                                   None  St. Charles County Community Action Agency App...           None
2                                          Aging Ahead          39004     27815      None  ...  Berkowitz                                   None               ST. LOUIS COUNTY EXECUTIVE APPOINTEE      10/1/2020
3                                          Aging Ahead          38964     27798      None  ...     Biehle                                   None  Jefferson County Community Action Corp. Appointee           None
4                                          Aging Ahead          38581     27597      None  ...     Bowers                                   None               Franklin County Commission Appointee           None
..                                                 ...            ...       ...       ...  ...        ...                                    ...                                                ...            ...
725  Zoo-Museum District - Zoological Park Subdistr...          38863     26745      None  ...       Seat               (Robert R. Hermann, Jr.)                                   St. Louis County     12/31/2019
726  Zoo-Museum District - Zoological Park Subdistr...          38864     26745      None  ...       Seat                        (Winthrop Reed)                                   St. Louis County     12/31/2016
727  Zoo-Museum District - Zoological Park Subdistr...          38669     26745      None  ...       Seat                      (Lawrence Thomas)                                   St. Louis County     12/31/2018
728  Zoo-Museum District - Zoological Park Subdistr...          38670     26745      None  ...       Seat  (Peggy Ritter ) Advisory Commissioner                        Non-Voting St. Louis County     12/31/2019
729  Zoo-Museum District - Zoological Park Subdistr...          38394     27512      None  ...     Wilson                  Advisory Commissioner                       Non-Voting City of St. Louis           None

[730 rows x 9 columns]

And saves data.csv with all boards/members.
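
If only the columns that the site itself displays are wanted, the records could be trimmed before writing. The following is a minimal sketch using the standard csv module rather than pandas. The field names are the ones read by the page's JavaScript (firstName, lastName, title, position, expirationDate) plus the board key added above. `write_displayed_columns` is a made-up helper name and the demo record is invented.

```python
import csv
import io

# Fields shown in the site's member table, plus the board name added by the script.
DISPLAYED_FIELDS = ["board", "firstName", "lastName", "title", "position", "expirationDate"]


def write_displayed_columns(records, fh):
    """Write only the displayed fields of each record to an open text file."""
    writer = csv.DictWriter(fh, fieldnames=DISPLAYED_FIELDS)
    writer.writeheader()
    for record in records:
        row = {}
        for field in DISPLAYED_FIELDS:
            value = record.get(field)
            # Missing or null values become empty cells.
            row[field] = "" if value is None else value
        writer.writerow(row)


if __name__ == "__main__":
    # Invented record shaped like the API output shown above.
    demo = [{"board": "Demo Board", "boardMemberId": 1, "memberId": 2,
             "firstName": "Jane", "lastName": "Doe", "title": None,
             "position": "Appointee", "expirationDate": "10/1/2020"}]
    buffer = io.StringIO()
    write_displayed_columns(demo, buffer)
    print(buffer.getvalue())
```

With the all_data list from the script, this would be called on a file opened with open('members.csv', 'w', newline='').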

Andrej Kesely