Removing <br> tag for proper alignment while web scraping using Selenium and Python

Question

I want to remove the <br> HTML tag from the text I scrape, but replace doesn't seem to work. I'm not sure whether there is another or better way to do this using Selenium and Python. Thank you in advance.

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome("drivers/chromedriver")

driver.get("https://web3.ncaa.org/hsportal/exec/hsAction")

state_drop = driver.find_element_by_id("state")
state = Select(state_drop)
state.select_by_visible_text("New Hampshire")

driver.find_element_by_id("city").send_keys("Moultonborough")
driver.find_element_by_id("name").send_keys("Moultonborough Academy")
driver.find_element_by_class_name("forms_input_button").send_keys(Keys.RETURN)
driver.find_element_by_id("hsSelectRadio_1").click()

courses_subheading = driver.find_elements_by_tag_name("th.header")

print(courses_subheading[0].text, "     ", courses_subheading[1].text, "     ", courses_subheading[2].text, "     ", courses_subheading[3].text, "     ", courses_subheading[4].text)

I tried this:

for i in courses_subheading:
    courses_subheading.replace("<br>", " ")

but I get an error: AttributeError: 'list' object has no attribute 'replace'

Currently, it looks like this:

Course
Weight     Title     Notes     Max
Credits       OK
Through       Disability
Course

But I want it like this:

Course Weight     Title     Notes     Max Credits     OK     Through     Disability Course

Hi, have a look here: https://stackoverflow.com/questions/24201926/in-place-replacement-of-all-occurrences-of-an-element-in-a-list-in-python Your loop to replace <br> was close, you just need to use the iterator, not the list — RichEdwards, Aug 03 '20 at 12:15
In your loop, instead of `courses_subheading.replace("<br>", " ")` use `i.replace("<br>", " ")` — Tasnuva, Aug 03 '20 at 12:21
@J.Doe This sounds like an [X-Y problem](http://xyproblem.info/). Instead of asking for help with your solution to the problem, edit your question and ask about the actual problem. What are you trying to do? — DebanjanB, Aug 03 '20 at 12:28
Thank you for catching that, but I'm still getting an attribute error: `AttributeError: 'WebElement' object has no attribute 'replace'`. Also, I included my attempted solution to give context to the problem — J. Doe, Aug 03 '20 at 12:35

DebanjanB · Accepted Answer · 2020-08-03T12:59:29.700

Instead of removing the <br> tags, you can avoid dealing with them as markup by reading each header's text, where a <br> appears as a newline. To print the table headers (e.g. Title, Notes, etc.), use WebDriverWait with visibility_of_all_elements_located() and either of the following locator strategies:

Using css_selector:

driver.get("https://web3.ncaa.org/hsportal/exec/hsAction")
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "state")))).select_by_visible_text("New Hampshire")
driver.find_element_by_css_selector("input#city").send_keys("Moultonborough")
driver.find_element_by_css_selector("input#name").send_keys("Moultonborough Academy")
driver.find_element_by_css_selector("input[value='Search']").click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='hsCode']"))).click()
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table#approvedCourseTable_1 th.header")))])

Using xpath:

driver.get("https://web3.ncaa.org/hsportal/exec/hsAction")
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "state")))).select_by_visible_text("New Hampshire")
driver.find_element_by_xpath("//input[@id='city']").send_keys("Moultonborough")
driver.find_element_by_xpath("//input[@id='name']").send_keys("Moultonborough Academy")
driver.find_element_by_xpath("//input[@value='Search']").click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='hsCode']"))).click()
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='approvedCourseTable_1']//th[@class='header']")))])

Console Output:

['Course\nWeight', 'Title', 'Notes', 'Max\nCredits', 'OK\nThrough', 'Disability\nCourse']
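
Each "\n" in the list above marks where a <br> sat in the markup. A minimal sketch of turning that list into the single line the question asks for, assuming the header texts are exactly the ones in the console output (the variable names are illustrative and not part of the page or of Selenium):

```python
# Header texts copied from the console output above; "\n" marks a former <br>.
raw_headers = ['Course\nWeight', 'Title', 'Notes', 'Max\nCredits', 'OK\nThrough', 'Disability\nCourse']

# WebElement.text is a plain str, so str.replace works on it
# (it does not work on the WebElement itself or on the list).
clean_headers = [text.replace("\n", " ") for text in raw_headers]

# Join with five spaces to mimic the spacing in the question's print call.
header_line = "     ".join(clean_headers)
print(header_line)
```

With a live driver, the list comprehension in the answer's print call could feed raw_headers directly.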

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

score 0 · Answer 2 · answered Aug 03 '20 at 15:26

To round this out: if you really want to remove the <br> tags, you can use the following (I've fixed your XPath expression):

import re
courses_subheading = driver.find_elements_by_xpath("(//tr[th[@class='header']])[1]/th")
headers = [re.sub(r'\s+', ' ', el.text) for el in courses_subheading]
print(headers)

Output:

['Course Weight', 'Title', 'Notes', 'Max Credits', 'OK Through', 'Disability Course']
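
The same clean-up can be done without a regular expression, since str.split() with no arguments splits on any run of whitespace. A sketch, assuming only that each element exposes a .text string the way a Selenium WebElement does; the helper normalized_texts and the stand-in class FakeElement are hypothetical and exist so the example runs without a browser:

```python
from dataclasses import dataclass


@dataclass
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is modeled."""
    text: str


def normalized_texts(elements):
    """Return each element's text with every whitespace run collapsed to one space."""
    return [" ".join(el.text.split()) for el in elements]


# Stand-ins carrying some of the header texts shown in the accepted answer's output.
sample = [FakeElement("Course\nWeight"), FakeElement("Title"), FakeElement("Max\nCredits")]
headers = normalized_texts(sample)
print(headers)
```

With a live driver, the list returned by the find_elements_by_xpath call above could be passed in place of sample.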
