How to extract the href attribute of the first search result using Selenium and Python

Question

I have a list of books in Excel, and for each one I want to fill a column with its summary. To do this, I go to goodreads.com, search for the title (e.g. "harry potter"), open the first result, and copy the summary text. However, I am having trouble getting the first search result's link. Here's my code. Link I referred to: Python Selenium - get href value

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver=webdriver.Chrome()
driver.get('https://goodreads.com')


loginbox=driver.find_element_by_xpath('//*[@id="userSignInFormEmail"]')
loginbox.send_keys('shivam01anand@gmail.com')
passwordbox=driver.find_element_by_xpath('//*[@id="user_password"]')
passwordbox.send_keys('shivam03')
loginButton=driver.find_element_by_xpath('//*[@id="sign_in"]/div[3]/input[1]')
loginButton.click()

searchbox=driver.find_element_by_xpath('/html/body/div[2]/div/header/div[2]/div/div[2]/form/input[1]')
searchbox.send_keys('harry potter')

searchButton=driver.find_element_by_xpath('/html/body/div[2]/div/header/div[2]/div/div[2]/form/button')
searchButton.click()

elem=driver.find_element_by_css_selector("bookTitle").get_attribute("href")
print(elem)
#elem = driver.find_element_by_css_selector("bookTitle [href]")

Error: NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[2]/div/header/div[2]/div/div[2]/form/input[1]"}
  (Session info: chrome=83.0.4103.116)

This error only appears when I add the elem line, which is weird because the error points to an earlier line. Utterly confused.

Are you sharing your credentials here? After submitting the login form, you may need to wait. — frianH, Jul 09 '20 at 13:52
No, the harry potter line works when I don't do the last line — Shivam Anand, Jul 09 '20 at 15:19
@ShivamAnand Please don't change the question once you have received well-researched answers based on it. Changing the question after receiving canonical answers can invalidate all the existing answers and make them useless to future readers. If your requirements have changed, feel free to raise a new question. StackOverflow contributors will be happy to help you out. For the time being I have reverted the question to its initial state. — DebanjanB, Jul 10 '20 at 13:32

score 0 · Accepted Answer · answered Jul 09 '20 at 17:10

To print the value of the href attribute of the first search result, you have to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following Locator Strategies:

Using CSS_SELECTOR:

driver.get("https://goodreads.com")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='userSignInFormEmail']"))).send_keys("shivam01anand@gmail.com")
driver.find_element_by_xpath("//input[@id='user_password']").send_keys("shivam03")
driver.find_element_by_xpath("//input[@value='Sign in']").click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("harry potter")
driver.find_element_by_xpath("//button[@aria-label='Search']").click()
# extract the href attribute of the first search result using CSS_SELECTOR
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.tableList > tbody > tr td a.bookTitle"))).get_attribute("href"))

Using XPATH:

driver.get("https://goodreads.com")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='userSignInFormEmail']"))).send_keys("shivam01anand@gmail.com")
driver.find_element_by_xpath("//input[@id='user_password']").send_keys("shivam03")
driver.find_element_by_xpath("//input[@value='Sign in']").click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("harry potter")
driver.find_element_by_xpath("//button[@aria-label='Search']").click()
# extract the href attribute of the first search result using XPATH
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='tableList']/tbody/tr//td//a[@class='bookTitle']"))).get_attribute("href"))
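
Both locators above select an a element whose class is bookTitle. By contrast, the bare selector "bookTitle" on the last line of the question's code is a CSS type selector: it looks for a tag named bookTitle, not for a class. The following browser-free sketch illustrates the difference using only the standard library. The sample markup, the helper names (first_href, by_type, by_tag_and_class) and the example hrefs are all hypothetical, and the matching is deliberately simplified (it ignores the table ancestor part of the selectors), so treat it as an illustration rather than as how Selenium or the browser resolves selectors.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified stand-in for the search results markup.
SAMPLE_HTML = """
<table class="tableList">
  <tbody>
    <tr><td><a class="bookTitle" href="/book/show/3.Example_One">One</a></td></tr>
    <tr><td><a class="bookTitle" href="/book/show/5.Example_Two">Two</a></td></tr>
  </tbody>
</table>
"""


class FirstMatchParser(HTMLParser):
    """Records the href of the first start tag accepted by `predicate`."""

    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.href is None and self.predicate(tag, attrs):
            self.href = attrs.get("href")


def first_href(html, predicate):
    """Return the href of the first matching tag, or None if nothing matches."""
    parser = FirstMatchParser(predicate)
    parser.feed(html)
    return parser.href


def by_type(name):
    # Mimics a bare type selector such as "bookTitle": compares the TAG name.
    return lambda tag, attrs: tag == name.lower()


def by_tag_and_class(name, cls):
    # Mimics "a.bookTitle": tag name plus one of the element's class tokens.
    return lambda tag, attrs: tag == name and cls in (attrs.get("class") or "").split()


# No <bookTitle> tag exists in the sample, so the type selector finds nothing.
print(first_href(SAMPLE_HTML, by_type("bookTitle")))
# The tag-plus-class match returns the first result's href.
print(first_href(SAMPLE_HTML, by_tag_and_class("a", "bookTitle")))
```

In Selenium terms, the class-based form would be written as ".bookTitle" or "a.bookTitle", which is what the CSS_SELECTOR locator in this answer uses.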

Note: The Selenium snippets above require the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Console Output:

https://www.goodreads.com/book/show/3.Harry_Potter_and_the_Sorcerer_s_Stone?from_search=true&from_srp=true&qid=3nIjRXwsfG&rank=1
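
The WebDriverWait(...).until(...) calls used in this answer repeatedly evaluate a condition until it returns something truthy or the timeout expires, which is why they cope with pages that are still loading after a click. Below is a minimal, Selenium-free sketch of that polling idea. The names wait_until, WaitTimeout and fake_lookup are hypothetical, the condition takes no arguments (Selenium passes the driver to it), and the URL is a placeholder; this is a simplified illustration, not Selenium's actual implementation.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=5.0, poll_frequency=0.5, ignored=(LookupError,)):
    """Call `condition` until it returns a truthy value or `timeout` seconds pass.

    Exceptions listed in `ignored` are treated as "not ready yet", similar in
    spirit to how an explicit wait tolerates a missing element while polling.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # element not available yet; try again after a short sleep
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Usage: a fake lookup that fails twice (page still loading), then succeeds.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not in the DOM yet")
    return "https://example.invalid/book/1"  # placeholder href


href = wait_until(fake_lookup, timeout=2.0, poll_frequency=0.01)
print(href, attempts["count"])
```

A plain find_element call corresponds to evaluating the condition exactly once, which fails if the page has not finished loading at that moment.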


Thank you Debanjan, I have another error which comes up post the link being opened. (saved under EDIT in the question) — Shivam Anand, Jul 10 '20 at 13:26
@ShivamAnand Please raise a new question for your new requirement. Stackoverflow contributors will be happy to help you out. — DebanjanB, Jul 10 '20 at 13:31
@ShivamAnand Glad to be able to help you. [Upvote](https://stackoverflow.com/help/why-vote) the answer if this/any answer is/was helpful to you for the benefit of the future readers. — DebanjanB, Jul 10 '20 at 16:26
done! https://stackoverflow.com/questions/62838586/extracting-text-from-a-website-using-selenium — Shivam Anand, Jul 10 '20 at 16:31
