I am looking at this page. I am trying to use Selenium and chromedriver to scrape this data (shown by the red marker):

enter image description here

Here is my Python code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from time import sleep

chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("disable-infobars")
driver = webdriver.Chrome(executable_path="/ABC/chromedriver", chrome_options=chrome_options)

driver.get("https://finance.yahoo.com/quote/IBM")
sleep(10)
estimated = driver.find_element_by_class_name("IbBox Ta(start) C($tertiaryColor)")

But the code does not get the Est. Return and after a long wait it returns this error message:

selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified

What am I doing wrong and what is the best and fastest way to get the Est Return value from the page?

UPDATE: Here is what I see if I use inspect element in Chrome:

enter image description here

TJ1

3 Answers

A User-Agent header plays an important role in fetching the value you are after, so make sure you send one. With that in place, this is how you can get the desired content:

import requests
from bs4 import BeautifulSoup

link = "https://finance.yahoo.com/quote/IBM"

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'}

r = requests.get(link,headers=headers)
soup = BeautifulSoup(r.text,"lxml")
# Raw string, so the backslashes reach the CSS selector parser unchanged
est_return = soup.select_one(r"[class='Mb\(8px\)']").get_text()
print(est_return)
SIM
  • Thanks, it works nicely. How did you find the class? How do you know it should be `Mb\(8px\)`? – TJ1 Mar 21 '20 at 23:53
  • A class name containing parentheses should be escaped. The backslash ( \ ) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. – SIM Mar 21 '20 at 23:57
  • Can you please elaborate, I am not too familiar with this. – TJ1 Mar 21 '20 at 23:59
  • For example how can you extract `Near Fair Value` ? – TJ1 Mar 22 '20 at 00:00
  • Try this `soup.select_one("[class='Mb\(8px\)']").find_previous_sibling().get_text()` – SIM Mar 22 '20 at 00:03
  • Thanks again, that works as well. Still, it is unclear to me how you found out that we should use `class='Mb\(8px\)'`. If I right-click in Chrome and look at the page source, it is not there. Can you please explain how and where you found the class name? Please explain this so I can accept your answer as a complete answer. Thanks. – TJ1 Mar 22 '20 at 03:38
  • Why you don't find it in that page is a real mystery. However, check out this ***[image](https://filebin.net/1w8w4h8uwpll6q1s)*** – SIM Mar 22 '20 at 04:29
  • Thanks for the image. Ok I see that image now. I will update my question with that image. My question is that the Est. Return is inside a different `div`, and that `div` has this class: `IbBox Ta(start) C($tertiaryColor)`. Why did you use the class name for the outer `div`? – TJ1 Mar 22 '20 at 15:34
  • To get that specific output from that page you need a unique selector. There is hardly anything unique within `IbBox Ta(start) C($tertiaryColor)`, so I had to choose the parent, which surely gives you the required result. Hope you got it now. Thanks. – SIM Mar 22 '20 at 15:39

Can you try XPath instead? It should look like this:

estimated = driver.find_element_by_xpath("//div[@class='IbBox Ta(start) C($tertiaryColor)']").text
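
One reason an XPath attribute test is attractive here is that `@class='...'` compares the whole attribute value as a plain string, so the spaces, parentheses and `$` need no escaping. The sketch below illustrates this with the standard library's `xml.etree.ElementTree` (which supports only a small XPath subset) rather than Selenium. The markup is a made-up, simplified fragment for illustration, not Yahoo's actual page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far larger and may differ.
fragment = """
<section>
  <div class="Fw(b)">Near Fair Value</div>
  <div class="Mb(8px)">
    <div class="IbBox Ta(start) C($tertiaryColor)">-6% Est. Return</div>
  </div>
</section>
""".strip()

root = ET.fromstring(fragment)

# The predicate compares the full class attribute verbatim, no escaping needed.
node = root.find(".//div[@class='IbBox Ta(start) C($tertiaryColor)']")
print(node.text)
```

Note that such a predicate only matches when the attribute string is exactly equal, including the order of the class names.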

Let me know how it goes! :D

EnriqueBet

This error message...

selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified

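...implies that the locator strategy you used wasn't a valid expression. `find_element_by_class_name()` accepts a single class name, whereas `IbBox Ta(start) C($tertiaryColor)` is three space-separated class names that also contain characters (parentheses and `$`) which are not valid in an unescaped CSS class selector.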
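
To make the selector problem concrete, here is a minimal sketch of how a compound class attribute could be turned into a valid CSS selector by chaining the class names and backslash-escaping the special characters. The helper name `class_to_css` is hypothetical (it is not part of Selenium), and the sketch assumes no class name starts with a digit, which would need extra handling.

```python
import re

def class_to_css(class_attr: str) -> str:
    """Turn a space-separated class attribute into one compound CSS selector."""
    parts = []
    for name in class_attr.split():
        # Backslash-escape every character that is not a letter, digit, "_" or "-"
        escaped = re.sub(r"([^A-Za-z0-9_-])", r"\\\1", name)
        parts.append("." + escaped)
    return "".join(parts)

selector = class_to_css("IbBox Ta(start) C($tertiaryColor)")
print(selector)
```

The resulting string could then be passed to a CSS-based lookup such as `driver.find_element_by_css_selector(selector)`. It matches any element carrying all three classes, so it is not guaranteed to be unique on the page.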


To scrape the text -6% Est. Return you need to use WebDriverWait with visibility_of_element_located(), and you can use the following locator strategy:

  • Using XPATH:

    driver.get('https://finance.yahoo.com/quote/IBM')
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Near Fair Value']//following::div[1]/div"))).text)
    
  • Console Output:

    -6% Est. Return
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
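
For readers unfamiliar with explicit waits, the idea behind the wait used above can be sketched without Selenium: call a condition repeatedly until it returns something truthy or a timeout expires. The names `wait_until` and `fake_lookup` are hypothetical and only illustrate the polling pattern; Selenium's own WebDriverWait additionally ignores NoSuchElementException by default while polling, which this sketch does not model.

```python
import time

def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)

# Stand-in for a page whose text only becomes available on the third check
attempts = {"count": 0}

def fake_lookup():
    attempts["count"] += 1
    return "-6% Est. Return" if attempts["count"] >= 3 else None

print(wait_until(fake_lookup, timeout=5, poll=0.01))
```

Compared with a fixed `sleep(10)`, polling returns as soon as the content is available and fails with a clear error when it never appears.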
    
DebanjanB