
I need to pull the APR from Yahoo Finance using Python. I was trying to follow a basic example found here, but I keep getting an empty list. This is the code I am using:

import requests
from lxml import html

page = requests.get('http://finance.yahoo.com/rates/')
tree = html.fromstring(page.text)

interest_rates = tree.xpath('//div[@class="apr"]/text()')
print('Interest Rates: ', interest_rates)

It seems that the XPath I am using is incorrect. What would be the correct XPath?

Edit:

I used the Firebug plugin to copy the XPath of the data I wanted. It gave me the following XPath:

/html/body/form/div[4]/table/tbody/tr[2]/td[2]/div[1]

After rerunning my code with the updated XPath, I still get an empty list. Is there something else I need to include in my XPath?

Edit 2:

import requests
from lxml import html

page = requests.get('http://finance.yahoo.com/rates/')
tree = html.fromstring(page.text)

interest_rate = tree.xpath('/html/body/form/div[4]/table/tbody/tr[2]/td[2]/div[1]/text()')
print('One interest rate is: ', interest_rate)
  • Think there isn't a class like above. – Avinash Raj Aug 12 '15 at 19:42
  • Your problem has nothing to do with this specific page, nor does it have to do with scraping. I'm guessing that if you were to print the page variable, you'd have data. I'm guessing that if you were to print the tree variable, you'd have data. Which means that the line of code that is failing is your `tree.xpath()` call. Which means your xpath is wrong. So, your question really is: Why is this xpath wrong: `//div[@class="apr"]/text()` – Lynn Crumbling Aug 12 '15 at 19:44
  • if you inspect the apr element on the web page you will see it is surrounded by the tag
    #.###%
    – Zachary Luety Aug 12 '15 at 19:45
  • Get your [xpath right in the browser](http://stackoverflow.com/questions/3030487/is-there-a-way-to-get-the-xpath-in-google-chrome). – Peter Wood Aug 12 '15 at 19:47
  • @LynnCrumbling. Thank you, sorry I am new to this. Should I post "Why is this xpath wrong" as a separate question? – Zachary Luety Aug 12 '15 at 19:47
  • Nope, just please change your title, and edit your question to include only the relevant information. Forget about the context of where you are having the issue (no need to mention mortgage rates or Yahoo Finance). Let's get it down to bare elements: (a) an xpath that you thought would work, but doesn't; (b) the URL and element that you are targeting. – Lynn Crumbling Aug 12 '15 at 19:48
  • @PeterWood Nice - Zachary, take a look at this. It is exactly what you need. – Lynn Crumbling Aug 12 '15 at 19:49
  • @PeterWood If you answer, I think you deserve the +15 on this. – Lynn Crumbling Aug 12 '15 at 19:49
  • @PeterWood Great tool! I copied the xpath using the tool included in Firebug and it gave me this: /html/body/form/div[4]/table/tbody/tr[2]/td[2]/div[1]. When I edited my code with this XPath, I still got an empty list. Any ideas? – Zachary Luety Aug 12 '15 at 19:53
  • @ZacharyLuety Do the other two variables contain data (`page` and `tree`)? – Lynn Crumbling Aug 12 '15 at 20:01
  • Update the full line of code that calls tree.xpath() so we can see exactly what you're running. If you copied and pasted that xpath exactly, you're only going to target a single element, not all of them. And you're getting an element reference, not the text or innertext. – Lynn Crumbling Aug 12 '15 at 20:05
  • @LynnCrumbling. Both page and tree contain data (by that I mean print(page.text) and print(tree) return the expected output) – Zachary Luety Aug 12 '15 at 20:08
  • Edited question to show code with updated path. How it is set up now should only pull a single interest rate, I think. Unfortunately I'm still getting the following as output: One interest rate is: [] [Finished in 1.036s] – Zachary Luety Aug 12 '15 at 20:13
  • What do you think the `/text()` at the end is doing? – Lynn Crumbling Aug 12 '15 at 20:24
  • I was assuming that it would select the data between the
    tags
    – Zachary Luety Aug 12 '15 at 20:27
  • [Section 10.1: The “text()” function selects only text nodes, discarding any elements, comments, and other non-textual content. The return value is a list of strings.](http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/xpath.html) – Lynn Crumbling Aug 12 '15 at 20:28
  • If I'm reading that correctly, it doesn't hand you the element's innerText, which is what you really want. – Lynn Crumbling Aug 12 '15 at 20:30
  • However, if it is going to work, I suspect it would be ::text() – Lynn Crumbling Aug 12 '15 at 20:31
  • @LynnCrumbling You answer if you like, I don't have the time or expertise, I enjoy just helping along the way. – Peter Wood Aug 12 '15 at 22:22
  • Thanks @LynnCrumbling and Peter Wood for the help. I think I will be able to get the data I am looking for between the resources you both have provided. If either of you would like to post a formal "answer" with some of the suggestions in the comments, I'd be glad to accept the answer – Zachary Luety Aug 12 '15 at 22:32
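
To make the points from the comments concrete, here is a minimal sketch run against a small inline sample rather than the live page. The markup in `SAMPLE` is hypothetical and is not the real Yahoo Finance HTML; it only assumes a table sent without `<tbody>` and one rate wrapped in a child element. Two things tend to produce an empty list in this situation: a browser-copied path usually contains `tbody`, which browsers add to the DOM but which is often missing from the raw source that `requests` downloads, and `/text()` only returns text nodes that are direct children of the selected element.

```python
from lxml import html

# Hypothetical markup for illustration only (not the real page):
# the table has no <tbody>, and the first rate sits inside a <span>.
SAMPLE = """
<html><body>
  <table>
    <tr><td>30 yr fixed</td><td><div class="apr"><span>3.92%</span></div></td></tr>
    <tr><td>15 yr fixed</td><td><div class="apr">3.05%</div></td></tr>
  </table>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# A browser-copied path includes tbody, which is absent from this source,
# so nothing matches.
with_tbody = tree.xpath('//table/tbody/tr/td[2]/div/text()')

# /text() selects only text nodes that are direct children of the div,
# so the rate nested inside the <span> is skipped.
direct_text = tree.xpath('//div[@class="apr"]/text()')

# text_content() collects the text of the element and all its descendants.
all_text = [div.text_content().strip()
            for div in tree.xpath('//div[@class="apr"]')]

print(with_tbody)   # []
print(direct_text)  # ['3.05%']
print(all_text)     # ['3.92%', '3.05%']
```

If the same pattern applies to the real page, dropping `tbody` from the copied path and using `text_content()` (or `//text()` to include descendant text nodes) would be the first things to try. It is also worth checking that the class name in the XPath actually appears in `page.text`, since the sketch above cannot confirm what the live markup looks like.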
