1

Hi, I recently took up a project on Bitcoin analysis and need to download financial data from Yahoo! Finance via Python. I tried fix_yahoo_finance and pandas datareader, but there seems to be a bug on the website when downloading files: some days are always missing. So I decided to use Beautiful Soup. The code is as follows:

import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

def time_convert(dt):
    # Convert a 'YYYY-MM-DD HH:MM:SS' string (local time) to a Unix timestamp string.
    s = time.mktime(time.strptime(dt, '%Y-%m-%d %H:%M:%S'))
    return str(int(s))

s = requests.Session()
start = time_convert("2016-02-15 00:00:00")
end   = time_convert("2018-02-15 00:00:00")

r = s.get("https://uk.finance.yahoo.com/quote/BTC-USD/history?period1="+start+"&period2="+end+"&interval=1d&filter=history&frequency=1d")

soup = BeautifulSoup(r.text, 'lxml')
tables = soup.select('table')

# Parse every <table> on the page and collect the resulting frames.
df_list = []
for table in tables:
    df_list.append(pd.concat(pd.read_html(table.prettify())))

# Combine and write once, after the loop has finished.
df = pd.concat(df_list)
df.to_excel(r"E:\PythonData\price_.xlsx")

It works, but the data is incomplete because the website loads more rows only when you scroll down to the end of the page, and the code doesn't do that. How can I fix this?

Prabhakar
Rookie0007

2 Answers

2

Yahoo used to have a financial API, but they have since terminated it. There is a workaround, though.

I've used this with success before, you might want to take a look at it.

David
  • Thank you. I downloaded this package but don't understand how to call its functions. Can you provide an example? – Rookie0007 Apr 14 '18 at 19:09
  • Run the setup script as for any other Python library, import it, and use the load_yahoo_quote function. ticker is a string, probably 'BTC-USD' in your case; begindate and enddate are self-explanatory. – David Apr 14 '18 at 19:16
  • It worked; the function is called "load_yahoo_quote". But the downloaded data is a series of lists. Can you tell me how to convert it into a DataFrame? Thanks a lot! – Rookie0007 Apr 14 '18 at 20:00
  • I don't know exactly what you want, but Google is your best friend here. If you just want an Excel file, CSV works well enough: https://stackoverflow.com/questions/14037540/writing-a-python-list-of-lists-to-a-csv-file – David Apr 14 '18 at 20:02
  • Actually it's a list of strings, and each string is a row of all the data, including prices and dates, separated by ",". I wonder if there is a way to convert the data into a DataFrame. – Rookie0007 Apr 14 '18 at 22:35
  • You can probably just write those as lines into a file (something.csv), and then you can easily import it into pretty much any database program (even Excel). – David Apr 14 '18 at 22:38
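
For the list-of-strings case described in the comments, a minimal sketch of going straight to a DataFrame without writing a file first might look like this. It assumes the first string is a header row and that every later string is one comma-separated record. The column names and values in sample_rows are placeholders made up for illustration, not real quotes, and rows_to_dataframe is a hypothetical helper name.

```python
import io

import pandas as pd


def rows_to_dataframe(rows):
    """Turn a list of comma-separated strings (header first) into a DataFrame."""
    # Skip empty strings, such as a trailing blank row.
    text = "\n".join(row for row in rows if row.strip())
    # read_csv accepts any file-like object, so wrap the text in StringIO.
    return pd.read_csv(io.StringIO(text))


# Placeholder rows (assumed layout, invented numbers).
sample_rows = [
    "Date,Open,High,Low,Close,Adj Close,Volume",
    "2018-01-01,100.0,110.0,90.0,105.0,105.0,1000",
    "2018-01-02,105.0,120.0,100.0,115.0,115.0,2000",
    "",
]

df = rows_to_dataframe(sample_rows)
df["Date"] = pd.to_datetime(df["Date"])  # parse the date column
print(df)
```

From there, df.to_csv or df.to_excel would write the result out in the same way as in the question.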
0

Have you tried using Yahoo Financials? It's really well built and doesn't scrape the webpages. It hashes out the data you want from the ["context"]["dispatcher"]["stores"] object. It's also pretty fast.

$ pip install yahoofinancials

Usage Examples:

from yahoofinancials import YahooFinancials

tech_stocks = ['AAPL', 'MSFT', 'INTC']
bank_stocks = ['WFC', 'BAC', 'C']

yahoo_financials_tech = YahooFinancials(tech_stocks)
yahoo_financials_banks = YahooFinancials(bank_stocks)

tech_cash_flow_data_an = yahoo_financials_tech.get_financial_stmts('annual', 'cash')
bank_cash_flow_data_an = yahoo_financials_banks.get_financial_stmts('annual', 'cash')

banks_net_ebit = yahoo_financials_banks.get_ebit()
tech_stock_price_data = yahoo_financials_tech.get_stock_price_data()
daily_bank_stock_prices = yahoo_financials_banks.get_historical_stock_data('2008-09-15', '2017-09-15', 'daily')

Output Example:

yahoo_financials = YahooFinancials('WFC')
print(yahoo_financials.get_historical_stock_data("2017-09-10", "2017-10-10", "monthly"))

returns

{
    "WFC": {
        "prices": [
            {
                "volume": 260271600,
                "formatted_date": "2017-09-30",
                "high": 55.77000045776367,
                "adjclose": 54.91999816894531,
                "low": 52.84000015258789,
                "date": 1506830400,
                "close": 54.91999816894531,
                "open": 55.15999984741211
            }
        ],
        "eventsData": [],
        "firstTradeDate": {
            "date": 76233600,
            "formatted_date": "1972-06-01"
        },
        "isPending": false,
        "timeZone": {
            "gmtOffset": -14400
        },
        "id": "1mo15050196001507611600"
    }
}
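
Since the question ends up wanting a DataFrame, here is a rough sketch of flattening the "prices" list from a result shaped like the one above. The dictionary below is a trimmed copy of that sample output, so nothing is downloaded; prices_to_dataframe is a hypothetical helper name, and the sketch assumes the real return value keeps this layout.

```python
import pandas as pd

# Trimmed copy of the sample output shown above (only "prices" is used).
data = {
    "WFC": {
        "prices": [
            {
                "volume": 260271600,
                "formatted_date": "2017-09-30",
                "high": 55.77000045776367,
                "adjclose": 54.91999816894531,
                "low": 52.84000015258789,
                "date": 1506830400,
                "close": 54.91999816894531,
                "open": 55.15999984741211,
            }
        ]
    }
}


def prices_to_dataframe(result, ticker):
    """Build a date-indexed DataFrame from result[ticker]["prices"]."""
    df = pd.DataFrame(result[ticker]["prices"])
    df["formatted_date"] = pd.to_datetime(df["formatted_date"])
    return df.set_index("formatted_date").sort_index()


wfc = prices_to_dataframe(data, "WFC")
print(wfc[["open", "high", "low", "close", "adjclose", "volume"]])
```

The same helper should work for any ticker key present in the returned dictionary, as long as the entry has a "prices" list.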
alt777