
I'm currently running this code:

    import urllib
    from bs4 import BeautifulSoup

    htmltext = urllib.urlopen("http://www.fifacoin.com/")
    html = htmltext.read()

    soup = BeautifulSoup(html)
    for item in soup.find_all('tr', {'data-price': True}):
        print(item['data-price'])

When I run this code I get no output at all, even though I know the page contains HTML tags matching these search parameters. I'm probably making an obvious mistake here; I'm new to Python and BeautifulSoup.

johannchopin
Ando
  • http://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python Try looking for more solutions here. – zhujs Dec 20 '14 at 03:22

1 Answer


The problem is that the price list table is loaded through JavaScript, and urllib does not include a JavaScript engine, as far as I know. So the JavaScript on that page, which a normal browser executes, is never run on the page fetched by urllib, and the rows you are looking for are not in the HTML you receive. To get them this way you need to emulate a real browser. Solutions that come to mind are PhantomJS and Node.js.
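To make the point concrete, here is a minimal sketch using only the Python 3 standard library. The two HTML strings are made-up stand-ins, not the real markup of the site: `RAW_HTML` imitates what urllib would receive (an empty table plus a script), and `RENDERED_HTML` imitates what the table might look like after a browser has run that script. `PriceRowFinder` and `find_prices` are hypothetical helper names introduced just for this illustration.

```python
from html.parser import HTMLParser

# Assumed stand-in for the HTML that urllib receives: the table is empty and
# the row would only be created once a browser executes the script.
RAW_HTML = """
<html><body>
<table id="prices"></table>
<script>
  var row = document.createElement("tr");
  row.setAttribute("data-price", "9.99");
  document.getElementById("prices").appendChild(row);
</script>
</body></html>
"""

# Assumed stand-in for the same table after the script has run in a browser.
RENDERED_HTML = """
<html><body>
<table id="prices"><tr data-price="9.99"></tr></table>
</body></html>
"""


class PriceRowFinder(HTMLParser):
    """Collects the data-price attribute of every <tr> found in the markup."""

    def __init__(self):
        super().__init__()
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "tr":
            attr_map = dict(attrs)
            if "data-price" in attr_map:
                self.prices.append(attr_map["data-price"])


def find_prices(html):
    """Return the data-price values of all <tr> tags present in the HTML text."""
    finder = PriceRowFinder()
    finder.feed(html)
    finder.close()
    return finder.prices


print(find_prices(RAW_HTML))       # [] : the script is plain text to the parser
print(find_prices(RENDERED_HTML))  # ['9.99']
```

The parser only sees the markup that is literally in the text, so the row that the script would create never shows up in the first result. The same applies to BeautifulSoup: it parses whatever HTML it is given and does not execute scripts.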

I recently did a similar thing with Node.js (although I am a Python fan as well) and was pleasantly surprised. I did it a little differently, but this page seems to explain quite well what you would want to do: http://liamkaufman.com/blog/2012/03/08/scraping-web-pages-with-jquery-nodejs-and-jsdom/

Dolf Andringa