
BeautifulSoup can often be used to (1) store the contents of a page in a variable and (2) parse elements in a webpage.

However, BeautifulSoup on its own cannot open password-protected pages that return HTTP error 403, so I used Requests for this task.

Now I am wondering: does the Requests library have the ability to force the JavaScript on a page to load?

I am using Python 2.7.

In other words, does Requests have something like requests.open(some url).forceJavascriptLoad?

yoshiserry

1 Answer


No. Requests cannot execute JavaScript in any way. You need a so-called "headless" web browser to do what you want. Here is a list of some of them. I recommend trying PhantomJS; although it is not written in Python, it has several advantages over the others:

  1. It is easy to set up and use
  2. It is actively developed, not abandoned like a lot of other headless browsers
  3. It has really good JavaScript support
  4. It is fast
  5. It provides precompiled binaries in case you have problems compiling things

I tried a lot of headless browsers myself and was only happy with PhantomJS. If you still want to try a Python-based headless browser, you can give Ghost a try.
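To illustrate why a plain HTTP client is not enough, here is a minimal sketch using only the standard library. It assumes a made-up page (RAW_HTML is hypothetical) whose div would be filled in by a script in a real browser. Parsing the raw HTML, which is roughly what you get back from Requests, leaves the div empty because nothing ever runs the script.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7

# Hypothetical page: in a browser, the script would fill in the <div>.
RAW_HTML = """
<html><body>
<div id="content"></div>
<script>document.getElementById("content").innerHTML = "Loaded by JS";</script>
</body></html>
"""


class DivTextCollector(HTMLParser):
    """Collect the text found inside <div> elements."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_div = False
        self.div_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.in_div = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_div = False

    def handle_data(self, data):
        # Script source is also reported here, but it is outside the <div>.
        if self.in_div:
            self.div_text.append(data)


def div_text(html):
    """Return the text inside <div> elements of a static HTML string."""
    parser = DivTextCollector()
    parser.feed(html)
    parser.close()
    return "".join(parser.div_text).strip()


# The script is never executed, so the <div> has no text.
print(repr(div_text(RAW_HTML)))
```

A headless browser differs in that it actually runs the script before you read the page contents.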

Max Tepkeev
  • Thanks for your great list. I haven't been able to install PyQt4, which is in c/python27/lib/site-packages, and my IPython is in c/python27/scripts, so it sounds like I should look at PhantomJS. I have been having a lot of trouble installing some libraries and getting them recognised by IPython. I wish there was a way to show all installed Python libraries. – yoshiserry May 19 '14 at 06:30
  • Give PhantomJS a try. It provides precompiled binaries so you don't need to compile anything. – Max Tepkeev May 19 '14 at 06:34
  • Oooo, sounds exciting. Does this mean I can install it as easily as the Chris Gohlke Windows binaries? – yoshiserry May 19 '14 at 09:06
  • Yes, you just download the zip archive, extract it, and you have phantomjs.exe, which is ready to use; all the dependencies are already built into the executable. Also, if this helped you, please consider marking the answer as correct. – Max Tepkeev May 19 '14 at 10:09
  • Can you call PhantomJS and CasperJS from Python? – yoshiserry May 19 '14 at 12:48
  • Can you call PhantomJS and CasperJS from Python? I ask because I've already written some logic for authenticating to an intranet via a webpage and a form, entering a username and password, etc. Can I use what I already have for the authentication via mechanize for Python, and then use CasperJS and PhantomJS to do the clicking around and extracting data? At the moment I am on Windows using Python via IPython. If I can't call CasperJS and PhantomJS from Python, I might need a JavaScript editor. But then would I need to write the whole scraping app in JavaScript? – yoshiserry May 19 '14 at 13:01
  • Yes, you can. PhantomJS is just a command-line utility, so you can run it from Python, for example with the subprocess module, receive the result from PhantomJS, and then do whatever you want with it. See http://stackoverflow.com/questions/89228/calling-an-external-command-in-python for how to call external commands in Python. – Max Tepkeev May 19 '14 at 15:24
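
The subprocess approach from the last comment could look roughly like the sketch below. The run_command helper is a name made up for illustration, and the PhantomJS call is left commented out because it assumes phantomjs is on your PATH and that you have written a script (scrape.js is a hypothetical name) that prints its result to standard output.

```python
import subprocess
import sys


def run_command(args):
    """Run an external command and return its standard output as text."""
    output = subprocess.check_output(args)
    return output.decode("utf-8")


# Demonstration using the Python interpreter itself as the external command.
print(run_command([sys.executable, "-c", "print('hello from a child process')"]))

# With PhantomJS installed, the call would have the same shape.
# "scrape.js" is a hypothetical script that prints the rendered page.
# html = run_command(["phantomjs", "scrape.js", "http://example.com"])
```

Whatever the script prints comes back as a string, which you can then hand to BeautifulSoup or any other parser.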