3

Goal: I want to run a Selenium Python script through BrowserMob-Proxy, which will capture and output a HAR file capture.

Problem: I have a functional (very basic) Python script (shown below). When it is altered to utilize BrowserMob-Proxy to capture HAR however, it fails. Below I provide two different scripts that both fail, but for differing reasons (details provided after code snippets).

BrowserMob-Proxy Explanation: As mentioned before, I am using both 0.6.0 AND 2.0-beta-8. The reasoning for this is that A) LightBody (lead designer of BMP) recently indicated that his most current release (2.0-beta-9) is not functional and advises users to use 2.0-beta-8 instead and B) from what I can tell from reading various site/stackoverflow information is that 0.6.0 (acquired through PIP) is used to make calls to the Client.py/Server.py, whereas 2.0-beta-8 is used to initiate the Server. To be honest, this confuses me. When importing BMP's Server however, it requires a batch (.bat) file to initiate the server, which is not provided in 0.6.0, but is with 2.0-beta-8...if anyone can shed some light on this area of confusion (I suspect it is the root of my problems described below), then I'd be most appreciative.

Software Specs:

  • Operating System: Windows 7 (64x) -- running in VirtualBox
  • Browser: FireFox (32.0.2)
  • Script Language: Python (2.7.8)
  • Automated Web Browser: Selenium (2.43.0) -- installed via PIP
  • BrowserMob-Proxy: 0.6.0 AND 2.0-beta-8 -- see explanation below

Selenium Script (this script works):

"""This script utilizes Selenium to obtain the Google homepage"""
from selenium import webdriver

driver = webdriver.Firefox()       # Opens FireFox browser.
driver.get('https://google.com/')  # Gets google.com and loads page in browser.

driver.quit()                      # Closes Firefox browser

This script succeeds in running and does not produce any errors. It is provided for illustrative purposes to indicate it works before adding BMP logic.

Script ALPHA with BMP (does not work):

"""Using the same functional Selenium script, produce ALPHA_HAR.har output"""
from browsermobproxy import Server
server = Server('C:\Users\Matt\Desktop\\browsermob-proxy-2.0-beta-8\\bin\\browsermob-proxy')
server.start()
proxy = server.create_proxy()

from selenium import webdriver
driver = webdriver.Firefox()           # Opens FireFox browser.

proxy.new_har("ALPHA_HAR")             # Creates a new HAR
driver.get("https://www.google.com/")  # Gets google.com and loads page in browser.
proxy.har                              # Returns a HAR JSON blob

server.stop()

This code will succeed in running the script and will not produce any errors. However, when searching the entirety of my hard drive, I never succeed in locating ALPHA_HAR.har.

Script BETA with BMP (does not work):

"""Using the same functional Selenium script, produce BETA_HAR.har output"""
from browsermobproxy import Server
server = Server("C:\Users\Matt\Desktop\\browsermob-proxy-2.0-beta-8\\bin\\browsermob-proxy")
server.start()    
proxy = server.create_proxy()

from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)

proxy.new_har("BETA_HAR")             # Creates a new HAR
driver.get("https://www.google.com/") # Gets google.com and loads page in browser.
proxy.har                             # Returns a HAR JSON blob

server.stop()

This code was taken from http://browsermob-proxy-py.readthedocs.org/en/latest/. When running the above code, FireFox will attempt to get google.com, but will never succeed in loading the page. Eventually it will time out without producing any errors. And BETA_HAR.har can't be found anywhere on my hard drive. I have also noticed that, when trying to use this browser to visit any other site, it will similarly fail to load (I suspect this is due to the proxy not being configured properly).

Civic_Matt
  • 31
  • 1
  • 4

5 Answers5

2

I use phantomJS, here is an example of how to use it with python:

import browsermobproxy as mob
import json
from selenium import webdriver
BROWSERMOB_PROXY_PATH = '/usr/share/browsermob/bin/browsermob-proxy'
url = 'http://google.com'

s = mob.Server(BROWSERMOB_PROXY_PATH)
s.start()
proxy = s.create_proxy()
proxy_address = "--proxy=127.0.0.1:%s" % proxy.port
service_args = [ proxy_address, '--ignore-ssl-errors=yes', ] #so that i can do https connections
driver = webdriver.PhantomJS(service_args=service_args)
driver.set_window_size(1400, 1050)
proxy.new_har(url)
driver.get(url)
har_data = json.dumps(proxy.har, indent=4)
screenshot = driver.get_screenshot_as_png()
imgname = "google.png"
harname = "google.har"
save_img = open(imgname, 'a')
save_img.write(screenshot)
save_img.close()
save_har = open(harname, 'a')
save_har.write(har_data)
save_har.close()
driver.quit()
s.stop()
user2415430
  • 71
  • 1
  • 3
2

Try this:

from browsermobproxy import Server
from selenium import webdriver
import json

server = Server("path/to/browsermob-proxy")
server.start()
proxy = server.create_proxy()
profile = webdriver.FirefoxProfile()
profile.set_proxy(self.proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
proxy.new_har("http://stackoverflow.com", options={'captureHeaders': True})
driver.get("http://stackoverflow.com")    
result = json.dumps(proxy.har, ensure_ascii=False)
print result
proxy.stop()    
driver.quit()
Paras Dahal
  • 505
  • 6
  • 12
  • I am getting below error. raceback (most recent call last): File "e:sample.py", line 10, in profile.set_proxy(self.proxy.selenium_proxy()) NameError: name 'self' is not defined – NaveenKumar Namachivayam May 10 '16 at 06:08
1

What worked for me was to downgrade java version to java11. I used jenv to install and manage multiple java versions.

0

When you do:

proxy.har

You need to parse that response, proxy.har is a JSON object, so if you need to generate a file, you need to do this:

myFile = open('BETA_HAR.har','w')
myFile.write( str(proxy.har) )
myFile.close()

Then you will find your .har

José Castro
  • 3,626
  • 2
  • 21
  • 26
0

Finding your HAR file

Inherently, the HAR object generated by the proxy is just that: an object in memory. The reason you can't find it on your hard drive is because it's not being saved there unless you write it there yourself. This is a pretty simple operation, as the HAR is just JSON.

with open("harfile", "w") as harfile:
    harfile.write(json.dumps(proxy.har))

Why does ALPHA not work?

When you start dumping your HAR file, you'll find that your HAR file is empty with the ALPHA script. This is because you are not adding the proxy to the settings for Firefox, meaning that it will just connect directly bypassing your proxy.

What about BETA?

This code is written correctly as far as connecting to the proxy, although personally I prefer adding the proxy to the capabilities and passing those through. The code for that is:

cap = webdriver.DesiredCapabilities.FIREFOX.copy()
proxy.add_to_capabilities(cap)
driver = webdriver.Firefox(capabilities=cap)

I would guess that your issue lies with the proxy itself. Check the bmp.log and/or server.log files in the location of the python script and see what it is saying if something is going wrong.

Another alternative is that selenium is reporting back that the webpage has loaded before it actually has finished getting all of the elements, and as such your proxy is shutting down too early. Try making the script wait a bit longer before shutting down the proxy, or running it interactively through the interpreter.