
I want to get the response body of a specific subrequest using Selenium behind a proxy.

Right now I'm using Python + Selenium + chromedriver. With logging enabled I can get each subrequest's headers, but not its body. My logging settings:

caps['loggingPrefs'] = {'performance': 'ALL', 'browser': 'ALL'}

caps['perfLoggingPrefs'] = {"enableNetwork": True, "enablePage": True, "enableTimeline": True}
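
For context, the performance log produced by these settings can at least identify which subrequest is of interest. The sketch below assumes each log entry is a dict whose 'message' field is a JSON string wrapping a DevTools event, which is how chromedriver usually reports them; the function name find_request_ids and the sample entries are hypothetical.

```python
import json


def find_request_ids(performance_entries, url_fragment):
    """Return request IDs of responses whose URL contains url_fragment."""
    request_ids = []
    for entry in performance_entries:
        # Each entry's 'message' is a JSON string wrapping one DevTools event.
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.responseReceived":
            continue
        params = event.get("params", {})
        if url_fragment in params.get("response", {}).get("url", ""):
            request_ids.append(params["requestId"])
    return request_ids


# Hypothetical entries shaped like the output of driver.get_log('performance').
sample_entries = [
    {"message": json.dumps({"message": {
        "method": "Network.requestWillBeSent",
        "params": {"requestId": "1000.1",
                   "request": {"url": "https://example.com/"}},
    }})},
    {"message": json.dumps({"message": {
        "method": "Network.responseReceived",
        "params": {"requestId": "1000.2",
                   "response": {"url": "https://example.com/api/data"}},
    }})},
]

print(find_request_ids(sample_entries, "/api/data"))
```

With a live Chrome session, each returned ID could in principle be passed to driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': request_id}). Whether the body is still available at that point depends on the browser, so treat this as something to try rather than a guaranteed fix.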

I know there are several options for producing a HAR with Selenium:

  • Use geckodriver and har-export-trigger. I tried to make it work with the following code:

window.foo = HAR.triggerExport().then(harLog => { return(harLog); }); return window.foo;

Unfortunately, I don't see the response body in the returned data.

  • Use BrowserMob Proxy. This solution seems fine, but I couldn't find a way to make BrowserMob Proxy work behind another proxy.

So the question is: how can I get the body of a specific network response to a request made while Selenium loads a web page, AND use proxies at the same time?

UPD: Actually, with har-export-trigger I do get response bodies, but not all of them: the response body I need is JSON, its MIME type is 'text/html; charset=utf-8', and it is missing from the HAR file I generate, so I still have no solution.

UPD2: After further investigation, I realized that the response body is missing even in my desktop Firefox when the har-export-trigger add-on is turned on, so this approach may be a dead end (issue on GitHub).

UPD3: This bug only occurs with the latest version of har-export-trigger. With version 0.6.0 everything works just fine.

So, for future googlers: you can use har-export-trigger v0.6.0 or the approach from the accepted answer.

  • You need to call JSON.stringify(harLog) so that the JSON object is converted to a string before WebDriver returns it to the calling Python script. You can see my implementation in the answer. – Tempo810 Feb 17 '19 at 20:32

1 Answer

I have just finished implementing a Selenium HAR script with the tools you mention in the question. The HARs obtained from both har-export-trigger and BrowserMob were verified with Google HAR Analyser.

A class using Selenium, geckodriver and har-export-trigger:

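import json
import os

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.firefox.options import Options as FirefoxOptions

import my_logger  # the author's own logging helper module (not shown here)


class MyWebDriver(object):
    # an inner class to implement a custom wait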
    class PageIsLoaded(object):
        def __call__(self, driver):
            state = driver.execute_script('return document.readyState;')
            MyWebDriver._LOGGER.debug("checking document state: " + state)
            return state == "complete"

    _FIREFOX_DRIVER = "geckodriver"
    # load HAR_EXPORT_TRIGGER extension
    _HAR_TRIGGER_EXT_PATH = os.path.abspath(
        "har_export_trigger-0.6.1-an+fx_orig.xpi")
    _PROFILE = webdriver.FirefoxProfile()
    _PROFILE.set_preference("devtools.toolbox.selectedTool", "netmonitor")
    _CAP = DesiredCapabilities().FIREFOX
    _OPTIONS = FirefoxOptions()
    # add runtime argument to run with devtools opened
    _OPTIONS.add_argument("-devtools")
    _LOGGER = my_logger.get_custom_logger(os.path.basename(__file__))

    def __init__(self, log_body=False):
        self.browser = None
        self.log_body = log_body

    # return the webdriver instance
    def get_instance(self):
        if self.browser is None:
            self.browser = webdriver.Firefox(capabilities=
                                             MyWebDriver._CAP,
                                             executable_path=
                                             MyWebDriver._FIREFOX_DRIVER,
                                             firefox_options=
                                             MyWebDriver._OPTIONS,
                                             firefox_profile=
                                             MyWebDriver._PROFILE)
            self.browser.install_addon(MyWebDriver._HAR_TRIGGER_EXT_PATH,
                                       temporary=True)
            MyWebDriver._LOGGER.info("Web Driver initialized.")
        return self.browser

    def get_har(self):
        # JSON.stringify has to be called to return as a string
        har_harvest = "myString = HAR.triggerExport().then(" \
                      "harLog => {return JSON.stringify(harLog);});" \
                      "return myString;"
        har_dict = dict()
        har_dict['log'] = json.loads(self.browser.execute_script(har_harvest))
        # remove content body
        if self.log_body is False:
            for entry in har_dict['log']['entries']:
                temp_dict = entry['response']['content']
                try:
                    temp_dict.pop("text")
                except KeyError:
                    pass
        return har_dict

    def quit(self):
        self.browser.quit()
        MyWebDriver._LOGGER.warning("Web Driver closed.")
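
To pull one specific response body out of the dictionary returned by get_har(), something along these lines may help. It is only a sketch: it assumes the HAR follows the usual HAR 1.2 layout (log.entries[].response.content.text, with an optional 'encoding' of 'base64'), and the helper name find_response_bodies and the sample data are hypothetical.

```python
import base64


def find_response_bodies(har, url_fragment):
    """Return decoded bodies of all entries whose request URL matches."""
    bodies = []
    for entry in har["log"]["entries"]:
        if url_fragment not in entry["request"]["url"]:
            continue
        content = entry["response"].get("content", {})
        text = content.get("text")
        if text is None:
            # The exporter did not record a body for this entry.
            continue
        if content.get("encoding") == "base64":
            text = base64.b64decode(text).decode("utf-8", errors="replace")
        bodies.append(text)
    return bodies


# Hypothetical HAR fragment for illustration only.
sample_har = {"log": {"entries": [
    {"request": {"url": "https://example.com/index.html"},
     "response": {"content": {"mimeType": "text/html"}}},
    {"request": {"url": "https://example.com/api/data"},
     "response": {"content": {
         "mimeType": "text/html; charset=utf-8",
         "text": base64.b64encode(b'{"ok": true}').decode("ascii"),
         "encoding": "base64"}}},
]}}

print(find_response_bodies(sample_har, "/api/data"))
```

Note that the class must be created with log_body=True for this to find anything, since get_har() strips the "text" field otherwise.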

A subclass adding BrowserMob proxy for your reference as well:

from browsermobproxy import Server


class MyWebDriverWithProxy(MyWebDriver):

    _PROXY_EXECUTABLE = os.path.join(os.getcwd(), "venv", "lib",
                                     "browsermob-proxy-2.1.4", "bin",
                                     "browsermob-proxy")

    def __init__(self, url, log_body=False):
        super().__init__(log_body=log_body)
        self.server = Server(MyWebDriverWithProxy._PROXY_EXECUTABLE)
        self.server.start()
        self.proxy = self.server.create_proxy()
        self.proxy.new_har(url,
                           options={'captureHeaders': True,
                                    'captureContent': self.log_body})
        super()._LOGGER.info("BrowserMob server started")
        super()._PROFILE.set_proxy(self.proxy.selenium_proxy())

    def get_har(self):
        return self.proxy.har

    def quit(self):
        self.browser.quit()
        self.proxy.close()
        MyWebDriver._LOGGER.info("BrowserMob server and Web Driver closed.")
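
As for running BrowserMob behind another proxy: as far as I know, BrowserMob accepts an upstream proxy through the httpProxy parameter when a proxy is created, and the Python client forwards a params dictionary passed to create_proxy(). I have not verified this against every version, so take the following as a sketch. The helper names upstream_params and create_chained_proxy, and the host and port used, are hypothetical.

```python
def upstream_params(upstream_host, upstream_port):
    """Build the params dict that asks BrowserMob to chain to an upstream proxy."""
    return {"httpProxy": "{}:{}".format(upstream_host, upstream_port)}


def create_chained_proxy(server, upstream_host, upstream_port):
    """Create a BrowserMob proxy that forwards traffic to an upstream proxy.

    `server` is expected to be a started browsermobproxy.Server (assumption:
    its create_proxy() accepts a `params` keyword argument).
    """
    return server.create_proxy(
        params=upstream_params(upstream_host, upstream_port))


# Hypothetical upstream proxy address, shown only to illustrate the shape.
print(upstream_params("10.0.0.1", 3128))
```

In the subclass above, this would replace the plain self.server.create_proxy() call, for example self.proxy = create_chained_proxy(self.server, "10.0.0.1", 3128).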
Tempo810