
Below is a script that simply executes a search on a website. The goal is to use Selenium to capture the JSON data returned by a request that the page fires from an intermediate script, namely the POST request to "https://www.botoxcosmetic.com/sc/api/findclinic/FindSpecialists" as seen in the included image, but without sending a request to that URL directly with Selenium or the requests library. What is the best way to do this, preferably in Python but open to any language?

from selenium import webdriver
from selenium.webdriver.common.by import By

base_url = 'https://www.botoxcosmetic.com/women/find-a-botox-cosmetic-specialist'
driver = webdriver.Chrome()
# Navigate to the search page before looking up the form fields
driver.get(base_url)
driver.find_element(By.CLASS_NAME, 'normalZip').send_keys('10022')
driver.find_element(By.CLASS_NAME, 'normalSearch').click()


ikemblem

1 Answer


You will need to use a proxy; my suggestion is BrowserMob Proxy, which can record every request the browser makes, including the response bodies, as a HAR (HTTP Archive).

First, install the BrowserMob Proxy Python client, plus haralyzer, which the script below uses to parse the HAR:

pip install browsermob-proxy haralyzer

You will then need to download the latest BrowserMob Proxy server release (2.1.4 at the time of writing), extract it, and place it in your project directory. BrowserMob Proxy is a Java application, so a Java runtime must also be installed. The path to the extracted launcher is what you pass in when setting up the BrowserMob Proxy server (see below, where Server("browsermob-proxy-2.1.4/bin/browsermob-proxy") is defined).
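Before starting the proxy, it can help to check these prerequisites up front. The sketch below is a hypothetical helper (the names browsermob_launcher and check_prerequisites are illustrative, not part of the browsermob-proxy package). It assumes the release was extracted to browsermob-proxy-2.1.4 with a bin/ folder and that on Windows the launcher is the .bat variant; adjust the paths if your layout differs.

```python
import os
import shutil
import sys


def browsermob_launcher(release_dir="browsermob-proxy-2.1.4"):
    """Return the assumed path of the launcher inside an extracted release.

    Assumption: bin/browsermob-proxy on Linux/macOS, bin/browsermob-proxy.bat on Windows.
    """
    name = "browsermob-proxy.bat" if sys.platform.startswith("win") else "browsermob-proxy"
    return os.path.join(release_dir, "bin", name)


def check_prerequisites(release_dir="browsermob-proxy-2.1.4"):
    """Return a list of human-readable problems; an empty list means ready to start."""
    problems = []
    # BrowserMob Proxy runs on the JVM, so a java executable must be on PATH.
    if shutil.which("java") is None:
        problems.append("java was not found on PATH")
    launcher = browsermob_launcher(release_dir)
    if not os.path.isfile(launcher):
        problems.append("launcher not found: " + launcher)
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```

If the list is empty, the same launcher path can be passed straight to Server(...).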

I've then updated your script to the following:

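import json

from browsermobproxy import Server
from haralyzer import HarParser
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

base_url = 'https://www.botoxcosmetic.com'
api_url = '{0}/sc/api/findclinic/FindSpecialists'.format(base_url)

# Start the BrowserMob Proxy server from the extracted release and create a proxy on it
server = Server("browsermob-proxy-2.1.4/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()

chrome_options = webdriver.ChromeOptions()
# Route all browser traffic through the proxy
chrome_options.add_argument("--proxy-server={0}".format(proxy.proxy))
# The proxy intercepts HTTPS with its own certificate, which Chrome does not trust by default
chrome_options.add_argument("--ignore-certificate-errors")

driver = webdriver.Chrome(options=chrome_options)
result = None
try:
    driver.get("{0}/women/find-a-botox-cosmetic-specialist".format(base_url))

    # Start recording a new HAR, including response bodies
    proxy.new_har(options={"captureContent": "true"})
    driver.find_element(By.CLASS_NAME, 'normalZip').send_keys('10022')
    driver.find_element(By.CLASS_NAME, 'normalSearch').click()

    # Wait until results are rendered, i.e. the FindSpecialists call has completed
    WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#specialist-results > div")))

    # proxy.har returns the HAR as a dict; HarParser exposes its "log" section as har_data
    har_parser = HarParser(proxy.har)
    for entry in har_parser.har_data["entries"]:
        if entry["request"]["url"] == api_url:
            result = json.loads(entry["response"]["content"]["text"])
finally:
    # Always shut down the browser and the proxy server, even if the search fails
    driver.quit()
    server.stop()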

This starts a BrowserMob Proxy instance, records the browser's traffic while the search runs, and stores the parsed JSON body of the FindSpecialists response in the result variable (it stays None if no matching request was captured).
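The exact URL comparison above will miss the call if the page ever appends a query string, and per the HAR 1.2 format a response body may be stored base64-encoded (signalled by "encoding": "base64"). The sketch below is a more defensive variant under those assumptions; find_json_response, wait_for_json_response and FIND_SPECIALISTS_URL are hypothetical names. Instead of waiting for a results element in the DOM, it polls the HAR itself until the POST response shows up.

```python
import base64
import json
import time
from urllib.parse import urlsplit

# Endpoint taken from the question; the helpers below are illustrative only.
FIND_SPECIALISTS_URL = "https://www.botoxcosmetic.com/sc/api/findclinic/FindSpecialists"


def _same_endpoint(url, target):
    """Compare scheme, host and path only, ignoring any query string."""
    a, b = urlsplit(url), urlsplit(target)
    return (a.scheme, a.netloc, a.path) == (b.scheme, b.netloc, b.path)


def find_json_response(har, target=FIND_SPECIALISTS_URL, method="POST"):
    """Return the parsed JSON body of the last matching HAR entry, or None."""
    result = None
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        if request.get("method") != method or not _same_endpoint(request.get("url", ""), target):
            continue
        content = entry.get("response", {}).get("content", {})
        text = content.get("text")
        if text is None:
            continue  # body not recorded, e.g. captureContent was not enabled
        if content.get("encoding") == "base64":
            text = base64.b64decode(text).decode("utf-8")
        result = json.loads(text)
    return result


def wait_for_json_response(get_har, target=FIND_SPECIALISTS_URL, timeout=15.0, poll=0.5):
    """Poll get_har() until a matching response appears or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = find_json_response(get_har(), target)
        if result is not None or time.monotonic() >= deadline:
            return result
        time.sleep(poll)
```

In the script, the WebDriverWait and the HarParser loop could then be replaced with result = wait_for_json_response(lambda: proxy.har), since proxy.har fetches a fresh copy of the HAR on every access.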

You can then do whatever you want with the response. Apologies if the code is not as clean as you would expect; I'm not a native Pythonista.
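For example, a common next step is to persist the captured response so it can be inspected without rerunning the browser. This is a minimal sketch; save_result, load_result and the file name find_specialists.json are hypothetical, and it makes no assumption about the structure of the returned JSON.

```python
import json
from pathlib import Path


def save_result(result, path="find_specialists.json"):
    """Write the captured response to disk as pretty-printed UTF-8 JSON."""
    target = Path(path)
    target.write_text(json.dumps(result, indent=2, ensure_ascii=False), encoding="utf-8")
    return target


def load_result(path="find_specialists.json"):
    """Read a previously saved response back into Python objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Calling save_result(result) right after server.stop() keeps a copy on disk even if later processing fails.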


Ardesco