
I am working on a project in which I need to click banners and capture their redirect chains.

So that my current page isn't overwritten, and to make the next step easier, I thought I should Ctrl+click the banners to open them in a new browser tab and capture the full, real redirect chains there.

I've researched a lot, but the only methods I found dump HAR files to get the redirect chains. To get a HAR file, however, the Network panel of the Developer Tools window must already be open in the tab. In my case, a new tab cannot have its Network panel open before the tab starts loading, and I can't open the Network panel and then reload the page either, because the redirect chain would no longer be the real one. Additionally, the built-in performance log is not applicable in my case.

Can anyone tell me how to solve these problems, or whether I am wrong about any part of the above? Any advice would be greatly appreciated, since I have been working on this for a long time.

Moshe Slavin
hans
  • This sounds like it can be solved with [selenium-wire](https://pypi.org/project/selenium-wire/#response-objects) or with [browsermob](https://pypi.org/project/browsermob-proxy/) – Moshe Slavin Apr 18 '21 at 14:03
  • Now working on selenium-wire. BrowserMob can download HAR files, but I don't know yet whether I need to open the Network panel first to do that. – hans Apr 19 '21 at 06:32
  • selenium-wire seems OK; now I'm working on its speed. How to sift out the real redirect paths is still an open question. By the way, in selenium-wire you can boot up undetected-chrome, which can get past bot detection. – hans Apr 19 '21 at 09:39
  • Regarding BrowserMob, I don't think you need the Network panel open. Regarding selenium-wire, I think it's a great package! One of its advantages is the integration with undetected-chrome. – Moshe Slavin Apr 19 '21 at 09:59
  • But it seems undetected-chrome isn't compatible with the `click` method. See the **important note** in [undetected-chromedriver](https://pypi.org/project/undetected-chromedriver/) – hans Apr 19 '21 at 10:14
  • Your suggestion works well. Consider writing it up as an answer, thanks. – hans Apr 26 '21 at 09:54



To get the redirect chains, you need to record the browser's HTTP traffic, for example as a HAR file.

There are a few packages that combine Selenium with other libraries to accomplish this (and add other features as well).

One is browsermob-proxy.

BrowserMob Proxy allows you to manipulate HTTP requests and responses, capture HTTP content, and export performance data as a HAR file. BMP works well as a standalone proxy server, but it is especially useful when embedded in Selenium tests.

Here is an example:

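```python
from browsermobproxy import Server
from selenium import webdriver

# Start BrowserMob Proxy (path to the downloaded bin/browsermob-proxy launcher)
server = Server("path/to/browsermob-proxy")
server.start()
proxy = server.create_proxy()

# Route all of Chrome's traffic through the proxy
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--proxy-server={0}".format(proxy.proxy))
# The proxy re-signs HTTPS traffic with its own certificate, so let Chrome accept it
chrome_options.add_argument("--ignore-certificate-errors")
driver = webdriver.Chrome(options=chrome_options)  # Selenium 4 uses options=, not chrome_options=

# Start recording a new HAR (with headers, so Location is kept), then load the page
proxy.new_har("StackOverFlow", options={"captureHeaders": True})
driver.get("https://stackoverflow.com")
print(proxy.har)  # the HAR as a dict; requests are in proxy.har["log"]["entries"]

driver.quit()
server.stop()
```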
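Once `proxy.har` is available, the redirect chain can be read out of it. Below is a minimal sketch, assuming the HAR 1.2 layout (`log.entries[].request.url`, `response.status`, `response.redirectURL`, `response.headers`) and that you know the URL the banner requests first. The helper names `redirect_chain_from_har` and `_location` are hypothetical. Only HTTP 3xx redirects can be followed this way; JavaScript or `<meta http-equiv="refresh">` redirects show up as separate, unlinked entries and are not followed here.

```python
from urllib.parse import urljoin


def _location(response):
    """Return the redirect target of a HAR response dict, or None."""
    # HAR 1.2 stores the Location header in response.redirectURL;
    # fall back to scanning the headers in case that field is empty.
    target = response.get("redirectURL") or ""
    if not target:
        for header in response.get("headers", []):
            if header.get("name", "").lower() == "location":
                target = header.get("value", "")
                break
    return target or None


def redirect_chain_from_har(har, start_url):
    """Follow 3xx responses in a HAR dict, starting from start_url."""
    # Index responses by request URL (the first occurrence wins)
    by_url = {}
    for entry in har["log"]["entries"]:
        by_url.setdefault(entry["request"]["url"], entry["response"])

    chain = [start_url]
    url = start_url
    while url in by_url:
        response = by_url[url]
        if not 300 <= response.get("status", 0) < 400:
            break  # final, non-redirect response
        target = _location(response)
        if target is None:
            break
        url = urljoin(url, target)  # Location may be a relative URL
        if url in chain:
            break  # stop on a redirect loop
        chain.append(url)
    return chain
```

With the proxy from the example above, you would call something like `redirect_chain_from_har(proxy.har, banner_url)`, where `banner_url` (hypothetical) is the first URL the banner click requests.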

There are other libraries, such as selenium-wire, with similar capabilities (and other features as well).
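With selenium-wire there is no HAR step: the captured traffic is available directly as `driver.requests`, where each request has `.url` and a `.response` (or `None`) with `.status_code` and case-insensitive `.headers`. A minimal sketch of following the redirects in that list, assuming those attributes; the function name `redirect_chain_from_requests` is hypothetical, and as with HAR files, only HTTP 3xx redirects are followed.

```python
from urllib.parse import urljoin


def redirect_chain_from_requests(requests, start_url):
    """Follow 3xx responses through selenium-wire-style request objects."""
    # Index captured responses by request URL, skipping requests with no response
    by_url = {}
    for request in requests:
        if request.response is not None:
            by_url.setdefault(request.url, request.response)

    chain = [start_url]
    url = start_url
    while url in by_url:
        response = by_url[url]
        if not 300 <= response.status_code < 400:
            break  # final, non-redirect response
        location = response.headers.get("Location")
        if not location:
            break
        url = urljoin(url, location)  # Location may be a relative URL
        if url in chain:
            break  # stop on a redirect loop
        chain.append(url)
    return chain
```

In practice you would clear earlier traffic with `del driver.requests`, click the banner, and then call `redirect_chain_from_requests(driver.requests, banner_url)`, where `banner_url` (hypothetical) is the banner's first request URL.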

Note: there is no need to open the Network panel; the proxy records the traffic on its own.

Make sure to download BrowserMob Proxy and pass the path to its `browsermob-proxy` launcher when creating the `Server`.

Moshe Slavin