I use Selenium WebDriver with Firefox to scrape web pages. Sometimes the browser waits endlessly for some extraneous requests to complete (e.g. to facebook.net).

I tried using BrowserMob-Proxy to filter these requests, but it didn't help. Even after receiving a 200 or 404 response, these requests don't stop.

I thought about forcing the browser to stop loading the page after a certain amount of time. For example:

try {
    Thread.sleep(5000);
} catch (InterruptedException ex) {
    Thread.currentThread().interrupt();
}
((JavascriptExecutor) driver).executeScript("window.stop();");

But this has no effect until the web page has finished loading.

What would you suggest in my case?

P.S. This is the code that uses the pageLoadTimeout parameter.

// Fields are static because they are used from static methods.
static WebDriver driver;
static FirefoxBinary firefox;
static FirefoxProfile customProfile;

public static void main(String[] args) {
    openFirefox();
    for (String url : listOfUrls) {
        boolean pageLoaded = false;
        // Retry the same URL until it loads within the timeout.
        while (!pageLoaded) {
            try {
                driver.get(url);
                pageLoaded = true;
            } catch (org.openqa.selenium.TimeoutException ex) {
                System.out.println("Got TimeoutException on page load. Restarting browser...");
                restartFirefox();
            }
        }
        // here I do something with the content of the web page
    }
}

public static void openFirefox() {
    firefox = new FirefoxBinary(new File(Constants.PATH_TO_FIREFOX_EXE));
    customProfile = new FirefoxProfile();
    customProfile.setAcceptUntrustedCertificates(true);
    customProfile.setPreference("webdriver.load.strategy", "unstable");
    driver = new FirefoxDriver(firefox, customProfile);
    driver.manage().deleteAllCookies();
    driver.manage().timeouts().pageLoadTimeout(60, TimeUnit.SECONDS);
}

private static void restartFirefox() {
    driver.close();
    firefox.quit();
    openFirefox();
}
Valentyn Grygoriev
  • Here is *something*: http://stackoverflow.com/a/26808275/3124333 – SiKing Jul 02 '15 at 15:19
  • Refer to [this](http://stackoverflow.com/questions/15834158/stop-browser-load-from-selenium-webdriver) and [this](http://stackoverflow.com/questions/21214340/make-selenium-webdriver-stop-loading-the-page-if-the-desired-element-is-already) – Madhan Jul 02 '15 at 16:08
  • Modifying the Firefox parameters didn't help. But you, @Madhan, gave me a good idea: I've decided to use Chrome instead of Firefox, and I no longer see hangs. Thank you :) – Valentyn Grygoriev Jul 06 '15 at 13:28
  • Maybe this link will be of some use in such scenarios: http://stackoverflow.com/a/39944726/3820418 – Jlearner Oct 10 '16 at 05:39

1 Answer

  1. How about using timeouts? For each WebDriver instance that you use, you need to set:

    WebDriver.Timeouts pageLoadTimeout(long time, java.util.concurrent.TimeUnit unit)

According to the documentation:

Sets the amount of time to wait for a page load to complete before throwing an error. If the timeout is negative, page loads can be indefinite.

Parameters:
time - The timeout value.
unit - The unit of time.
Returns:
A Timeouts interface.
  2. "I've tried to use BrowserMob-Proxy to filter these requests. But it didn't help. These requests, even after receiving 200 or 404 code, doesn't stop."

What do you mean by "didn't help"? I don't believe you. Please share your code for blacklisting URLs. For example, the following code returned HTTP 200 for any google-analytics-related site for me:

server.blacklistRequests("https?://.*\\.google-analytics\\.com/.*", 200); // server is bmp proxy server
  3. I have heard that WebDriver should now have a webdriver.load.strategy setting, though I have never used it. The default behavior of WebDriver's blocking calls (à la get()) is to wait for document.readyState to be complete, but I have read that with this property you can tell the driver to return at once. So it might be worth googling for a while.
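
Before concluding that blacklisting has no effect, it may help to check which URLs the blacklistRequests pattern above actually covers. The following is only a sketch using plain java.util.regex. It assumes the proxy requires the pattern to match the whole request URL, which should be verified against the BrowserMob-Proxy version in use. The class name, the facebook.net pattern and the sample URLs are made up for illustration.

```java
import java.util.regex.Pattern;

class BlacklistPatternCheck {
    // Same regular expression as in the blacklistRequests call above.
    static final Pattern GOOGLE_ANALYTICS =
            Pattern.compile("https?://.*\\.google-analytics\\.com/.*");

    // Hypothetical pattern for the facebook.net requests from the question.
    static final Pattern FACEBOOK_NET =
            Pattern.compile("https?://.*\\.facebook\\.net/.*");

    // Assumption: the whole URL has to match, not just a part of it.
    static boolean isBlacklisted(Pattern pattern, String url) {
        return pattern.matcher(url).matches();
    }

    public static void main(String[] args) {
        // Illustrative URLs only.
        String[] urls = {
            "http://www.google-analytics.com/ga.js",
            "https://google-analytics.com/collect",
            "https://connect.facebook.net/en_US/sdk.js",
            "http://example.com/index.html"
        };
        for (String url : urls) {
            System.out.println(url
                    + " ga=" + isBlacklisted(GOOGLE_ANALYTICS, url)
                    + " fb=" + isBlacklisted(FACEBOOK_NET, url));
        }
    }
}
```

Note that the leading `.*\\.` requires a dot before the domain name, so under the full-match assumption a request to the bare domain (the second URL) would not be caught. A pattern written this way can therefore let some requests through even though the blacklist itself works.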
Erki M.
  • Sorry for the long silence. Unfortunately I can't share the code where I used BrowserMob, because I've already deleted it. Unfortunately, your options 1 and 3 don't work. Sometimes I get a TimeoutException after the 1-minute delay, but sometimes I don't, and the browser waits endlessly for some background request as usual. I added the code where I use pageLoadTimeout and readyState at the top, in the P.S. block. – Valentyn Grygoriev Jul 27 '15 at 11:19
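
For the case described in the comment above, where the page load timeout sometimes never fires, one possible workaround is to enforce a hard deadline from outside the driver. The following is a minimal sketch using only java.util.concurrent. The class and method names are hypothetical, and the sleeping task in main merely stands in for a hanging driver.get(url) call.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class HardDeadline {
    /**
     * Runs a blocking task on a worker thread and gives up after the deadline.
     * Returns true if the task finished in time, false if it was abandoned.
     */
    static boolean runWithDeadline(Runnable blockingTask, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "page-load-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(blockingTask);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // only interrupts the worker thread
            return false;
        } catch (ExecutionException e) {
            // The task itself failed; pass its exception on to the caller.
            throw new RuntimeException(e.getCause());
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for driver.get(url): one returns at once, one hangs.
        boolean fast = runWithDeadline(() -> { }, 1, TimeUnit.SECONDS);
        boolean stuck = runWithDeadline(() -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException ignored) {
                // interrupted by the deadline
            }
        }, 200, TimeUnit.MILLISECONDS);
        System.out.println("fast finished: " + fast);
        System.out.println("stuck finished: " + stuck);
    }
}
```

In the loop from the question, the task would be `() -> driver.get(url)`, and a `false` result would be handled like the TimeoutException branch, by restarting the browser. Interrupting the worker thread may not stop a hung browser on its own, so the restart is still needed.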