1

I have been developing web crawlers for a while, and the most common issue for me is waiting for a page to be completely loaded, including requests, frames, and scripts. I mean completely done.

I have used several methods to fix it, but whenever I use more than one thread to crawl websites I run into this problem: the driver opens, goes to the URL, doesn't wait, and moves on to the next URL.

What I have tried:

JavascriptExecutor js = (JavascriptExecutor) driver.getWebDriver();
String result = js.executeScript("return document.readyState").toString();
if (!result.equals("complete")) {
    Thread.sleep(1000);
}

wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath));

When I ran single-threaded code, I had no problem with pages, but with multiple threads it becomes a nightmare. The network cannot serve pages as quickly as it does in the single-threaded case, which is why I need waits in the meantime. I am looking for an exact solution. Is there a progress listener or something like that?

I am waiting for your advice.

Similar question:

Selenium -- How to wait until page is completely loaded

Ahmet Aziz Beşli

3 Answers

1

In your code you check readyState and, if the value is not complete, you just sleep for one second and then proceed to the next steps. Here is code that waits up to 10 seconds using WebDriverWait (you could also use a simple for loop):

WebDriverWait wait = new WebDriverWait(driver, 10);
wait.until(d -> ((JavascriptExecutor) d).executeScript("return document.readyState !== 'loading'"));

Or, checking explicitly for the complete and interactive states:

wait.until(d -> ((JavascriptExecutor) d).executeScript("return (document.readyState === 'complete' || document.readyState === 'interactive')"));
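
The "simple for loop" alternative mentioned above could look roughly like the following sketch. It uses only the JDK; ReadyPoller and waitUntil are hypothetical names, not part of Selenium. With a real driver, the condition would presumably wrap the same script call as above, for example () -> "complete".equals(js.executeScript("return document.readyState")). Unlike the single Thread.sleep(1000) in the question, it keeps re-checking until the condition holds or the timeout expires.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not a Selenium class): polls a condition with a timeout.
final class ReadyPoller {
    private ReadyPoller() {}

    /**
     * Re-checks the condition every pollInterval until it returns true
     * or the timeout elapses. Returns true only if the condition held.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition holding
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a page that becomes "ready" on the third check.
        int[] checks = {0};
        boolean ready = waitUntil(() -> ++checks[0] >= 3,
                Duration.ofSeconds(10), Duration.ofMillis(100));
        System.out.println("ready = " + ready);
    }
}
```

A false return value lets the caller decide whether to skip the URL or retry, instead of silently moving on to the next page.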
Sers
  • Sorry, it should be `=== 'loading'` of course – Sers Feb 04 '20 at 13:50
  • This works only for the HTML content, the answer is true but I want something that will also wait for scripts to be loaded. Here is the example; https://www.google.de/maps/place/Hofladen+Bauernhof+Kitz/@50.2447099,8.6471593,17.33z/data=!4m8!1m2!3m1!2sKaufland+Hohen+Neuendorf!3m4!1s0x47bd071c9e2d281b:0xb3c05b189b75db1c!8m2!3d50.2449999!4d8.6494441 – Ahmet Aziz Beşli Feb 04 '20 at 15:10
  • In this example there is a pane on the left side. If you run document.readyState from the developer console while the page is loading, you will see that the state is already complete before the pane has loaded – Ahmet Aziz Beşli Feb 04 '20 at 15:12
1

Waiting for document.readyState to be complete isn't a foolproof approach to ensure the presence, visibility or interactability of an element.

Hence, the function:

JavascriptExecutor js = (JavascriptExecutor) driver.getWebDriver();
String result = js.executeScript("return document.readyState").toString();
if (!result.equals("complete")) {
    Thread.sleep(1000);
}

And even waiting for jQuery.active == 0:

public void WaitForAjax2Complete() throws InterruptedException
{
    while (true)
    {
        if ((Boolean) ((JavascriptExecutor) driver).executeScript("return jQuery.active == 0")) {
            break;
        }
        Thread.sleep(100);
    }
}

will be pure overhead.


Solution

The effective approach is to use WebDriverWait in conjunction with ExpectedConditions for one of:

  • presence of element
  • visibility of element
  • interactability of element


More than one thread to crawl

WebDriver is not thread-safe. Having said that, if you can serialise access to the underlying driver instance, you can share a reference across more than one thread. This is not advisable. Instead, you can always instantiate one WebDriver instance per thread.

The thread-safety issue isn't in your code but in the actual browser bindings. They all assume there will only be one command at a time (e.g. like a real user). If you instantiate one WebDriver instance for each thread, each thread launches its own browsing tab/window, and up to this point your program is fine.

Different threads can be run against the same WebDriver, but then the results would not be what you expect. The reason is that when you use multi-threading to drive different tabs/windows, some thread-safety code is required; otherwise actions such as click() or sendKeys() go to whichever tab/window currently has focus, regardless of the thread you expect to be running. In effect, all the tests run simultaneously on the focused tab/window rather than on the intended one.
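
One way to get "one instance per thread" is a ThreadLocal. The sketch below uses only the JDK; PerThread is a hypothetical helper name, and the StringBuilder in main is just a stand-in for a driver. With Selenium, the factory would presumably be something like ChromeDriver::new, and each thread would call quit() on its driver before remove().

```java
import java.util.function.Supplier;

// Hypothetical helper: lazily creates one instance per thread.
final class PerThread<T> {
    private final ThreadLocal<T> holder;

    PerThread(Supplier<? extends T> factory) {
        // The factory runs once per thread, on that thread's first get().
        this.holder = ThreadLocal.withInitial(factory);
    }

    /** Returns the calling thread's own instance, creating it if needed. */
    T get() {
        return holder.get();
    }

    /** Drops the calling thread's instance (clean it up first if required). */
    void remove() {
        holder.remove();
    }

    public static void main(String[] args) throws InterruptedException {
        // StringBuilder stands in for a driver; it is not thread-safe either.
        PerThread<StringBuilder> drivers = new PerThread<>(StringBuilder::new);
        Runnable crawl = () -> {
            StringBuilder mine = drivers.get(); // never shared with other threads
            mine.append("visited by ").append(Thread.currentThread().getName());
            System.out.println(mine);
            drivers.remove();
        };
        Thread first = new Thread(crawl, "crawler-1");
        Thread second = new Thread(crawl, "crawler-2");
        first.start();
        second.start();
        first.join();
        second.join();
    }
}
```

Because no instance is ever visible to two threads, commands from one crawler cannot land in another crawler's window.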

DebanjanB
0
public static void processing() {
    WebDriverWait wait = new WebDriverWait(driver, 30);
    // Wait for the loading image to appear, then for it to disappear.
    By loadingImage = By.xpath("//div[@id='Msgpanel']/div/div/img");
    wait.until(ExpectedConditions.visibilityOfElementLocated(loadingImage));
    wait.until(ExpectedConditions.invisibilityOfElementLocated(loadingImage));
}
Chris
  • Thanks, man, this always works. But what I am looking for is a function that waits until everything has loaded. I don't want to give specific selectors. – Ahmet Aziz Beşli Feb 04 '20 at 15:08