2

As far as I understand, Selenium offers no way to get the response code of a website. How can I work around this to find out whether a website has sent me an error or exception, without having to expect an element on the page and wait for it until it times out?

use strict;
use warnings;
use Selenium::Chrome;
use Selenium::Waiter qw/wait_until/;

my $chrome_driver_path = "./../../tools/drivers/chromedriver.exe";
my $driver;
my %settings = (
    'binary' => $chrome_driver_path,
);

$driver = Selenium::Chrome->new(%settings);

print("Getting stackoverflow\n");
wait_until{$driver->get("https://www.stackoverflow.com")};
validate_site($driver);

print("Getting unexistent url of stackoverflow\n");
wait_until{$driver->get("https://www.stackoverflow.com/this-does-not-exists-and-returns-404")};
validate_site($driver);


sleep(20);
$driver->shutdown_binary;

sub validate_site{
    my ($driver) = @_;
    #if ($driver->something) {
        # print("Looks good\n");
    #}else{
        # warn("Error\n");
    # }
}

Expected result:

Getting stackoverflow:

Looks good

Getting unexistent url of stackoverflow:

Error

PS: I want to use Selenium because I'm working with websites that use JavaScript and store cookies across different views. This is just an example to illustrate the problem; it could be solved with curl, but that is not the case in my project.

nck
  • `curl` fetches a document, so it makes sense to get the response from the server. But Selenium remote controls a web browser. Web browsers don't have return codes. In that sense, there are no HTTP response codes. Web browsers do all kinds of HTTP(S) requests. In that sense, there are multiple HTTP response codes. Either way, the concept of "getting the HTTP response code" is problematic. – ikegami Sep 29 '20 at 18:34
  • Let's not forget the issue for getting a page that returns a 3xx that redirects to a 4xx. Is that a 3xx or 4xx response? – ikegami Sep 29 '20 at 18:39
  • Finally, **your question links to an official statement saying what you want to do can't be done** and that this is intentional. You've answered your own question. Voting to close since it can't be answered as it stands. – ikegami Sep 29 '20 at 19:06
  • @ikegami As I said, I want to find a workaround in Perl to verify that the page that loaded is the one I wanted and that it loaded correctly, rather than an error page, which would be the equivalent of a 4XX or 5XX. I'm not that interested in the code itself, so I don't understand why you are trying to close the question. Furthermore, the same question for other languages has well-received solutions and has not been closed: https://stackoverflow.com/questions/6509628/how-to-get-http-response-code-using-selenium-webdriver – nck Oct 01 '20 at 11:55
  • Now you're saying you do have solutions... So again, what exactly are you asking? – ikegami Oct 01 '20 at 23:15
  • But you didn't ask anything that you said wasn't possible. So again I ask that you clarify your question. There's no rudeness in what I'm doing. Your question as it stands cannot be answered. *You* are the one wasting time when you could be fixing the question. – ikegami Oct 02 '20 at 08:24
  • @ikegami I said you cannot get the code directly with `$driver->get_response_code` or an equivalent, but there seem to be ways to access logs, as already suggested in one answer, or to build a proxy as a man-in-the-middle (MIM). I still have not been able to make that work, although it should theoretically be possible, as it is in other languages. – nck Oct 02 '20 at 08:26
  • This is not a discussion. They are not allowed. The problems with the questions should be fixed by editing the question – ikegami Oct 02 '20 at 08:27
  • So your real question is about problems getting those solutions to work. As required, show your effort and identify what specific problems you are having. – ikegami Oct 02 '20 at 08:29

2 Answers

2

Based on the solution from your previous comment, here's another one. It uses extra_capabilities to enable more logging (note that I added an additional package). This works in version 1.38 of Selenium::Remote::Driver, which was released just recently, so you will need to update your packages if you haven't done so yet. This solution does not require falling back to WD2.

use strict;
use warnings;
use Selenium::Chrome;
use Selenium::Waiter qw/wait_until/;
use JSON;


my $chrome_driver_path = "./../../tools/drivers/chromedriver.exe";
my $driver;
my %settings = (
    'binary' => $chrome_driver_path,
    
    'extra_capabilities' =>{
      'goog:loggingPrefs' => {
          'performance' => 'ALL',
      },
      'goog:chromeOptions' => {
          'perfLoggingPrefs' => {
              'traceCategories' => 'performance',
          },
      },
    }
);


$driver = Selenium::Chrome->new(%settings);

print("Getting stackoverflow\n");
wait_until{$driver->get("https://www.stackoverflow.com")};
validate_site($driver);

print("Getting unexistent url of stackoverflow\n");
wait_until{$driver->get("https://www.stackoverflow.com/this-does-not-exists-and-returns-404")};
validate_site($driver);

#sleep(20);
$driver->shutdown_binary;

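sub validate_site{
    my ($driver) = @_;
    
    # each performance log entry carries one JSON-encoded DevTools event
    my $logs = $driver->get_log('performance');
    my @responses = grep {$_->{'message'} =~ /"Network\.responseReceived"/ } @$logs;
    # \Q...\E matches the URL literally, so '.' or '?' are not read as regex syntax
    my $url = $driver->get_current_url();
    my @stat = grep {$_->{'message'} =~ /\Q$url\E/ } @responses;
    unless (@stat) {
        warn("Error: no response logged for $url\n");
        return;
    }
    my $json= decode_json $stat[0]->{'message'};
    my $status = $json->{'message'}->{'params'}->{'response'}->{'status'};
    
    if ($status==200) {
         print("Looks good\n");
    }else{
         warn("Error\n");
    }

}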
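
Matching the URL anywhere in the raw message can also hit entries for other resources, for example when the page URL appears in another request's headers or is a prefix of a subresource URL. Here is a minimal sketch of a stricter lookup that decodes each entry and compares the response URL exactly. It assumes every entry has a `message` field holding a JSON-encoded DevTools event, as in the code above. `status_for_url` is a hypothetical helper name, and the sample entries are made up for illustration in place of `$driver->get_log('performance')`:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json encode_json);

# Return the HTTP status of the main document response for $url,
# or undef if the log holds no such entry.
sub status_for_url {
    my ($logs, $url) = @_;
    my $status;
    for my $entry (@$logs) {
        # Skip entries whose message does not decode to a DevTools event.
        my $event = eval { decode_json($entry->{message})->{message} } or next;
        next unless ($event->{method} // '') eq 'Network.responseReceived';
        my $params = $event->{params};
        next unless ($params->{type} // '') eq 'Document';
        next unless ($params->{response}{url} // '') eq $url;
        $status = $params->{response}{status};   # keep the latest match
    }
    return $status;
}

# Hypothetical sample entries (assumption: not captured from a real browser).
my @sample = map { +{ message => encode_json({ message => $_ }) } } (
    { method => 'Network.responseReceived',
      params => { type     => 'Stylesheet',
                  response => { url => 'https://example.com/site.css', status => 200 } } },
    { method => 'Network.responseReceived',
      params => { type     => 'Document',
                  response => { url => 'https://example.com/missing', status => 404 } } },
);

my $status = status_for_url(\@sample, 'https://example.com/missing');
print defined $status ? "status: $status\n" : "no response logged\n";
```

In `validate_site` the call would be `status_for_url($driver->get_log('performance'), $driver->get_current_url())`. The exact string comparison may miss a URL that the browser reports differently from the log, for instance one carrying a `#fragment`, so a looser comparison may be needed.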

In version 1.37 you would have to fall back to WD2, as the goog:loggingPrefs capability was not supported:

use strict;
use warnings;
use Selenium::Chrome;
use Selenium::Waiter qw/wait_until/;
use JSON;


my $chrome_driver_path = "./../../tools/drivers/chromedriver.exe";
my $driver;
my %settings = (
    'binary' => $chrome_driver_path,
    
    'extra_capabilities' =>{
        'loggingPrefs' => {
            #'browser' => 'ALL',
            #'driver' => 'ALL',
            'performance' => 'ALL'
        },
        'perfLoggingPrefs' => {
            'traceCategories' => 'performance'
        },    
    }
);

$Selenium::Remote::Driver::FORCE_WD2=1;
$driver = Selenium::Chrome->new(%settings);

print("Getting stackoverflow\n");
wait_until{$driver->get("https://www.stackoverflow.com")};
validate_site($driver);

print("Getting unexistent url of stackoverflow\n");
wait_until{$driver->get("https://www.stackoverflow.com/this-does-not-exists-and-returns-404")};
validate_site($driver);

#sleep(20);
$driver->shutdown_binary;

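sub validate_site{
    my ($driver) = @_;
    
    # each performance log entry carries one JSON-encoded DevTools event
    my $logs = $driver->get_log('performance');
    my @responses = grep {$_->{'message'} =~ /"Network\.responseReceived"/ } @$logs;
    # \Q...\E matches the URL literally, so '.' or '?' are not read as regex syntax
    my $url = $driver->get_current_url();
    my @stat = grep {$_->{'message'} =~ /\Q$url\E/ } @responses;
    unless (@stat) {
        warn("Error: no response logged for $url\n");
        return;
    }
    my $json= decode_json $stat[0]->{'message'};
    my $status = $json->{'message'}->{'params'}->{'response'}->{'status'};
    
    if ($status==200) {
         print("Looks good\n");
    }else{
         warn("Error\n");
    }

}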

M-K
  • Does it work for you? I get the following exception `Error while executing command: invalid argument: invalid argument: log type 'performance' not found (Session info: chrome=85.0.4183.121) at C:/Strawberry/perl/site/lib/Selenium/Remote/Driver.pm line 403. at C:/Strawberry/perl/site/lib/Selenium/Remote/Driver.pm line 353.` I already tried putting `w3c` flag to false but still. I think I read the get_log method is deprecated but I am not sure. – nck Oct 02 '20 at 08:19
  • Add `$Selenium::Remote::Driver::FORCE_WD2=1;` before driver initialization. I updated my answer to add it. – M-K Oct 02 '20 at 12:28
  • thanks that seems to work! Although I guess this will be deprecated?? – nck Oct 02 '20 at 12:38
  • Selenium::Remote::Driver v1.38 is out, and with the `goog:loggingPrefs` capability it is no longer necessary to fall back to WD2. I've updated the answer again to include code for both cases. – M-K Oct 20 '20 at 09:47
1

As you mentioned, WebDriver does not expose HTTP response codes, and the suggested approach is to use a proxy if you really need them.

If you do not want to wait too long for the element, you could reduce the implicit timeout in validate_site and look for a reliable element, e.g.:

sub validate_site{
    my ($driver) = @_;
    
    my $implicit=$driver->get_timeouts()->{implicit};# get current value
    $driver->set_implicit_wait_timeout(0);# set it to 0
    my @elem = $driver->find_elements(".py128","css");# this 'reliable element' is present on https://www.stackoverflow.com but not on the 404 page

    if (@elem) {
         print("Looks good\n");
    }else{
         warn("Error\n");
    }

    $driver->set_implicit_wait_timeout($implicit);# restore original value
}    

Or, if you really want to work around it and don't mind duplicating requests, you could try the Fetch API: https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API

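sub validate_site{
    my ($driver) = @_;
    
    # fetch() returns a promise and await is not valid at the top level of an
    # injected script, so the status is reported through the async-script callback
    my $script = q{
       const done = arguments[arguments.length - 1];
       fetch(window.document.URL)
           .then(resp => done(resp.status))
           .catch(() => done(0)); // network failure: there is no HTTP status
    }; 
    my $status = $driver->execute_async_script($script);

    if ($status==200) {
         print("Looks good\n");
    }else{
         warn("Error\n");
    }
   
}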

If you don't need the body (which saves transferring it), you can request only the headers by passing the HEAD method (instead of the default GET) to fetch:

fetch(window.document.URL, {method:"HEAD"})
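
The check above treats every status other than 200 as an error, which would also flag successful responses such as 204. Here is a minimal sketch of a looser check that follows the question's framing, where 4XX and 5XX mean an error page. `classify_status` is a hypothetical helper name, and treating a missing, malformed or 1XX value as an error is an assumption:

```perl
use strict;
use warnings;

# Map an HTTP status code to the outcome the question asks for.
sub classify_status {
    my ($status) = @_;
    # No usable three-digit code (e.g. the fetch itself failed): assume an error.
    return 'error' unless defined $status && $status =~ /^\d{3}$/;
    # 2XX and 3XX count as a page that loaded.
    return 'ok' if $status >= 200 && $status < 400;
    # Everything else, including 4XX and 5XX, counts as an error.
    return 'error';
}

for my $status (200, 204, 404, 503) {
    printf "%d: %s\n", $status, classify_status($status);
}
```

In `validate_site`, the comparison `$status==200` could then become `classify_status($status) eq 'ok'`.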

Dharman
M-K
  • Thanks for your answer, but I don't think these serve my purpose, as I am working with pages whose session cookies get refreshed on every new page. The first solution is also too specific, and I would have to implement it for every single page. I've been trying to set up a proxy as suggested, but without success yet. – nck Oct 01 '20 at 22:09