I'm currently using the following method to check whether a URL exists:

$url = 'https://www.facebook.com/a-test-example-232397848665383511';
$headers = @get_headers($url);
if(strpos($headers[0],'200')===false){
    print('NOT found!');
} else {
    print('found!');
}

This prints NOT found! even though the page clearly resolves when visited. I printed the headers and found that this is because the server returns a 302. Is there a way to use strpos to test for every status value that means the URL resolves?

Current output of headers:

Array
(
    [0] => HTTP/1.1 302 Found
    [1] => Location: https://www.facebook.com/unsupportedbrowser
    [2] => Vary: Accept-Encoding
    [3] => Content-Type: text/html
    // more array items

If I type in a URL that I know fails, I get the following:

Array
(
    [0] => HTTP/1.1 404 Not Found
    [1] => P3P: CP="Facebook does not have a P3P policy." 
    [2] => Strict-Transport-Security: max-age=15552000; preload
    // rest of array

Is it safe to test simply for a 404?

  • Yes, you can use two `strpos` checks with an `||` or-condition to make the `if` check more obtuse. Or use a regex. – mario Apr 20 '16 at 12:03

2 Answers

I would use cURL for URL verification. An example method would be as follows:

    public function urlExists($url) {
        $handle = curl_init($url);
        // Return the response as a string instead of printing it.
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

        curl_exec($handle);
        $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
        // Close the handle before returning; a statement after `return` would never run.
        curl_close($handle);

        // 2xx and 3xx count as "exists"; 400 and above are errors.
        return $httpCode >= 200 && $httpCode < 400;
    }
Kevin Lynch

A server can respond with different status codes, as described in RFC 2616. For your task, all 2xx and 3xx codes mean success.

Performance note: get_headers uses the GET method by default, but if you are not interested in the page content it is better and faster to use the HEAD method.

stream_context_set_default(
  array(
      'http' => array(
          'method' => 'HEAD'
      )
  )
);
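$headers = @get_headers($url);
// get_headers() returns false on failure (e.g. a DNS error), so check before indexing.
$status = $headers ? (int) substr($headers[0], 9, 3) : 0;
if ($status >= 200 && $status < 400) {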
  print('found!');
} else {
  print('NOT found!');
}
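
The question and the first comment both raise the idea of matching the status line with a regex instead of a fixed-offset substr. A minimal sketch of that approach follows, assuming the array has the shape get_headers() printed in the question. The function names statusCodes and looksFound are hypothetical helpers made up for illustration, not part of PHP, and the sample arrays are copied from the question's output so that no network request is needed.

```php
<?php
/**
 * Extract every HTTP status code from a get_headers()-style array.
 * Hypothetical helper for illustration; not a PHP built-in.
 */
function statusCodes(array $headers): array
{
    $codes = array();
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 302 Found"; ordinary headers do not match.
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

/** Treat any 2xx or 3xx code on the first status line as "the URL exists". */
function looksFound(array $headers): bool
{
    $codes = statusCodes($headers);
    if ($codes === array()) {
        return false; // no status line at all, e.g. an empty array
    }
    return $codes[0] >= 200 && $codes[0] < 400;
}

// Sample data copied from the question's output.
$redirect = array('HTTP/1.1 302 Found', 'Location: https://www.facebook.com/unsupportedbrowser');
$missing  = array('HTTP/1.1 404 Not Found', 'Strict-Transport-Security: max-age=15552000; preload');

var_dump(looksFound($redirect)); // bool(true)
var_dump(looksFound($missing));  // bool(false)
```

Comparing the parsed integer against a range avoids listing individual codes in separate strpos checks, and the regex does not depend on the status code sitting at a fixed character offset.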
Andrew Kolpakov
  • Seems to me this might be the best answer. Very quick and works well. Plenty of scope to add other header checks easily. – Steve Oct 11 '20 at 18:29