38

When displaying images on our website, we check if the file exists with a call to file_exists(). We fall back to a dummy image if the file is missing.

However, profiling has shown that this is the slowest part of generating our pages with file_exists() taking up to 1/2 ms per file. We are only testing 40 or so files, but this still pushes 20ms onto the page load time.

Can anyone suggest a way of making this go faster? Is there a better way of testing if the file is present? If I build a cache of some kind, how should I keep it in sync?

Rik Heywood
  • 66
    If the *slowest part* in your code only adds 20ms in total load time, you should go out and treat yourself to a beer, instead of worrying about it so much you're posting a question to SO ;-) – Duroth Nov 10 '09 at 15:27
  • 1
    What file system are you using? The speed of file_exists() should mostly depend on the speed of the stat() syscall. How many files are in the directory? (Depending on the file system, the number of files has an impact on the stat() speed.) – johannes Nov 10 '09 at 15:33
  • Show us the code around the function call. – powtac Nov 10 '09 at 15:34
  • 2
    At 1/2 ms each, you could do 2000 file_exists in a second – Adam Hopkinson Nov 10 '09 at 15:35
  • 31
    Oh, quoting Wikipedia... *The average length of a blink is 300 to 400 milliseconds.* Not sure why, but it felt appropriate to share it with you. – Duroth Nov 10 '09 at 15:36
  • There are a lot of files in the directory (1000's) which is probably impacting the performance more than any other factor. I might look into breaking this up into small batches of files. – Rik Heywood Nov 10 '09 at 15:59
  • 1
    I've actually tried this once, my function took 11 times the execution time of file_exists() so my best bet is to use caching better, or come up with another method. – Peter Lindqvist Nov 10 '09 at 16:12
  • You can do an awful lot in 20ms compared to checking if 40 files exist. I don't understand the comments saying that it does not matter. – Joel M Oct 18 '20 at 18:03

20 Answers

30

file_exists() should be a very inexpensive operation. Note too that file_exists builds its own cache to help with performance.

See: http://php.net/manual/en/function.file-exists.php
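The cache in question is PHP's per-request stat cache, which can go stale if something outside the script removes a file mid-request. A minimal sketch of a re-check that bypasses it, assuming PHP 7+ for the type hints; the helper name fresh_file_exists is hypothetical:

```php
<?php
// Hypothetical helper: drop the cached stat entry for one path, then re-check.
function fresh_file_exists(string $path): bool
{
    // With true as the first argument, the realpath and stat cache
    // entries for this specific filename are cleared.
    clearstatcache(true, $path);
    return file_exists($path);
}

$tmp = tempnam(sys_get_temp_dir(), 'img');
var_dump(fresh_file_exists($tmp));
// PHP's own unlink() already clears the cache; the helper matters when
// another process deletes the file while this script is running.
unlink($tmp);
var_dump(fresh_file_exists($tmp));
```

Clearing the cache on every call gives up the speed benefit, so this is only worth it for paths that really do change during a request.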

Hassaan
RC.
  • 3
    I guess I should just accept that the performance is fine and leave it as is. I might go and break up the files into more folders though, as this will probably help things. – Rik Heywood Nov 10 '09 at 16:12
  • 4
    According to the documentation, caching will only occur if file_exists() returns true. So if you happen to check for nonexistent files, the function will check every time. You could create a symlink to the dummy image when file_exists() returns false so that subsequent calls will be cached. (This might cause other problems.) – Patrick Forget Jul 03 '13 at 15:34
22

Use absolute paths! Depending on your include_path setting PHP checks all(!) these dirs if you check relative file paths! You might unset include_path temporarily before checking the existence.

realpath() does the same but I don't know if it is faster.

But file access I/O is always slow. A hard disk access IS slower than calculating something in the processor, normally.
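A minimal sketch of building an absolute path before the check, assuming the images live in an "images" directory next to the script; that layout and the helper name image_path are hypothetical:

```php
<?php
// Hypothetical layout: images live in an "images" directory next to this script.
function image_path(string $name): string
{
    // basename() strips any directory components from the requested name.
    return __DIR__ . DIRECTORY_SEPARATOR . 'images'
        . DIRECTORY_SEPARATOR . basename($name);
}

$path = image_path('photo.jpg');
// Fall back to the dummy image when the requested one is missing.
echo file_exists($path) ? $path : image_path('dummy.jpg'), "\n";
```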

powtac
  • Good tip. I already provide a full path name to the file though (mostly to avoid the unreliable nature of include path settings). – Rik Heywood Nov 10 '09 at 16:00
  • 1
    A thread about this problem and a script to test: http://bytes.com/topic/php/answers/10394-file_exists-expensive-performance-terms – powtac Nov 10 '09 at 16:19
  • I could be wrong, but knowing if a file exists requires a check in the FS index table, so it shouldn't be a real IO operation that expects a file "read" or "write" operation on disk. – TechNyquist May 16 '16 at 12:59
20

The fastest way to check existence of a local file is stream_resolve_include_path():

if (false !== stream_resolve_include_path($s3url)) { 
  //do stuff 
}

Performance results stream_resolve_include_path() vs file_exists():

Test name       Repeats         Result          Performance     
stream_resolve  10000           0.051710 sec    +0.00%
file_exists     10000           0.067452 sec    -30.44%

The test used absolute paths. Test source is here. PHP version:

PHP 5.4.23-1~dotdeb.1 (cli) (built: Dec 13 2013 21:53:21)
Copyright (c) 1997-2013 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2013 Zend Technologies

Alexander Yancharuk
11

We fall back to a dummy image if the file was missing

If you're just interested in falling back to this dummy image, you might want to consider letting the client negotiate with the server by means of a redirect (to the dummy image) on file-not-found.

That way you'll just have a little redirection overhead and a not-noticeable delay on the client side. At least you'll get rid of the "expensive" (which it isn't, I know) call to file_exists.

Just a thought.

jensgram
  • 2
    +1 for clever. Now I'm curious about what happens if you pass jpg data back with a 404 response. This is, after all, a 404-type behavior that OP is looking for. – timdev Nov 10 '09 at 16:23
  • 1
    Should be rendered OK. Basically it's the same behavior as for custom 404 pages; they're rendered as XHTML if served as such. Haven't tested, though. – jensgram Nov 10 '09 at 18:32
7

Benchmarks with PHP 5.6:

Existing File:

0.0012969970 : stream_resolve_include_path + include  
0.0013520717 : file_exists + include  
0.0013728141 : @include  

Invalid File:

0.0000281333 : file_exists + include  
0.0000319480 : stream_resolve_include_path + include  
0.0001471042 : @include  

Invalid Folder:

0.0000281333 : file_exists + include  
0.0000360012 : stream_resolve_include_path + include  
0.0001239776 : @include  

Code:

// microtime(true) is less accurate.
function microtime_as_num($microtime){
  $time = array_sum(explode(' ', $microtime));
  return $time;
}

function test_error_suppression_include ($file) {
  $x = 0;
  $x = @include($file);
  return $x;
}

function test_file_exists_include($file) {
  $x = 0;
  $x = file_exists($file);
  if ($x === true) {
    include $file;
  }
  return $x;
}

function test_stream_resolve_include_path_include($file) {
  $x = 0;
  $x = stream_resolve_include_path($file);
  if ($x !== false) {
    include $file;
  }
  return $x;
}

function run_test($file, $test_name) {
  echo $test_name . ":\n";
  echo str_repeat('=',strlen($test_name) + 1) . "\n";

  $results = array();
  $dec = 10000000000; // digit precision as a multiplier

  $i = 0;
  $j = 0;
  $time_start = 0;
  $time_end = 0;
  $x = -1;
  $time = 0;

  $time_start = microtime();
  $x= test_error_suppression_include($file);
  $time_end = microtime();
  $time = microtime_as_num($time_end) - microtime_as_num($time_start);

  $results[$time*$dec] = '@include';

  $i = 0;
  $j = 0;
  $time_start = 0;
  $time_end = 0;
  $x = -1;
  $time = 0;

  $time_start = microtime();
  $x= test_stream_resolve_include_path_include($file);
  $time_end = microtime();
  $time = microtime_as_num($time_end) - microtime_as_num($time_start);

  $results[$time * $dec] = 'stream_resolve_include_path + include';

  $i = 0;
  $j = 0;
  $time_start = 0;
  $time_end = 0;
  $x = -1;
  $time = 0;

  $time_start = microtime();
  $x= test_file_exists_include($file);
  $time_end = microtime();
  $time = microtime_as_num($time_end) - microtime_as_num($time_start);

  $results[$time * $dec ] = 'file_exists + include';

  ksort($results, SORT_NUMERIC);

  foreach($results as $seconds => $test) {
    echo number_format($seconds/$dec,10) . ' : ' . $test . "\n";
  }
  echo "\n\n";
}

run_test($argv[1],$argv[2]);

Command line Execution:

php test.php '/path/to/existing_but_empty_file.php' 'Existing File'  
php test.php '/path/to/non_existing_file.php' 'Invalid File'  
php test.php '/path/invalid/non_existing_file.php' 'Invalid Folder'  
4

Create a hashing routine for sharding the files into multiple sub-directories.

filename.jpg -> 012345 -> /01/23/45.jpg

Also, you could use mod_rewrite to return your placeholder image for requests to your image directory that 404.
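A rough sketch of the sharding idea, assuming an md5 of the filename is an acceptable way to spread files across directories. Unlike the mapping above, this variation keeps the original filename in the leaf so two names can never collide on one path; the helper name sharded_path is hypothetical:

```php
<?php
// Hypothetical helper: map a filename to a two-level sharded relative path.
// md5 is used only to spread names evenly, not for security.
function sharded_path(string $filename): string
{
    $hash = md5($filename);
    // The first two pairs of hex digits become the directory levels.
    return '/' . substr($hash, 0, 2) . '/' . substr($hash, 2, 2) . '/' . $filename;
}

echo sharded_path('filename.jpg'), "\n";
```

Two levels of two hex digits give 65,536 directories, so even a large collection leaves only a handful of files per directory.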

Hassaan
racerror
3

file_exists() is automatically cached by PHP. I don't think you'll find a faster function in PHP to check the existence of a file.

See this thread.

Community
mculp
3

Old question, but I'm going to add an answer here. For PHP 5.3.8, is_file() (for an existing file) is an order of magnitude faster. For a non-existing file, the times are nearly identical. For PHP 5.1 with eaccelerator, they are a little closer.

PHP 5.3.8 w & w/o APC

time ratio (1000 iterations)
Array
(
    [3."is_file('exists')"] => 1.00x    (0.002305269241333)
    [5."is_link('exists')"] => 1.21x    (0.0027914047241211)
    [7."stream_resolve_inclu"(exists)] => 2.79x (0.0064241886138916)
    [1."file_exists('exists')"] => 13.35x   (0.030781030654907)
    [8."stream_resolve_inclu"(nonexists)] => 14.19x (0.032708406448364)
    [4."is_file('nonexists)"] => 14.23x (0.032796382904053)
    [6."is_link('nonexists)"] => 14.33x (0.033039808273315)
    [2."file_exists('nonexists)"] => 14.77x (0.034039735794067)
)

PHP 5.1 w/ eaccelerator

time ratio (1000x)
Array
(
    [3."is_file('exists')"] => 1.00x    (0.000458002090454)
    [5."is_link('exists')"] => 1.22x    (0.000559568405151)
    [6."is_link('nonexists')"] => 3.27x (0.00149989128113)
    [4."is_file('nonexists')"] => 3.36x (0.00153875350952)
    [2."file_exists('nonexists')"] => 3.92x (0.00179600715637)
    [1."file_exists('exists"] => 4.22x  (0.00193166732788)
)

There are a couple of caveats.
1) Not all "files" are files, is_file() tests for regular files, not symlinks. So on a *nix system, you can't get away with just is_file() unless you are sure that you are only dealing with regular files. For uploads, etc, this may be a fair assumption, or if the server is Windows based, which does not actually have symlinks. Otherwise, you'll have to test is_file($file) || is_link($file).

2) Performance definitely degrades for all methods if the file is missing and becomes roughly equal.

3) Biggest caveat. All the methods cache the file statistics to speed lookup, so if the file is changing regularly or quickly (deleted, reappears, deleted again), then clearstatcache(); has to be run to ensure that the correct file existence information is in the cache. So I tested those. I left out all the filenames and such. The important thing is that almost all the times converge, except stream_resolve_include, which is 4x as fast. Again, this server has eaccelerator on it, so YMMV.

time ratio (1000x)
Array
(
    [7."stream_resolve_inclu...;clearstatcache();"] => 1.00x    (0.0066831111907959)
    [1."file_exists(...........;clearstatcache();"] => 4.39x    (0.029333114624023)
    [3."is_file(................;clearstatcache();] => 4.55x    (0.030423402786255)
    [5."is_link(................;clearstatcache();] => 4.61x    (0.030798196792603)
    [4."is_file(................;clearstatcache();] => 4.89x    (0.032709360122681)
    [8."stream_resolve_inclu...;clearstatcache();"] => 4.90x    (0.032740354537964)
    [2."file_exists(...........;clearstatcache();"] => 4.92x    (0.032855272293091)
    [6."is_link(...............;clearstatcache();"] => 5.11x    (0.034154653549194)
)

Basically, the idea is, if you're 100% sure that it is a file, not a symlink or a directory, and in all probability it will exist, then use is_file(). You'll see a definite gain. If the path could be a file or a symlink at any moment, then a failed is_file() (14x) followed by is_link() (14x), as in is_file() || is_link(), will end up about 2x slower overall. If the file's existence changes A LOT, then use stream_resolve_include_path().

So it depends on your usage scenario.
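The combined check can be wrapped in a tiny helper. This is only a sketch, and the name is_file_or_link is hypothetical:

```php
<?php
// Hypothetical helper combining the two checks discussed above.
function is_file_or_link(string $path): bool
{
    // is_file() follows symlinks, so the is_link() branch matters for
    // links whose target is missing or is not a regular file.
    return is_file($path) || is_link($path);
}

var_dump(is_file_or_link(__FILE__)); // a regular file
var_dump(is_file_or_link(__DIR__));  // a directory
```

Because || short-circuits, is_link() only runs when is_file() has already failed, which is the slow case in the timings above.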

Beracah
2

If you are only checking for existing files, use is_file(). file_exists() checks for an existing file OR directory, so maybe is_file() could be a little faster.

Hassaan
Alex
  • 2
    Related: [is_file/file_exists performance and cache](http://stackoverflow.com/q/4099103/500559) – Eldros Sep 29 '11 at 14:09
2

I don't exactly know what you want to do, but you could just let the client handle it.

Community
ViperArrow
1

Are they all in the same directory? If so it may be worth getting the list of files and storing them in a hash and comparing against that rather than all the file_exists lookups.
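A minimal sketch of that approach, assuming all images sit in one directory that is read once per page; the helper name list_directory_index is hypothetical:

```php
<?php
// Hypothetical helper: read a directory once and return its entries as array keys.
function list_directory_index(string $dir): array
{
    $entries = scandir($dir);
    if ($entries === false) {
        return [];
    }
    // array_flip turns names into keys so isset() is a hash lookup.
    return array_flip(array_diff($entries, ['.', '..']));
}

$index = list_directory_index(__DIR__);
$name  = basename(__FILE__);
echo isset($index[$name]) ? "found\n" : "missing\n";
```

This trades 40 stat calls for one directory read, which may or may not win when the directory holds thousands of entries, so it is worth benchmarking against plain file_exists().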

easement
1

If you want to check for the existence of an image file, a much faster way is to use getimagesize()!

Faster locally and remotely!

if (!@getimagesize($image_path_or_url)) { // false means no image file
    // Do something
}
0

What about glob()? But I'm not sure if it's fast.

http://www.php.net/manual/en/function.glob.php

Hassaan
juno
    glob() is a dinosaur compared to file_exists()! I don't think it will help in this case. – Pekka Nov 10 '09 at 15:40
0

I find 1/2ms per call very, very affordable. I don't think there are much faster alternatives around, as the file functions are very close to the lower layers that handle file operations.

You could however write a wrapper to file_exists() that caches results into a memcache or similar facility. That should reduce the time to next to nothing in everyday use.
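A minimal sketch of such a wrapper. The $cache array here only stands in for memcache or a similar shared store (a plain array does not survive between requests), and the name cached_file_exists and the 60-second TTL are assumptions:

```php
<?php
// Hypothetical wrapper: $cache stands in for memcache/APCu;
// entries expire after $ttl seconds.
function cached_file_exists(string $path, array &$cache, int $ttl = 60): bool
{
    $now = time();
    if (isset($cache[$path]) && $cache[$path]['expires'] > $now) {
        return $cache[$path]['exists'];   // cache hit, no filesystem access
    }
    $exists = file_exists($path);
    $cache[$path] = ['exists' => $exists, 'expires' => $now + $ttl];
    return $exists;
}

$cache = [];
var_dump(cached_file_exists(__FILE__, $cache));
var_dump(cached_file_exists(__FILE__, $cache)); // served from $cache
```

The TTL is what keeps the cache roughly in sync: a deleted file can still be reported as present for up to $ttl seconds, so pick a value the dummy-image fallback can tolerate.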

Pekka
0

I came to this page looking for a solution, and it seems fopen may do the trick. If you use this code, you might want to disable error logging for the files that are not found.

<?php
for ($n = 1; $n < 100; $n++) {
    clearstatcache();
    $h = @fopen("files.php", "r");
    if ($h) {
        echo "F";
        fclose($h);
    } else {
        echo "N";
    }
}
?>
0

I think the best way is to keep the image URL in the database and then put it in a session variable, especially when you have authentication. This way you don't have to check each time a page reloads.

daRula
0

You could do a cronjob to periodically create a list of images and store them in DB/file/BDB/...

Every half an hour should be fine, but be sure to create an interface to reset cache in case of file addition/delete.

And then, it's also easy to run find . -mmin -30 -print0 on the shell and add new files.

Antti Rytsölä
0

In 2021, 12 years after the question was asked, I have the same use case. I check with file_exists() for around 40 images in a loop before I decide what to show.

The figures (PHP 7.4) in milliseconds:

  • on local dev machine (Win10, WAMP, Samsung SSD): roughly 0.1 (1/10) millisecond per image, roughly 1000 images in the folder;
  • on server (pretty basic cheap one, VPS 1 Intel Xeon, RAM 2GB, SSD, Ubuntu, LAMP): roughly 0.01 (1/100) millisecond per image, 14,000 images in the folder;

The server is 10 times faster than the dev machine, and the cost is indistinguishable from an overall UX performance point of view, where 30-50 ms is roughly the first noticeable threshold.

On the server, checking the array of 40 images takes 0.4 ms to find out whether any of them is missing. BTW, there is no difference in performance whether some of the images exist or not.

So disk performance should not be a reason to avoid file_exists(). Check if you need to.
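For anyone who wants to take the same kind of measurement on their own machine, here is a rough sketch. It is not the code behind the figures above; the helper name average_check_ms is hypothetical and hrtime() needs PHP 7.3 or later:

```php
<?php
// Hypothetical measurement: average file_exists() cost over a list of paths.
// hrtime() requires PHP 7.3 or later.
function average_check_ms(array $paths): float
{
    if ($paths === []) {
        return 0.0;
    }
    $start = hrtime(true);              // monotonic clock, in nanoseconds
    foreach ($paths as $path) {
        file_exists($path);
    }
    // Convert nanoseconds to milliseconds, then average per path.
    return (hrtime(true) - $start) / 1e6 / count($paths);
}

$paths = glob(__DIR__ . '/*') ?: [];
printf("%.4f ms per file over %d files\n", average_check_ms($paths), count($paths));
```

A second run over the same paths will mostly hit the stat cache and the OS cache, so the first run is the more honest number.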

Valentine Shi
0

When you save a file to a folder, if the upload was successful, you can store the path in a DB table.

Then you will just have to make a query to the database in order to find the path of the requested file.

Stephen
Galois
-1

I'm not even sure if this will be any faster but it appears as though you would still like to benchmark soooo:

Build a cache of a large array of all image paths.

$array = array('/path/to/file.jpg' => true, '/path/to/file2.gif' => true);

Update the cache hourly or daily depending on your requirements. You would do this utilizing cron to run a PHP script which will recursively go through the files directory to generate the array of paths.

When you wish to check if a file exists, load your cached array and do a simple isset() check for a fast array index lookup:

if (isset($myCachedArray[$imgpath])) {
    // handle display
}

There will still be overhead from loading the cache but it will hopefully be small enough to stay in memory. If you have multiple images you are checking for on a page you will probably notice more significant gains as you can load the cache on page load.
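One way the cron side could look, as a sketch only: it walks a directory tree and writes a PHP file that returns the array, so the page can load it with a single include. The function name build_path_cache and the cache file location are hypothetical:

```php
<?php
// Hypothetical cron script body: walk $root and write a PHP file
// that returns [path => true] for every regular file found.
function build_path_cache(string $root, string $cacheFile): int
{
    $paths = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        if ($file->isFile()) {
            $paths[$file->getPathname()] = true;
        }
    }
    // Write to a temp file and rename so readers never see a half-written cache.
    $tmp = $cacheFile . '.tmp';
    file_put_contents($tmp, '<?php return ' . var_export($paths, true) . ';');
    rename($tmp, $cacheFile);
    return count($paths);
}

// Page side: load once, then use isset().
// $myCachedArray = include '/path/to/cache.php';
```

Storing the cache as a PHP file means an opcode cache can keep it in memory, which helps with the loading overhead mentioned above.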

Corey Ballou