4

I am in a situation where I need to allow a user to download a file dynamically determined from the URL. Before the download begins, I need to do some authentication, so the download has to run through a script first. All files would be stored outside of the web root to prevent manual downloading.

For example, any of the following could be download links:

Basically, the folder depth can vary.

To prevent directory traversal, for example http://example.com/downloads/../../../../etc/passwd, I obviously need to validate the URI. (Note: I do not have the option of storing this information in a database; the URI must be used.)

Would the following regexp be bullet-proof in making sure that a user doesn't enter something fishy?

preg_match('/^\/([-_\w]+\/)*[-_\w]+\.(zip|gif|jpg|png|pdf|ppt|png)$/iD', $path)

What other options do I have for making sure the URI is sane? Possibly using realpath in PHP?

Emil H

6 Answers

7

I would recommend using realpath() to convert the requested path into an absolute, canonical path. Then you can compare the result against the allowed directories: the canonical path must begin with one of them.

Tomalak
Emil H
  • Yes I think doing this along with a regexp check should probably do the trick –  Mar 04 '09 at 17:59
  • If the whole application is symlinked (e.g. `/var/www/spline_reticulator -> /opt/apps/app_47244`) and your allowed path is `/var/www/spline_reticulator/downloads` you won't be able to download anything if you try to use `realpath()` for sanitizing paths. – AndreKR Jul 03 '19 at 19:14
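One way to reconcile the realpath() suggestion with the symlink caveat is to canonicalize the allowed directory as well, so both sides of the comparison have their symlinks resolved. The following is only a sketch under that assumption; `resolveInsideBase` and its parameters are hypothetical names, not something from the thread.

```php
<?php
/**
 * Hypothetical helper: returns the canonical path of $requested if it lies
 * inside $baseDir, or null otherwise. Both paths go through realpath(), so
 * a symlinked base directory is compared in its resolved form.
 */
function resolveInsideBase(string $baseDir, string $requested): ?string
{
    $base = realpath($baseDir);
    if ($base === false) {
        return null; // the base directory itself does not exist
    }

    // Treat the request as relative to the base, whatever its leading slashes.
    $target = realpath($base . DIRECTORY_SEPARATOR . ltrim($requested, '/\\'));
    if ($target === false) {
        return null; // realpath() returns false for nonexistent paths
    }

    // Compare against "base + separator" so that a sibling directory such as
    // "downloads_private" cannot pass as being inside "downloads".
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;

    return strncmp($target, $prefix, strlen($prefix)) === 0 ? $target : null;
}
```

A caller would pass the configured download directory and the path taken from the URL, and serve the file only when the result is not null.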
3

I'm not a PHP developer but I can tell you that using a Regex based protection for such a scenario is like wearing a T-shirt against a hurricane.

This kind of problem is known as a Canonicalization vulnerability in security parlance (whereby your application parses a given filename before the OS has had a chance to convert it to its absolute file path). Attackers will be able to come up with any number of permutations of the filename which would almost certainly fail to be matched by your Regex.

If you must use Regex, then make it as pessimistic as possible (match only valid filenames, reject everything else). I would suggest that you do some research on Canonicalization methods in PHP.
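To illustrate what "pessimistic" could look like, here is a rough sketch that decodes the path first and then accepts it only if every segment matches a narrow whitelist. The function name `isPlainRelativePath` and the exact character set are assumptions made for the example; adjust them to the filenames you actually serve. It is no substitute for canonicalizing the path on the filesystem.

```php
<?php
/**
 * Hypothetical whitelist check: true only if every path segment consists of
 * letters, digits, "_" or "-", optionally followed by a single extension.
 */
function isPlainRelativePath(string $rawPath): bool
{
    // Decode percent-escapes first, so "%2e%2e" is checked as "..".
    $decoded = rawurldecode($rawPath);

    // If decoding again still changes the string, the input was
    // double-encoded (for example "%252e"); reject it outright.
    if (rawurldecode($decoded) !== $decoded) {
        return false;
    }

    foreach (explode('/', ltrim($decoded, '/')) as $segment) {
        // The D modifier stops "$" from matching before a trailing newline.
        if (preg_match('/^[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?$/D', $segment) !== 1) {
            return false; // empty segments, "." and ".." all fail here
        }
    }

    return true;
}
```

Because a segment must start with at least one whitelisted character, "." and ".." can never match, and anything not explicitly allowed (backslashes, null bytes, spaces) is rejected rather than stripped.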

Cerebrus
  • +1. Also know your server: if you're running PHP on Windows, an attempt to access a device-reserved filename like ‘com.txt’ may fail hard. – bobince Mar 04 '09 at 18:59
1

I think you could use htaccess for this.

Fernando Briano
1

I think the following 3 checks can be an ideal solution

  • Make sure the requested path matches a strict regexp describing what a valid file path can look like
  • Use realpath (in PHP) to get the canonical form of the user's requested file, and check that it lies within the base directory
  • Starting with PHP v5.3, you can use ini_set to restrict open_basedir to a specific folder, so that files outside of that folder cannot be read (with fopen, include, fread, etc.)
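The third check might be sketched as follows. `confineFileAccess` is a hypothetical helper name, and the sketch assumes the script needs no files outside the download directory after the call, since open_basedir can be tightened at runtime but not widened again.

```php
<?php
/**
 * Hypothetical helper: narrows open_basedir to a single directory for the
 * remainder of the request. Returns false if the directory does not exist
 * or the setting could not be changed.
 */
function confineFileAccess(string $dir): bool
{
    $real = realpath($dir);
    if ($real === false) {
        return false;
    }

    // A trailing separator is appended so the value is treated as a
    // directory rather than as a bare path prefix.
    // ini_set() returns the old value on success and false on failure.
    return ini_set('open_basedir', $real . DIRECTORY_SEPARATOR) !== false;
}
```

After a successful call, filesystem functions such as fopen or file_get_contents fail with a warning for paths outside that directory, which serves as a backstop if the regexp or realpath check has a bug.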
0

My solution

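$filesPath = realpath(".");
// realpath() returns false when the requested file does not exist.
$reqPath = realpath($_GET["file"] ?? "");

// Require the base directory plus a separator as the prefix, so that a
// sibling such as "/var/files_private" cannot pass for "/var/files".
$prefix = rtrim($filesPath, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;

if ($reqPath !== false && strncmp($reqPath, $prefix, strlen($prefix)) === 0) {
    echo "File found";
} else {
    echo "Access denied";
}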
andrei
  • -1 for blogspam. Looking at the date stamp in your blog, and the content of your blog, you clearly created that blog post in response to this question. Why didn't you just post the response here? – Bryan May 05 '11 at 19:47
  • Yes, better thanks. Downvote removed. There is plenty of discussion on meta regarding this. By all means link to a blog to provide additional details to supplement your answer, but you shouldn't be posting your answer on your blog and providing nothing but a link. – Bryan Jul 25 '11 at 11:29
0

What characters will your filenames contain? If it's simply [a-zA-Z0-9], single dots, dashes, and slashes, then feel free to strip anything else.

cherouvim