11

I would like to normalize a path from an external resource to prevent directory traversal attacks. I know about the realpath() function, but sadly this function returns only the path of existing directories. So if the directory doesn't exist (yet) the realpath() function cuts off the whole part of the path which doesn't exist.

So my question is: do you know a PHP function that only normalizes the path?

PS: I also don't want to create all possible directories in advance ;-)

JepZ

5 Answers

6

There's no built-in PHP function for this. Use something like the following instead:

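function removeDots($path) {
    // Keep the leading slash of absolute paths; strncmp() is safe on an empty string.
    $root = (strncmp($path, '/', 1) === 0) ? '/' : '';

    $segments = explode('/', trim($path, '/'));
    $ret = array();
    foreach($segments as $segment){
        // Skip "." and the empty segments produced by repeated slashes.
        if (($segment === '.') || strlen($segment) === 0) {
            continue;
        }
        if ($segment === '..') {
            // Go up one level; at the top of the path this is a silent no-op.
            array_pop($ret);
        } else {
            array_push($ret, $segment);
        }
    }
    return $root . implode('/', $ret);
}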
Tom Imrei
  • I also thought about such a solution, but since there are multiple ways to encode the dots ([see Wikipedia](http://en.wikipedia.org/wiki/Directory_traversal_attack#URI_encoded_directory_traversal)), this wouldn't be enough :-/ – JepZ Apr 10 '12 at 17:08
  • Well, this was the [MVP](http://en.wikipedia.org/wiki/Minimum_viable_product) implementation. You can add a rawurldecode() call, and a regexp match before it, to control which characters you allow in your paths. On the other hand, the question was whether there is a built-in function for this; this code was the only possible way to go from there. – Tom Imrei Apr 10 '12 at 19:29
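
The decoding and character check mentioned in the comment above could be sketched roughly as follows. This is only an illustration: the function name `decodeAndValidate`, the repeated decoding, and the particular whitelist of characters are assumptions, not part of the original answer. The check is applied after decoding here, which is one possible ordering.

```php
<?php
// Hypothetical pre-processing step, meant to run before any dot-segment removal.
function decodeAndValidate(string $raw)
{
    // Decode repeatedly so double-encoded input such as %252e%252e is unwrapped too.
    $decoded = $raw;
    do {
        $previous = $decoded;
        $decoded  = rawurldecode($previous);
    } while ($decoded !== $previous);

    // Assumed whitelist: ASCII letters, digits, dot, underscore, hyphen and slash.
    // \z anchors at the very end, so a trailing newline is rejected as well.
    if (!preg_match('#^[A-Za-z0-9._/-]+\z#', $decoded)) {
        return false;
    }
    return $decoded;
}
```

The result still contains any `..` segments (for example `decodeAndValidate('%2e%2e%2fetc')` returns `'../etc'`), so it is meant to be passed on to a normalizer such as removeDots() afterwards.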
2

Thanks to Benubird and CragMonkey for pointing out that my previous answer did not work in some situations. Here is a new version with the original goals: good performance, fewer lines, and pure regular expressions.

This time I tested with a much stricter test case, shown below.

$path = '/var/.////./user/./././..//.//../////../././.././test/////';

function normalizePath($path) {
    $patterns = array('~/{2,}~', '~/(\./)+~', '~([^/\.]+/(?R)*\.{2,}/)~', '~\.\./~');
    $replacements = array('/', '/', '', '');
    return preg_replace($patterns, $replacements, $path);
}

The correct answer would be /test/.

This is not meant as a competition, but a performance test is a must:

Test case: a for loop of 100k iterations on Windows 7, i5-3470 quad core, 3.20 GHz.

mine: 1.746 secs.

Tom Imrei: 4.548 secs.

Benubird: 3.593 secs.

Ursa: 4.334 secs.

This doesn't mean my version is always better. In several situations they perform similarly.

Val
  • This works okay unless there are multiple instances of `/../`. For example, `/a/b/c/../../../d/e/file.txt` should resolve to `/d/e/file.txt`; instead it only goes back one level (`/a/b/d/e/file.txt`). Also, it doesn't like even numbers of `/../`, such as `/a/b/c/../../d/e/file.txt`, which resolves to `/a/b/.d/e/file.txt` (extra period) – CragMonkey Aug 04 '15 at 22:54
  • @Cragmonkey, thanks for the correction! I edited my post. – Val Aug 06 '15 at 08:21
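
A timing loop like the one described in the answer above might look like the following sketch. The helper name `timeNormalizer` and its signature are assumptions for illustration, and the figures it produces will differ from the ones quoted, since they depend on hardware and PHP version.

```php
<?php
// Hypothetical harness: time $iterations calls of any normalizer callable.
function timeNormalizer(callable $normalizer, string $path, int $iterations = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $normalizer($path);
    }
    // Elapsed wall-clock time in seconds.
    return microtime(true) - $start;
}

// Example input taken from the answer; the closure stands in for a real normalizer.
$path    = '/var/.////./user/./././..//.//../////../././.././test/////';
$seconds = timeNormalizer(function ($p) { return preg_replace('~/{2,}~', '/', $p); }, $path);
printf("%.3f secs\n", $seconds);
```

Passing each candidate function name (for example `'normalizePath'`) as the callable keeps the loop overhead identical across the implementations being compared.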
2

I think Tamas' solution will work, but it is also possible to do it with regex, which may be less efficient but looks neater. Val's solution is incorrect; but this one works.

function normalizePath($path) {
    do {
        $path = preg_replace(
            array('#//|/\./#', '#/([^/.]+)/\.\./#'),
            '/', $path, -1, $count
        );
    } while($count > 0);
    return $path;
}

Yes, it does not handle all the possible different encodings of ./\ etc. that there can be, but that is not the purpose of it; one function should do one thing only, so if you want to also convert %2e%2e%2f into ../, run it through a separate function first.

Realpath also resolves symbolic links, which is obviously impossible if the path doesn't exist; but we can strip out the extra '/./', '/../' and '/' characters.
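
If symbolic links in the part of the path that does exist should still be resolved, one option is to call realpath() on the longest existing prefix and append the rest unchanged. The sketch below assumes the input has already been normalized lexically (no `.` or `..` segments left); the helper name `resolveExistingPrefix` is hypothetical.

```php
<?php
// Hypothetical helper: resolve the longest existing prefix with realpath(),
// then append the remaining (not yet existing) segments unchanged.
function resolveExistingPrefix(string $path): string
{
    $missing = array();
    $current = $path;
    // Walk upwards until realpath() finds something that exists.
    while (($real = realpath($current)) === false) {
        $parent = dirname($current);
        if ($parent === $current) {
            // Nothing along the path exists; return the input untouched.
            return $path;
        }
        array_unshift($missing, basename($current));
        $current = $parent;
    }
    if (!$missing) {
        return $real;
    }
    return rtrim($real, '/\\') . '/' . implode('/', $missing);
}
```

For a path such as `/tmp/new/child`, where only `/tmp` exists, this returns the resolved form of `/tmp` followed by `/new/child`.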

Benubird
  • This works in some cases, but sometimes it does not produce the correct result, for example: `$path = '/var/.////./user/./././..//.//../////../././.././test/////';` and `$path = '/var/user/.///////././.././.././././test/';` Both should resolve to /test/, but the first one returns "/var/test" and the second one returns "/var/user/test/". – Val Aug 06 '15 at 08:34
  • @Val You're quite right, there was an error there - thanks for pointing that out! Although, your examples are not entirely correct - the first one reduces to `/../../test/`, not `/test/`. – Benubird Aug 06 '15 at 09:08
  • @Benubird I did extra work to remove the redundant /../../ because it means nothing in an absolute path, and it looks better. But I agree with you: leaving it there would make it more flexible for relative paths. – Val Aug 07 '15 at 01:50
  • This won't work with `/smth/../smth/../`; use `'#/(?:([^/.]+)/\.\./)+#'` instead. Also, unfortunately, your implementation doesn't resolve `/folder.with.dots/../` – YakovL Mar 28 '18 at 09:31
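
One way to address the folder-with-dots case from the last comment is to exclude only the segments `.` and `..` with a negative lookahead, rather than excluding every segment that contains a dot. The following is a sketch of that idea, not the answer's original code; the function name `normalizePathDots` is made up for the example, and it only handles segments that are delimited by slashes on both sides.

```php
<?php
// Hypothetical variant of the loop-based regex approach.
function normalizePathDots(string $path): string
{
    do {
        $path = preg_replace(
            array(
                '#//|/\./#',                // empty segments and "." segments
                '#/(?!\.\.?/)[^/]+/\.\./#', // "/segment/../" where segment is neither "." nor ".."
            ),
            '/', $path, -1, $count
        );
    } while ($count > 0); // repeat until a pass makes no replacement
    return $path;
}
```

With this pattern `/folder.with.dots/../file` becomes `/file` and `/smth/../smth/../` becomes `/`. Leading `..` segments that cannot be resolved are left in place, so the long example from the comments reduces to `/../../test/`.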
1

A strict but safe implementation. It is suitable if you use only ASCII characters in file names:

/**
 * Normalise a file path string so that it can be checked safely.
 *
 * @param string $path
 *     The path to normalise.
 * @return string|false
 *     Normalised path, or FALSE if $path cannot be normalised (invalid).
 */
function normalisePath($path) {
  // Skip invalid input.
  if (!isset($path)) {
    return FALSE;
  }
  if ($path === '') {
    return '';
  }

  // Attempt to avoid path encoding problems.
  $path = preg_replace("/[^\x20-\x7E]/", '', $path);
  $path = str_replace('\\', '/', $path);

  // Remember path root.
  $prefix = substr($path, 0, 1) === '/' ? '/' : '';

  // Process path components
  $stack = array();
  $parts = explode('/', $path);
  foreach ($parts as $part) {
    if ($part === '' || $part === '.') {
      // No-op: skip empty part.
    } elseif ($part !== '..') {
      array_push($stack, $part);
    } elseif (!empty($stack)) {
      array_pop($stack);
    } else {
      return FALSE; // Out of the root.
    }
  }

  // Return the "clean" path
  $path = $prefix . implode('/', $stack);
  return $path;
}
ursa
  • This works in some cases, but sometimes it does not produce the expected result, for example: `$path = '/var/.////./user/./././..//.//../////../././.././test/////';` and `$path = '/var/user/./././.././../.././././test/';` Both should resolve to /test/, but an empty string is returned. – Val Aug 06 '15 at 08:29
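
For the original use case (keeping an untrusted path inside a base directory), a strict normalizer of this kind can be combined with a join step, roughly as sketched below. Both helper names, `normaliseSegments` and `safeJoin`, are hypothetical, and the first is a compact stand-in for the answer's function rather than a copy of it. Because an empty string is a valid result, the failure case has to be checked with `=== false`.

```php
<?php
// Hypothetical helper: stack-based normalisation in the spirit of the answer above.
// Returns false when ".." would climb above the starting point.
function normaliseSegments(string $path)
{
    $stack = array();
    foreach (explode('/', str_replace('\\', '/', $path)) as $part) {
        if ($part === '' || $part === '.') {
            continue; // skip empty and "." segments
        }
        if ($part !== '..') {
            $stack[] = $part;
        } elseif ($stack) {
            array_pop($stack);
        } else {
            return false; // out of the root
        }
    }
    return implode('/', $stack);
}

// Hypothetical helper: build a path below $baseDir from untrusted input.
function safeJoin(string $baseDir, string $userPath)
{
    $relative = normaliseSegments($userPath);
    if ($relative === false) {
        return false; // the input tried to leave the base directory
    }
    return rtrim($baseDir, '/') . ($relative === '' ? '' : '/' . $relative);
}
```

Here `safeJoin('/srv/uploads', 'a/../b/new.txt')` gives `/srv/uploads/b/new.txt`, while `safeJoin('/srv/uploads', '../../etc/passwd')` gives `false`. The directories do not need to exist, since nothing touches the file system.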
0

My 2 cents. The regexp is used only for empty blocks of path:

<?php 
echo path_normalize('/a/b/c/../../../d/e/file.txt');

echo path_normalize('a/b/../c');

echo path_normalize('./../../etc/passwd');

echo path_normalize('/var/user/.///////././.././.././././test/');

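function path_normalize($path){
    $path   = str_replace('\\','/',$path);
    // -1 means "no limit"; passing null here is deprecated since PHP 8.1.
    $blocks = preg_split('#/#',$path,-1,PREG_SPLIT_NO_EMPTY);
    $res    = array();

    // each() was removed in PHP 8, so iterate with foreach instead.
    foreach($blocks as $k => $block){
        switch($block){
            case '.':
                // A leading "." expands to the current working directory.
                if($k == 0)
                    $res = explode('/',path_normalize(getcwd()));
            break;
            case '..':
                // Nothing left to pop: the path would leave its starting point.
                if(!$res) return false;
                array_pop($res);
            break;
            default:
                $res[] = $block;
            break;
        }
    }
    return implode('/',$res);
}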
?>
YakovL