
I want to use PHP to read all files and paths ignored by a .gitignore configuration, just like Git does.

It's possible to read the directory recursively and filter each file with a regular expression, but that is very inefficient if the path contains many files.

Is there a good, efficient way to find the files and paths ignored by .gitignore?

Angolao
  • Git provides the `check-ignore` subcommand that can tell you what paths are ignored (and, optionally, not ignored), and this of course handles the full complexity of the various ignore files and lists. [My answer to another question](https://stackoverflow.com/a/48664610/107294) provides the details of how to do this. – cjs Feb 07 '18 at 13:20
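A minimal sketch of the `git check-ignore` approach from the comment above, assuming the `git` binary is on the PATH and the directory is inside a Git work tree. The function name `git_ignored_paths` is hypothetical; `--stdin` and `-z` are real `check-ignore` options that read NUL-separated paths from standard input and print the ignored ones NUL-separated.

```php
<?php
/**
 * Hypothetical helper: ask Git which of the given (relative) paths are ignored.
 * Assumes `git` is on PATH and $repoDir is inside a work tree.
 * Paths go through stdin, so file names need no shell escaping.
 */
function git_ignored_paths(string $repoDir, array $paths): array
{
    if ($paths === []) {
        return [];
    }
    $cmd = 'git -C ' . escapeshellarg($repoDir) . ' check-ignore --stdin -z';
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
    $proc = proc_open($cmd, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('could not start git');
    }
    fwrite($pipes[0], implode("\0", $paths) . "\0");   // NUL-separated input (because of -z)
    fclose($pipes[0]);
    $out = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $err = stream_get_contents($pipes[2]);
    fclose($pipes[2]);
    $status = proc_close($proc);
    // Exit status: 0 = some paths ignored, 1 = none ignored, 128 = fatal error.
    if ($status === 128) {
        throw new RuntimeException(trim($err));
    }
    return array_values(array_filter(explode("\0", $out), 'strlen'));
}
```

Because Git evaluates the patterns itself, nested .gitignore files, `.git/info/exclude` and the global excludes file are all honoured. For very large path lists this simple version could block if Git fills its output pipe before all input has been written; reading and writing concurrently (for example with `stream_select()`) avoids that.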

6 Answers


You need to proceed in several steps:

1 - Find the .gitignore files

Each folder can have one, so don't assume there's a single one.

And submodules have a `.git` file (a link into the main repository's .git folder) instead of a `.git` directory, so be wary of stopping too early as well.

It'll go something like:

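function find_gitignore_files($dir) {
  $files = array();
  while (true) {
    $file = "$dir/.gitignore";
    if (is_file($file)) $files[] = $file;
    # A real .git directory marks the repository root: stop here.
    # (A submodule's .git is a file, so the search continues upward.)
    if (is_dir("$dir/.git") && !is_link("$dir/.git")) break;
    $parent = dirname($dir);
    if ($parent === $dir || $parent === '.') break;   # no parent left: filesystem root or start of a relative path
    $dir = $parent;
  }
  return $files;
}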
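The function above only walks upward from one directory. Since each folder can have its own .gitignore, you may also need every .gitignore below the repository root. A minimal sketch of that downward scan with SPL iterators; the name `find_nested_gitignore_files` is hypothetical, and it assumes nested repositories and submodules need no special treatment.

```php
<?php
/**
 * Hypothetical helper: collect every .gitignore file below $root.
 * Skips Git's own .git directory; symlinked directories are not
 * followed (RecursiveDirectoryIterator's default).
 */
function find_nested_gitignore_files(string $root): array
{
    $filtered = new RecursiveCallbackFilterIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        function (SplFileInfo $entry): bool {
            // Do not descend into the repository metadata directory.
            return $entry->getFilename() !== '.git';
        }
    );
    $found = [];
    // LEAVES_ONLY (the default) yields files, not the directories themselves.
    foreach (new RecursiveIteratorIterator($filtered) as $entry) {
        if ($entry->isFile() && $entry->getFilename() === '.gitignore') {
            $found[] = $entry->getPathname();
        }
    }
    sort($found);
    return $found;
}
```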

2 - Parse each .gitignore file

You need to ignore comments, mind the negation operator (!), and mind the globs.

This one, give or take, is going to go something like:

function parse_git_ignore_file($file) { # $file = '/absolute/path/to/.gitignore'
  $dir = dirname($file);
  $matches = array();
  $lines = file($file, FILE_IGNORE_NEW_LINES);
  foreach ($lines as $line) {
    $line = rtrim($line);                       # trailing whitespace is not significant
    if ($line === '') continue;                 # empty line
    if ($line[0] === '#') continue;             # a comment
    if ($line[0] === '!') {                     # negated glob: un-ignore what earlier lines matched
      $unignored = glob("$dir/" . ltrim(substr($line, 1), '/')) ?: array();
      $matches = array_diff($matches, $unignored);
    } else {                                    # normal glob (glob() may return false)
      $matches = array_merge($matches, glob("$dir/" . ltrim($line, '/')) ?: array());
    }
  }
  return array_values(array_unique($matches));
}

(Note: none of the above is tested, but they should put you in the right direction.)

Denis de Bernardy
  • This answer is incomplete and should not be marked as the accepted answer. In detail, PHP glob is functionally quite far from `.gitignore` [patterns](https://git-scm.com/docs/gitignore#_pattern_format). Also, you are not covering the listing of the files after applying the filters. – Syffys Apr 10 '21 at 18:24
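As the comment above points out, .gitignore patterns are not PHP glob patterns. One option is to translate each pattern into a regular expression and match it against path strings, which also avoids one filesystem scan per rule. This is a sketch under stated assumptions, not a complete implementation: `gitignore_pattern_to_regex` is a hypothetical helper, paths are relative to the directory of the .gitignore and use "/", directory paths carry a trailing "/", `!` negation is left to the caller, and bracket-expression edge cases such as `[]` are not handled.

```php
<?php
/**
 * Hypothetical helper: translate one .gitignore pattern (without "!")
 * into a PCRE regex. Match it against a path relative to the .gitignore's
 * directory, using "/" and a trailing "/" on directory paths.
 */
function gitignore_pattern_to_regex(string $pattern): string
{
    $dirOnly = substr($pattern, -1) === '/';          // trailing "/" = directories only
    $pattern = rtrim($pattern, '/');
    $anchored = strpos($pattern, '/') !== false;      // "/" at start or middle anchors it
    $pattern = ltrim($pattern, '/');

    $re = '';
    $n = strlen($pattern);
    for ($i = 0; $i < $n; $i++) {
        $c = $pattern[$i];
        $segStart = $i === 0 || $pattern[$i - 1] === '/';
        if ($c === '*' && $segStart && substr($pattern, $i, 3) === '**/') {
            $re .= '(?:.*/)?';                        // "**/": zero or more directories
            $i += 2;
        } elseif ($c === '*' && $segStart && $i === $n - 2 && substr($pattern, $i) === '**') {
            $re .= '.*';                              // trailing "/**": everything inside
            $i += 1;
        } elseif ($c === '*') {
            $re .= '[^/]*';                           // "*" never crosses a "/"
        } elseif ($c === '?') {
            $re .= '[^/]';
        } elseif ($c === '\\' && $i + 1 < $n) {
            $re .= preg_quote($pattern[++$i], '#');   // escaped literal, e.g. "\#" or "\!"
        } elseif ($c === '[' && ($end = strpos($pattern, ']', $i + 1)) !== false) {
            $class = substr($pattern, $i + 1, $end - $i - 1);
            if ($class !== '' && $class[0] === '!') {
                $class = '^' . substr($class, 1);     // "[!a]" means "not a"
            }
            $re .= '[' . str_replace('#', '\#', $class) . ']';
            $i = $end;
        } else {
            $re .= preg_quote($c, '#');
        }
    }
    $prefix = $anchored ? '' : '(?:.*/)?';            // unanchored: match at any depth
    $suffix = $dirOnly ? '/.*' : '(?:/.*)?';          // a matched directory covers its contents
    return '#^' . $prefix . $re . $suffix . '$#';
}
```

Applying a whole file then means looping over its rules in order and letting the last matching rule (negated or not) decide.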

I use this function to read the whole path recursively, and it works well:

function read_dir($dir)
{
    $files = array();
    $dir = rtrim(preg_replace('~/+~', '/', $dir), '/') . '/';   # normalise slashes, one trailing "/"
    foreach (scandir($dir) as $entry) {
        if ($entry === '.' || $entry === '..') continue;
        $path = realpath($dir . $entry);
        if ($path === false) continue;                          # e.g. a broken symlink
        if (is_dir($path)) {
            $files = array_merge($files, read_dir($path));      # recurse into the subdirectory
        }
        $files[] = $path;
    }
    return $files;
}

UPDATE: You can apply preg_grep to the output of the above function as follows:

$files = preg_grep('~\.gitignore\b~i', array_values(read_dir($path)));
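Following the same preg_grep idea, a path list can be filtered against ignore rules in a single pass by joining the rules into one regular expression and using `PREG_GREP_INVERT`, instead of globbing once per rule. A minimal sketch with a hypothetical `drop_ignored` helper; it assumes only `*` and `?` wildcards, matches them against whole path segments, and does not support `!` negation. Since read_dir returns absolute paths, it is safer to strip the repository root first so that patterns cannot match directories above it.

```php
<?php
/**
 * Hypothetical helper: remove paths matching any of $globs in one preg_grep() pass.
 * Only "*" and "?" are supported; each glob must match a whole path segment.
 */
function drop_ignored(array $paths, array $globs): array
{
    $alternatives = [];
    foreach ($globs as $glob) {
        // Escape everything, then turn the escaped wildcards back into regex syntax.
        $re = preg_quote($glob, '#');
        $alternatives[] = strtr($re, ['\*' => '[^/]*', '\?' => '[^/]']);
    }
    if ($alternatives === []) {
        return array_values($paths);
    }
    // A segment starts at the beginning or after "/", and ends at "/" or the end.
    $regex = '#(?:^|/)(?:' . implode('|', $alternatives) . ')(?:/|$)#';
    return array_values(preg_grep($regex, $paths, PREG_GREP_INVERT));
}
```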

Just a crazy idea: if you rely on Git to give you the patterns for ignored files, why not rely on it to give you the list of included/ignored files as well? Just issue a command like:

  • `git ls-files` for all tracked files
  • `git clean -ndX` or `git ls-files -o -i --exclude-from=[Path_To_Your_Global].gitignore` for all ignored files

See which Git command gives you the best output, then loop through the listed paths.

And a word of caution: take all necessary precautions when executing external commands (for example, escape arguments with escapeshellarg())!
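A minimal sketch of running such a command from PHP, assuming `git` is on the PATH. It uses `git ls-files -z --others --ignored --exclude-standard`, which lists untracked files excluded by the standard sources (.gitignore files, `.git/info/exclude` and the global excludes file); the function name `git_list_ignored` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: list untracked, ignored files relative to $repoDir.
 * Assumes `git` is on PATH and $repoDir is the root of a work tree.
 */
function git_list_ignored(string $repoDir): array
{
    // escapeshellarg() keeps the directory name from being interpreted by the shell.
    $cmd = 'git -C ' . escapeshellarg($repoDir)
         . ' ls-files -z --others --ignored --exclude-standard';
    $out = shell_exec($cmd);   // null (or false) when the command fails or prints nothing
    if (!is_string($out) || $out === '') {
        return [];
    }
    // -z separates the entries with NUL bytes, so odd file names are safe.
    return array_values(array_filter(explode("\0", $out), 'strlen'));
}
```

Adding `--directory` would report a wholly ignored directory once instead of listing every file inside it.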

Max
  • The problem is that I need to process the .gitignore file with PHP, without a Git environment. – Angolao Nov 22 '13 at 10:20
  • If that's really the case, meaning you cannot issue a shell command, then you're better off with the other solutions. – Max Nov 25 '13 at 09:45

Entries in a .gitignore are mostly glob patterns. You can read each line of your .gitignore using PHP's file function, skip empty lines and lines that start with #, and then expand the patterns using PHP's glob function (http://php.net/manual/en/function.glob.php).

thrau
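Calling glob for every rule re-reads the filesystem each time. An alternative is to test path strings in memory with PHP's `fnmatch()`, letting the last matching rule win as Git does. A minimal sketch with hypothetical helper names (`read_gitignore_rules`, `rules_ignore`, `is_ignored`); it simplifies Git's rules noticeably: leading and trailing slashes are dropped, so anchoring and directory-only patterns are not modelled, and `**` is not supported. A path counts as ignored if any of its parent directories is, since Git cannot re-include files inside an excluded directory.

```php
<?php
// Hypothetical helper: read one .gitignore, skipping blank lines and "#" comments.
function read_gitignore_rules(string $file): array
{
    $rules = [];
    foreach (file($file, FILE_IGNORE_NEW_LINES) as $line) {
        $line = rtrim($line);
        if ($line !== '' && $line[0] !== '#') {
            $rules[] = $line;
        }
    }
    return $rules;
}

// Hypothetical helper: last matching rule wins; "!" flips it back to "not ignored".
function rules_ignore(string $path, array $rules): bool
{
    $ignored = false;
    foreach ($rules as $rule) {
        $negated = $rule[0] === '!';
        // Simplification: "/" at either end is dropped (no anchoring, no dir-only rules).
        $pattern = trim($negated ? substr($rule, 1) : $rule, '/');
        // Patterns without "/" are tested against the name only, others against the path.
        $target = strpos($pattern, '/') === false ? basename($path) : $path;
        if (fnmatch($pattern, $target, FNM_PATHNAME)) {
            $ignored = !$negated;
        }
    }
    return $ignored;
}

// Hypothetical helper: check the path and each of its parent directories, top down.
function is_ignored(string $relPath, array $rules): bool
{
    $prefix = '';
    foreach (explode('/', trim($relPath, '/')) as $part) {
        $prefix = $prefix === '' ? $part : "$prefix/$part";
        if (rules_ignore($prefix, $rules)) {
            return true;
        }
    }
    return false;
}
```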

You can get an array of files to ignore from a .gitignore file and check against that. To do that, you would need to read the file and match files using the glob function.

First, get the contents of the file:

$contents = file_get_contents($pathToGitIgnoreFile);
$path = dirname(realpath($pathToGitIgnoreFile));

You can also use the directory of the .gitignore file to match files in the same directory as the gitignore.

Next, we need to split the contents into individual rules. Rules start on their own line in the file. Lines that start with the pound symbol (#) are comments, so we can just use a regular expression to find non-blank lines that aren't comments:

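$rules = array();
preg_match_all('/^[^#\v]\V*/m', $contents, $rules);                  // lines not starting with "#" or a line break
$rules = array_filter(array_map('rtrim', $rules[0]), 'strlen');    // drop trailing whitespace and blank lines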

Then all you have to do is iterate through the rules and use glob to create an array of file names to ignore:

$files = array();
foreach ($rules as $rule)
{
    if (strpos($rule, '!') === 0) // negative rule: un-ignore what earlier rules matched
        $files = array_diff($files, glob($path . DIRECTORY_SEPARATOR . substr($rule, 1)) ?: array());
    else // glob() may return false, so fall back to an empty array
        $files = array_merge($files, glob($path . DIRECTORY_SEPARATOR . $rule) ?: array());
}
$files = array_unique($files);

I didn't test this code, so comment below if it doesn't work for you.

  • Thanks, but it doesn't work correctly. 1. glob doesn't include subfolders, and the result is empty. 2. Is it a good idea to load all files for each rule? If we have 10,000 files and 50 rules, the code has to load files 10,000 * 50 times. – Angolao Nov 15 '13 at 03:25
  • @anlai I edited the regex and tested it - it works. Glob does match subdirectories and files. If you want to parse multiple .gitignore files you would need to recursively loop over the directory and find any .gitignore files first. There's no way around having 10,000+ files that I can come up with. –  Nov 22 '13 at 23:03

The SPL (Standard PHP Library) contains some iterators for that job. I am limiting the example to filtering out all directories and files whose names start with a ".".

The rules for .gitignore are quite complex; parsing the entries and building a set of rules would go way beyond the scope of an example.

$directory = __DIR__;

$filtered = new RecursiveIteratorIterator(
  new RecursiveCallbackFilterIterator(
    new RecursiveDirectoryIterator($directory),
    function ($fileInfo, $key, $iterator) {
      // only accept entries that do not start with a .
      return substr($fileInfo->getFilename(), 0, 1) != '.';
    }
  )
);


foreach ($filtered as $fileInfo) {
  echo (string)$fileInfo, "\n";
}
ThW
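Extending the SPL example above, the callback can reject entries that match ignore patterns. Because RecursiveCallbackFilterIterator does not descend into a rejected directory, ignored trees such as `vendor` are never read, and the whole tree is walked only once. A minimal sketch; `walk_not_ignored` is hypothetical, and `$patterns` is assumed to be a plain list of `fnmatch()` globs tested against each entry's name only (no `!` negation, anchoring or `**`).

```php
<?php
/**
 * Hypothetical helper: yield the paths under $root that no pattern rejects.
 * Rejected directories are pruned, so their contents are never listed.
 */
function walk_not_ignored(string $root, array $patterns): Generator
{
    $filter = new RecursiveCallbackFilterIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        function (SplFileInfo $info) use ($patterns): bool {
            $name = $info->getFilename();
            if ($name === '.git') {
                return false;               // never descend into Git's metadata
            }
            foreach ($patterns as $pattern) {
                if (fnmatch($pattern, $name)) {
                    return false;           // ignored; a directory is not descended into
                }
            }
            return true;
        }
    );
    // LEAVES_ONLY (the default) yields files, not directories.
    foreach (new RecursiveIteratorIterator($filter) as $info) {
        yield $info->getPathname();
    }
}
```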