102

I need to get all those files under D:\dic and loop over them to further process individually.

Does MATLAB support this kind of operation?

It can be done in other scripting languages such as PHP and Python.

gnovice
Gtker

8 Answers

130

Update: Given that this post is quite old, and I've modified this utility a lot for my own use during that time, I thought I should post a new version. My newest code can be found on The MathWorks File Exchange: dirPlus.m. You can also get the source from GitHub.

I made a number of improvements. It now gives you options to prepend the full path or return just the file name (incorporated from Doresoom and Oz Radiano) and apply a regular expression pattern to the file names (incorporated from Peter D). In addition, I added the ability to apply a validation function to each file, allowing you to select them based on criteria other than just their names (e.g. file size, content, creation date, etc.).
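To illustrate the idea of selecting files by a name pattern plus a validation function, here is a minimal sketch. It does not use dirPlus or its options; it only relies on the built-in dir, regexp and arrayfun. The pattern (files ending in .m) and the size threshold of 1024 bytes are assumptions chosen for the example.

```matlab
% Sketch only: filter the files in the current folder by name and by size.
dirData = dir(pwd);
dirData = dirData(~[dirData.isdir]);       % keep files, drop folders
names = {dirData.name};

% Name criterion: a regular expression applied to every file name
matchesPattern = ~cellfun(@isempty, regexp(names, '\.m$', 'once'));

% Validation criterion: any function of the dir entry that returns true/false
isValid = arrayfun(@(f) f.bytes > 1024, dirData);

keep = matchesPattern(:) & isValid(:);     % force both to columns before combining
selected = names(keep);
```

Any other test on the dir entry (its date field, for instance) could replace the size check in the anonymous function.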


NOTE: In newer versions of MATLAB (R2016b and later), the dir function has recursive search capabilities! So you can do this to get a list of all *.m files in all subfolders of the current folder:

dirData = dir('**/*.m');
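Applied to the folder from the question, the recursive form of dir can be combined with a plain loop. This is a sketch that assumes R2016b or later (for the '**' wildcard and the folder field); the fprintf call is only a placeholder for whatever per-file processing is needed.

```matlab
% Sketch: list every file under a root folder and loop over the results.
rootDir = 'D:\dic';                            % folder from the question
dirData = dir(fullfile(rootDir, '**', '*'));   % '**' recurses into subfolders
dirData = dirData(~[dirData.isdir]);           % drop folders, including '.' and '..'

for k = 1:numel(dirData)
    filePath = fullfile(dirData(k).folder, dirData(k).name);
    fprintf('%s\n', filePath);                 % placeholder for per-file processing
end
```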

Old code: (for posterity)

Here's a function that searches recursively through all subdirectories of a given directory, collecting a list of all file names it finds:

function fileList = getAllFiles(dirName)

  dirData = dir(dirName);      %# Get the data for the current directory
  dirIndex = [dirData.isdir];  %# Find the index for directories
  fileList = {dirData(~dirIndex).name}';  %'# Get a list of the files
  if ~isempty(fileList)
    fileList = cellfun(@(x) fullfile(dirName,x),...  %# Prepend path to files
                       fileList,'UniformOutput',false);
  end
  subDirs = {dirData(dirIndex).name};  %# Get a list of the subdirectories
  validIndex = ~ismember(subDirs,{'.','..'});  %# Find index of subdirectories
                                               %#   that are not '.' or '..'
  for iDir = find(validIndex)                  %# Loop over valid subdirectories
    nextDir = fullfile(dirName,subDirs{iDir});    %# Get the subdirectory path
    fileList = [fileList; getAllFiles(nextDir)];  %# Recursively call getAllFiles
  end

end

After saving the above function somewhere on your MATLAB path, you can call it in the following way:

fileList = getAllFiles('D:\dic');
gnovice
  • How to make it return the full path instead of only file names? – Gtker Apr 16 '10 at 16:36
  • 3
    +1 - Great solution. I don't know if it's necessary, but if you insert the line: fileList = cellfun(@(x) strcat([dirName,'\'],x),fileList,'UniformOutput',0); into your solution between the first fileList definition and the subDirs definition, it will return the full path and filename for each file. – Doresoom Apr 16 '10 at 16:40
  • 2
    @Doresoom: Good suggestion, although I went with using FULLFILE instead, since it handles the choice of file separator for you (which is different on UNIX and Windows). Also, you could just do `fileList = strcat(dirName,filesep,fileList);` instead of using CELLFUN, although you can end up with extra unnecessary file separators that way, which FULLFILE also takes care of for you. – gnovice Apr 16 '10 at 16:57
  • @gnovice - Quick question: will dir always return the first two entries as . and .. ? That should make sorting them out much easier, but I was wary of just writing (3:end) instead of comparing the names. – Doresoom Apr 16 '10 at 16:57
  • Oh,what's `@(x) fullfile(dirName,x)`?Anonymous function?Why it doesn't have return values like ordinary matlab functions? – Gtker Apr 16 '10 at 17:02
  • @Doresoom: I'm pretty sure `.` and `..` are always the first two entries. I've never seen a case where they aren't (other than, of course, searching for files or directories matching a specific name format). – gnovice Apr 16 '10 at 17:03
  • @Runner: Yes, that's an anonymous function. The return value is just whatever value is returned from the expression it contains, which in this case is the output from FULLFILE. – gnovice Apr 16 '10 at 17:05
  • Is it necessary here `'UniformOutput',false` to make it work?I've read `help cellfun` but still don't see why it's necessary. – Gtker Apr 16 '10 at 17:14
  • @Runner: The arguments `'UniformOutput',false` force CELLFUN to output the results as a cell array, where each cell in the output contains the results of applying the anonymous function to the corresponding cell of the input. This is necessary when the results of processing each input cell can't be easily concatenated into another type of array (char, numeric, etc.). – gnovice Apr 16 '10 at 17:19
  • 2
    @gnovice, @Doreseoom - According to http://www.mathworks.com/access/helpdesk/help/techdoc/ref/dir.html, the order that 'dir' returns is OS dependent. I'm not sure what happens if, for instance, you set the DOS DIRCMD variable to something that changes the order. Octave handles it ok (. and .. are still first) but I don't have MATLAB to test. – mtrw Apr 17 '10 at 00:36
  • @mtrw: Since it's sounding like there's a non-zero probability that the sort order of directories returned by DIR may vary, I modified the code to work with an arbitrary positioning of `'.'` and `'..'` in the directory list. – gnovice Apr 17 '10 at 01:58
  • 2
    @gnovice: This is beyond the OP's question, but I found it useful to build in regular expressions into the function. `if ~isempty(fileList) fileList = cellfun(@(x) fullfile(dirName,x),... %# Prepend path to files fileList,'UniformOutput',false); matchstart = regexp(fileList, pattern); fileList = fileList(~cellfun(@isempty, matchstart)); end` and change the function signature to `getAllFiles(dirName, pattern)` (also in the 2nd to last line) – Peter D Jul 23 '12 at 00:42
  • 1
    Great answer, thanks! I have elaborated the code to support 2 additional parameters - http://stackoverflow.com/a/26449095/69555 – Oz Radiano Oct 19 '14 at 09:20
  • Great fcn but it can be faster. If you replace `fullfile(dirName,x)` with `[dirName filesep x]` you'll get a speedup. – spreus Feb 04 '15 at 14:07
25

You're looking for dir to return the directory contents.

To loop over the results, you can simply do the following:

dirlist = dir('.');
for i = 1:length(dirlist)
    dirlist(i)
end

This should give you output in the following format, e.g.:

name: 'my_file'
date: '01-Jan-2010 12:00:00'
bytes: 56
isdir: 0
datenum: []
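Instead of displaying the whole struct, the individual fields can be read inside the loop. A small sketch, assuming only the files (not the folders) in the current directory are of interest:

```matlab
% Sketch: print name, size and date of each file in the current folder.
dirlist = dir('.');
for i = 1:numel(dirlist)
    entry = dirlist(i);
    if entry.isdir
        continue    % skips subfolders as well as the '.' and '..' entries
    end
    fprintf('%-30s %10d bytes  %s\n', entry.name, entry.bytes, entry.date);
end
```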
Martin Dinov
James B
  • Can you make it search recursively including files in subdirectories but excluding directory itself? – Gtker Apr 16 '10 at 13:52
  • Not off the top of my head, no (I no longer have regular access to Matlab), but this may help you: http://www.mathworks.com/matlabcentral/fileexchange/19550-recursive-directory-listing – James B Apr 16 '10 at 13:55
  • 2
    How to exclude `.` and `..` ? – Gtker Apr 16 '10 at 14:44
  • 5
    @Runner: to exclude . and .., remove the first two entries in the output of dir. Or, in case you're looking for a specific file type, run `dir('*.ext')`, which automatically excludes directories (unless they end in .ext, of course) – Jonas Apr 16 '10 at 15:44
14

I used the code mentioned in this great answer and expanded it to support 2 additional parameters which I needed in my case. The parameters are file extensions to filter on and a flag indicating whether to concatenate the full path to the name of the file or not.

I hope it is clear enough and that someone will find it beneficial.

function fileList = getAllFiles(dirName, fileExtension, appendFullPath)

  dirData = dir([dirName '/' fileExtension]);      %# Get the data for the current directory
  dirWithSubFolders = dir(dirName);
  dirIndex = [dirWithSubFolders.isdir];  %# Find the index for directories
  fileList = {dirData.name}';  %'# Get a list of the files
  if ~isempty(fileList)
    if appendFullPath
      fileList = cellfun(@(x) fullfile(dirName,x),...  %# Prepend path to files
                       fileList,'UniformOutput',false);
    end
  end
  subDirs = {dirWithSubFolders(dirIndex).name};  %# Get a list of the subdirectories
  validIndex = ~ismember(subDirs,{'.','..'});  %# Find index of subdirectories
                                               %#   that are not '.' or '..'
  for iDir = find(validIndex)                  %# Loop over valid subdirectories
    nextDir = fullfile(dirName,subDirs{iDir});    %# Get the subdirectory path
    fileList = [fileList; getAllFiles(nextDir, fileExtension, appendFullPath)];  %# Recursively call getAllFiles
  end

end

Example for running the code:

fileList = getAllFiles(dirName, '*.xml', 0); %#0 is false obviously
Oz Radiano
8

You can use regexp or strcmp to eliminate . and .. Or you could use the isdir field if you only want files in the directory, not folders.

list=dir(pwd);  %get info of files/folders in current directory
isfile=~[list.isdir]; %determine index of files vs folders
filenames={list(isfile).name}; %create cell array of file names

or combine the last two lines:

filenames={list(~[list.isdir]).name};

For a list of folders in the directory excluding . and ..

dirnames={list([list.isdir]).name};
dirnames=dirnames(~(strcmp('.',dirnames)|strcmp('..',dirnames)));

From this point, you should be able to throw the code in a nested for loop, and continue searching each subfolder until your dirnames returns an empty cell for each subdirectory.
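One way to sketch that loop is to keep a list of folders still to be visited and repeat the steps above until the list is empty. The function name listFilesIterative is hypothetical and the code is an illustration of the idea, not a tested utility.

```matlab
function fileList = listFilesIterative(rootDir)
% Hypothetical helper: collect the full paths of all files under rootDir
% by visiting folders from an explicit list instead of recursing.
  fileList = {};
  pending = {rootDir};                  % folders still to be visited
  while ~isempty(pending)
    current = pending{end};             % take the most recently added folder
    pending(end) = [];
    list = dir(current);
    if isempty(list)
      continue                          % nothing readable in this folder
    end
    names = {list.name};
    isDir = [list.isdir];
    addPath = @(n) fullfile(current, n);
    files = cellfun(addPath, names(~isDir), 'UniformOutput', false);
    fileList = [fileList, files];       % append this folder's files
    subDirs = names(isDir & ~ismember(names, {'.','..'}));
    pending = [pending, cellfun(addPath, subDirs, 'UniformOutput', false)];
  end
end
```

Calling filenames = listFilesIterative(pwd); returns a row cell array of full paths, with '.' and '..' never followed.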

Doresoom
  • @Runner: It does if you use some for and while loops...but I'm too lazy to implement that right now. – Doresoom Apr 16 '10 at 15:51
  • +1 even though it doesn't exactly answer the question, it does provide a way to cull out the directories quickly. – jhfrontz Mar 08 '12 at 19:14
7

This answer does not directly answer the question but may be a good solution outside of the box.

I upvoted gnovice's solution, but want to offer another solution: Use the system dependent command of your operating system:

tic
asdfList = getAllFiles('../TIMIT_FULL/train');
toc
% Elapsed time is 19.066170 seconds.

tic
[status,cmdout] = system('find ../TIMIT_FULL/train/ -iname "*.wav"');
C = strsplit(strtrim(cmdout), sprintf('\n'));  % split on newlines only, so paths containing spaces stay intact
toc
% Elapsed time is 0.603163 seconds.

Positive:

  • Very fast (in my case for a database of 18000 files on linux).
  • You can use well tested solutions.
  • You do not need to learn or reinvent a new syntax to select, e.g., *.wav files.

Negative:

  • You are not system independent.
  • You rely on a single string which may be hard to parse.
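The two drawbacks can be softened a little by choosing the command per platform and splitting the output on line breaks only. The sketch below assumes the standard find utility on Linux/macOS and the cmd.exe dir command on Windows; the function name listFilesViaShell is hypothetical, and a failing command is treated the same as an empty result, which may hide real errors.

```matlab
function fileList = listFilesViaShell(rootDir)
% Hypothetical helper: list all files under rootDir through the OS shell.
  if ispc
    cmd = sprintf('dir /s /b /a-d "%s"', rootDir);   % recursive, bare paths, no folders
  else
    cmd = sprintf('find "%s" -type f', rootDir);     % regular files only
  end
  [status, cmdout] = system(cmd);
  cmdout = strtrim(cmdout);
  if status ~= 0 || isempty(cmdout)
    fileList = {};                                   % command failed or found nothing
    return
  end
  % One path per line; splitting on line breaks keeps paths with spaces intact
  fileList = regexp(cmdout, '\r?\n', 'split');
end
```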
Lukas
3

I don't know a single-function method for this, but you can use genpath to recurse a list of subdirectories only. This list is returned as a single string of directories delimited by pathsep (a semicolon on Windows, a colon on Linux and macOS), so you'll have to separate it using strread, i.e.

dirlist = strread(genpath('/path/of/directory'),'%s','delimiter',pathsep)

If you don't want to include the given directory, remove the first entry of dirlist, i.e. dirlist(1)=[]; since it is always the first entry.

Then get the list of files in each directory with a looped dir.

filenamelist=[];
for d=1:length(dirlist)
    % keep only filenames
    filelist=dir(dirlist{d});
    filelist={filelist.name};

    % remove '.' and '..' entries
    filelist(ismember(filelist,{'.','..'}))=[];
    % or to ignore all hidden files, use filelist(strncmp(filelist,'.',1))=[];

    % prepend directory name to each filename entry, separated by filesep*
    for f=1:length(filelist)
        filelist{f}=[dirlist{d} filesep filelist{f}];
    end

    filenamelist=[filenamelist filelist];
end

filesep returns the directory separator for the platform on which MATLAB is running.

This gives you a list of filenames with full paths in the cell array filenamelist. Not the neatest solution, I know.

JS Ng
  • For performance reason I don't want to `genpath`,it essentially searches twice. – Gtker Apr 16 '10 at 15:19
  • 2
    One drawback to using GENPATH is that it will only include subdirectories that are allowed on the MATLAB path. For example, if you have directories named `private`, they will not be included. – gnovice Apr 16 '10 at 16:18
1

This is a handy function for getting the filenames with a specified format (usually .mat) in a root folder!

    function filenames = getFilenames(rootDir, format)
        % Get filenames with specified `format` in given `folder`
        %
        % Parameters
        % ----------
        % - rootDir: char vector
        %   Target folder
        % - format: char vector = 'mat'
        %   File format

        % default values
        if ~exist('format', 'var')
            format = 'mat';
        end

        format = ['*.', format];
        filenames = dir(fullfile(rootDir, format));
        filenames = arrayfun(...
            @(x) fullfile(x.folder, x.name), ...
            filenames, ...
            'UniformOutput', false ...
        );
    end

In your case, you can use the following snippet :)

filenames = getFilenames('D:/dic/**');
for i = 1:numel(filenames)
    filename = filenames{i};
    % do your job!
end
Yas
0

With a small modification, an almost identical approach prints the full path of every entry in each subfolder (one level below the given folder):

dataFolderPath = 'UCR_TS_Archive_2015/';

dirData = dir(dataFolderPath);      %# Get the data for the current directory
dirIndex = [dirData.isdir];  %# Find the index for directories
fileList = {dirData(~dirIndex).name}';  %'# Get a list of the files
if ~isempty(fileList)
    fileList = cellfun(@(x) fullfile(dataFolderPath,x),...  %# Prepend path to files
        fileList,'UniformOutput',false);
end
subDirs = {dirData(dirIndex).name};  %# Get a list of the subdirectories
validIndex = ~ismember(subDirs,{'.','..'});  %# Find index of subdirectories
%#   that are not '.' or '..'
for iDir = find(validIndex)                  %# Loop over valid subdirectories
    nextDir = fullfile(dataFolderPath,subDirs{iDir});    %# Get the subdirectory path
    subDirData = dir(nextDir);    %# List the contents of this subdirectory
    for k = 1:numel(subDirData)
        validFileIndex = ~ismember(subDirData(k).name,{'.','..'});
        if(validFileIndex)
            filePathComplete = fullfile(nextDir,subDirData(k).name);
            fprintf('The Complete File Path: %s\n', filePathComplete);
        end
    end
end  
Spandan