
I have a TStringList that contains a very long list of files, which I fill by searching my disk. The list contains files with different extensions (.docx, .xlsx and so on). The list is filled by searching for the extensions one at a time, and therefore it takes quite some time. What I would like to do is start multiple searches that fill the same TStringList with filenames. I have an idea that it should be done with some kind of threading, but this is a blank sheet of paper for me.

Any hints or maybe samples that I should study?

The code below is the one I use today:

function TFiles.Search(aList: TStringList; aPathname: string; const aFile: string = '*.*'; const aSubdirs: boolean = True): integer;
var
  Rec: TSearchRec;
begin
  // Folders.Validate is my own helper for checking aPathname
  Folders.Validate(aPathName, False);
  // collect all files in this folder that match aFile
  if FindFirst(aPathname + aFile, faAnyFile - faDirectory, Rec) = 0 then
    try
      repeat
        aList.Add(aPathname + Rec.Name);
      until FindNext(Rec) <> 0;
    finally
      FindClose(Rec);
    end;
  Result := aList.Count;
  if not aSubdirs then Exit;
  // recurse into every subdirectory with the same mask
  if FindFirst(aPathname + '*.*', faDirectory, Rec) = 0 then
    try
      repeat
        if ((Rec.Attr and faDirectory) <> 0) and (Rec.Name <> '.') and (Rec.Name <> '..') then
          Files.Search(aList, aPathname + Rec.Name, aFile, True);
      until FindNext(Rec) <> 0;
    finally
      FindClose(Rec);
    end;
  Result := aList.Count;
end;
OZ8HP

  • I doubt this will give you a performance improvement. You will have several processes traversing the disk at the same time, but looking for different file types. This will result in heavy disk thrashing. You would be better off letting FindFirst find *all* files, then store the file names having the required extensions in the TStringList. The disk I/O is the bottleneck here. – Jan Doggen May 05 '14 at 13:27
  • It would be better to search for ALL files, and add proper files to the list. – MBo May 05 '14 at 13:28
  • Another option is to find all the files in the folder or folders and store them in memory, then check against that, assuming they aren't going to change underneath you. It was way faster for me when doing something similar, even with the cost of the memory to store them. – Tony Hopkinson May 05 '14 at 13:28
  • Guess two more people have run into the same issue :) – Tony Hopkinson May 05 '14 at 13:29
  • Also, as a small improvement you can drop the recursion and use e.g. a depth-first search with an explicit stack (or a string list working as a stack) — see the sketch after these comments. – TLama May 05 '14 at 13:35
  • Why not search for every extension at once? See [How to search different file types using FindFirst?](http://stackoverflow.com/q/5991040/576719). Sort the results into the different extension lists. If you need the search to be threaded, you would only need one thread. – LU RD May 05 '14 at 13:48
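
Following up on TLama's comment, a non-recursive traversal can use a TStringList as a stack of folders still to visit. This is only a rough sketch; SearchIterative and FolderStack are names invented for the illustration:

function SearchIterative(aList: TStringList; const aRoot: string;
  const aMask: string = '*.*'): integer;
var
  FolderStack: TStringList;
  Path: string;
  Rec: TSearchRec;
begin
  FolderStack := TStringList.Create;
  try
    FolderStack.Add(IncludeTrailingPathDelimiter(aRoot));
    while FolderStack.Count > 0 do
    begin
      // pop the next folder to process
      Path := FolderStack[FolderStack.Count - 1];
      FolderStack.Delete(FolderStack.Count - 1);
      // collect the matching files in this folder
      if FindFirst(Path + aMask, faAnyFile - faDirectory, Rec) = 0 then
        try
          repeat
            aList.Add(Path + Rec.Name);
          until FindNext(Rec) <> 0;
        finally
          FindClose(Rec);
        end;
      // push the subfolders for a later iteration
      if FindFirst(Path + '*.*', faDirectory, Rec) = 0 then
        try
          repeat
            if ((Rec.Attr and faDirectory) <> 0) and (Rec.Name <> '.') and (Rec.Name <> '..') then
              FolderStack.Add(IncludeTrailingPathDelimiter(Path + Rec.Name));
          until FindNext(Rec) <> 0;
        finally
          FindClose(Rec);
        end;
    end;
  finally
    FolderStack.Free;
  end;
  Result := aList.Count;
end;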

2 Answers


Building on LU RD's suggestion.

Only traverse the disk once
Search for all files at once. This way you only have to traverse the directories once, saving a lot of I/O time.

See: How to search different file types using FindFirst?

procedure FileSearch(const PathName: string; const Extensions: string;
                     var lstFiles: TStringList);
// .....(copy code from above link)
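
The full routine is in the linked answer; roughly, it enumerates every file once and keeps only the wanted extensions. The sketch below is my own simplification (it assumes Extensions is a semicolon-separated list such as '.docx;.xlsx'), not the linked code:

procedure FileSearch(const PathName: string; const Extensions: string;
                     var lstFiles: TStringList);
var
  Rec: TSearchRec;
  Path, Ext: string;
begin
  Path := IncludeTrailingPathDelimiter(PathName);
  // enumerate everything once and keep only the wanted extensions
  if FindFirst(Path + '*.*', faAnyFile - faDirectory, Rec) = 0 then
    try
      repeat
        Ext := LowerCase(ExtractFileExt(Rec.Name));
        if (Ext <> '') and
           (Pos(';' + Ext + ';', ';' + LowerCase(Extensions) + ';') > 0) then
          lstFiles.Add(Path + Rec.Name);
      until FindNext(Rec) <> 0;
    finally
      FindClose(Rec);
    end;
  // recurse into subdirectories
  if FindFirst(Path + '*.*', faDirectory, Rec) = 0 then
    try
      repeat
        if ((Rec.Attr and faDirectory) <> 0) and (Rec.Name <> '.') and (Rec.Name <> '..') then
          FileSearch(Path + Rec.Name, Extensions, lstFiles);
      until FindNext(Rec) <> 0;
    finally
      FindClose(Rec);
    end;
end;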

Multithreading the non-disk parts
When you've acquired your files, you can search through all of them at once using threads.

Something like this.

type
  TSearchThread = class(TThread)
  private
    FFilenames: TStringList;
    FExtensionToSearchFor: string;
    FResultList: TStringList;
  protected
    procedure Execute; override;
  public
    constructor Create(AFilelist: TStringlist; Extension: string);
    property Filenames: TStringList read FFilenames;
    property ExtensionToSearchFor: string read FExtensionToSearchFor;
    property ResultList: TStringList read FResultList;
  end;

  TForm1 = class(TForm)
  private
    FFilenames: TStringList;
    FWorkerBees: array of TSearchThread;
    FNumberOfBeesStillWorking: cardinal;
    procedure WorkerBeeTerminate(Sender: TObject);
  public
    procedure LetsWork;
    procedure AllDone;
  end;

implementation  

constructor TSearchThread.Create(AFilelist: TStringList; Extension: string);
const
  WaitABit = true;
begin
  inherited Create(WaitABit);
  FResultList:= TStringList.Create;
  FExtensionToSearchFor:= Extension;
  FFilenames:= AFilelist;
  //Self.FreeOnTerminate:= false;
end;

procedure TSearchThread.Execute;
var
  i: integer;
  FilenameI: string;
begin
  for i := 0 to FFilenames.Count - 1 do begin
    FilenameI := FFilenames[i];
    // SameText makes the extension comparison case-insensitive
    if SameText(ExtractFileExt(FilenameI), FExtensionToSearchFor) then begin
      FResultList.Add(FilenameI);
    end;
  end; {for i}
end;

procedure TForm1.LetsWork;
var
  i: integer;
begin
  // PathName, Extensions, NumberOfExtensions and GetExtensionI are placeholders
  // for however you store and split the list of wanted extensions
  FileSearch(PathName, Extensions, FFilenames);
  SetLength(FWorkerBees, NumberOfExtensions);
  FNumberOfBeesStillWorking := NumberOfExtensions;
  for i := 0 to NumberOfExtensions - 1 do begin
    FWorkerBees[i] := TSearchThread.Create(FFilenames, GetExtensionI(Extensions, i));
    FWorkerBees[i].OnTerminate := WorkerBeeTerminate;
    FWorkerBees[i].Start;
  end; {for i}
end;

procedure TForm1.WorkerBeeTerminate(Sender: TObject);
begin
  // OnTerminate runs in the main thread, so no locking is needed here
  Dec(FNumberOfBeesStillWorking);
  if FNumberOfBeesStillWorking = 0 then AllDone;
end;

procedure TForm1.AllDone;
begin
  //process the ResultLists for all the threads...
  //Free the threads when done
end;

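For reference, AllDone could look roughly like this. This is only a sketch: ResultMemo is a hypothetical memo control for the output, and the threads themselves are deliberately not freed here, because freeing a thread from inside its own OnTerminate handler can deadlock.

procedure TForm1.AllDone;
var
  i: integer;
begin
  for i := 0 to High(FWorkerBees) do
    // every worker has finished, so its ResultList can be read without locking
    ResultMemo.Lines.AddStrings(FWorkerBees[i].ResultList);
  // free FWorkerBees (and each thread's ResultList) later, e.g. in the
  // form's OnDestroy, not from within the last thread's OnTerminate
end;
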
Time your code
But before you go through all this hassle...

Time your code, see: Calculating the speed of routines?

Just write a normal single-threaded version and time each component.
Only optimize a section if it occupies a significant % of the running time.
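
For example, with TStopwatch (the Diagnostics / System.Diagnostics unit, Delphi 2010 and newer; on older versions GetTickCount does the same job). TimeTheSearch is just an illustrative name, and PathName, Extensions and FFilenames are the placeholders used above:

uses System.Diagnostics;

procedure TForm1.TimeTheSearch;
var
  sw: TStopwatch;
begin
  sw := TStopwatch.StartNew;
  FileSearch(PathName, Extensions, FFilenames);  // the part you suspect is slow
  sw.Stop;
  ShowMessage(Format('Disk traversal took %d ms', [sw.ElapsedMilliseconds]));
end;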

Profiler
A cool tool I like to use for that purpose is GpProfile. See: http://code.google.com/p/gpprofile2011/downloads/list

It supports Delphi at least up to XE3 and probably beyond.

Johan

As others have mentioned, I think the bottleneck is the disk I/O. So I present a solution that runs in two threads: the first does the file search and the second filters the files, so the search and the analysis run at the same time.

BUT: Time your code to find your bottleneck.

// needs Classes (TThread, TStringList) and SyncObjs (TCriticalSection, TSimpleEvent)
TSearchFilterThread = class(TThread)
  private
    fFileQueue: TStringList;
    fExtensionList: TStringList;
    fCriticalSection: TCriticalSection;
    fResultList: TStringList;
    fNewDataInList: TSimpleEvent;
    function getNextFileToProcess: string;
    function matchFilter(const filename: string): boolean;
  protected
    procedure execute; override;
  public
    constructor create(searchForExtension: TStringList); reintroduce;
    destructor destroy; override;
    procedure appendFile(const filename: string);
    procedure waitForEnd;
    property Results: TStringList read fResultList;
end;

procedure TSearchFilterThread.appendFile(const filename: string);
begin
  fCriticalSection.Enter;
  try
    fFileQueue.Add(filename);
    fNewDataInList.SetEvent;
  finally
    fCriticalSection.Leave;
  end;
end;

constructor TSearchFilterThread.create(searchForExtension: TStringList);
begin
  inherited create(true);
  //To protect access to the TStringList fFileQueue
  fCriticalSection := TCriticalSection.Create;

  fExtensionList := searchForExtension;
  fExtensionList.Sorted := true;
  fExtensionList.CaseSensitive := false;

  fFileQueue := TStringList.Create;

  //Event to notify workerthread, that new data available
  fNewDataInList := TSimpleEvent.Create;
  fNewDataInList.ResetEvent;

  fResultList := TStringList.Create;

  resume;
end;

destructor TSearchFilterThread.destroy;
begin
  terminate;
  fNewDataInList.SetEvent;
  waitFor;

  fResultList.Free;
  fCriticalSection.Free;
  fFileQueue.Free;
  inherited;
end;

function TSearchFilterThread.getNextFileToProcess: string;
begin
  fCriticalSection.Enter;
  try
    if fFileQueue.Count > 0 then begin
      result := fFileQueue[0];
      fFileQueue.Delete(0);
    end
    else
      result := '';
  finally
    fCriticalSection.Leave;
  end;
end;

function TSearchFilterThread.matchFilter(const filename: string): boolean;
var
  extension: string;
begin
  extension := ExtractFileExt(filename);
  result := fExtensionList.IndexOf(extension) > -1;
end;

procedure TSearchFilterThread.execute;
const
  INFINITE: longword = $FFFFFFFF;
var
 fileName: string;
begin
  while true do begin
    fileName := getNextFileToProcess;
    if fileName <> '' then begin
      if matchFilter(filename) then
        fResultList.Add(fileName);
    end
    else if not terminated then begin
      fNewDataInList.WaitFor(INFINITE);
      fNewDataInList.resetEvent;
    end
    else if terminated then
      break;
  end;
end;


procedure TSearchFilterThread.waitForEnd;
begin
  Terminate;
  fNewDataInList.SetEvent;
  waitFor;
end;

The search method that finds all files and delegates the filtering to the thread:

procedure FileSearch(const pathName: string; filter: TSearchFilterThread);
const
  FileMask = '*.*';
var
  Rec: TSearchRec;
  Path: string;
begin
  Path := IncludeTrailingPathDelimiter(pathName);
  if FindFirst(Path + FileMask, faAnyFile - faDirectory, Rec) = 0 then
    try
      repeat
        filter.appendFile(Path + rec.Name);
      until FindNext(Rec) <> 0;
    finally
      SysUtils.FindClose(Rec);
    end;

  if FindFirst(Path + '*.*', faDirectory, Rec) = 0 then
    try
      repeat
        if ((Rec.Attr and faDirectory) <> 0) and (Rec.Name <> '.') and
          (Rec.Name <> '..') then
          FileSearch(Path + Rec.Name, filter);
      until FindNext(Rec) <> 0;
    finally
      FindClose(Rec);
    end;
end;

And here are the starter and the result presenter:

procedure TForm1.startButtonClick(Sender: TObject);
var
  searchFilter: TSearchFilterThread;
  searchExtensions: TStringList;
  path: string;
begin
  path := 'c:\windows';

  searchExtensions := TStringList.Create;
  searchExtensions.Add('.doc');
  searchExtensions.Add('.docx');
  searchExtensions.Add('.ini');

  searchFilter := TSearchFilterThread.create(searchExtensions);
  try
    FileSearch(path, searchFilter);
    searchFilter.waitForEnd;

    fileMemo.Lines := searchFilter.Results;
  finally
    searchFilter.Free;
    searchExtensions.Free;
  end;
end;

It might be a little oversized, but I wanted to code a little bit.

Malte
  • I highly doubt this will reduce anything. You will most probably decrease performance with this. You are executing a recursive search in the main thread, and before you do so, you start a worker thread into which you enqueue every single file that you find, through a lock which has no spin time for the thread. Do not use critical sections that much. And avoid doing such a *lazy* task in a worker thread. [not voting down] – TLama May 05 '14 at 21:23
  • I will have to give all the suggestions some thought and see if I can improve my code, if not a lot then at least some. But I won't get the time before later this week - I am just leaving for a trip to Holland. But I will be back.... – OZ8HP May 06 '14 at 05:18