
I found a PowerShell script on TechNet for locating duplicate files in folders. However, when I run it, I get an error on what appears to be every folder and file. I'm not sure which switch is supposed to be used here.

$Path = '\\servername\Share\Folders' #define path to folders to find duplicate files
$Files=gci -File -Recurse -path $Path | Select-Object -property FullName,Length
$Count=1
$TotalFiles=$Files.Count
$MatchedSourceFiles=@()
ForEach ($SourceFile in $Files)
{
  Write-Progress -Activity "Processing Files" -status "Processing File $Count / $TotalFiles" -PercentComplete ($Count / $TotalFiles * 100)
  $MatchingFiles=@()
  $MatchingFiles=$Files |Where-Object {$_.Length -eq $SourceFile.Length}
  Foreach ($TargetFile in $MatchingFiles)
  {
    if (($SourceFile.FullName -ne $TargetFile.FullName) -and !(($MatchedSourceFiles |
      Select-Object -ExpandProperty File) -contains $TargetFile.FullName))
    {
      Write-Verbose "Matching $($SourceFile.FullName) and $($TargetFile.FullName)"
      Write-Verbose "File sizes match."
      if ((fc.exe /A $SourceFile.FullName $TargetFile.FullName) -contains "FC: no differences encountered")
      {
        Write-Verbose "Match found."
        $MatchingFiles+=$TargetFile.FullName
      }
    }
  }
  if ($MatchingFiles.Count -gt 0)
  {
    $NewObject=[pscustomobject][ordered]@{
      File=$SourceFile.FullName
      MatchingFiles=$MatchingFiles
    }
    $MatchedSourceFiles+=$NewObject
  }
  $Count+=1
}
$MatchedSourceFiles

Errors

FC: Insufficient number of file specifications

fc.exe : FC: Invalid Switch At line:18 char:12

gci : Could not find a part of the path At line:2 char:8

fc.exe : FC: Invalid Switch At line:18 char:12

1 Answer

The script you provided is very inefficient and produced false positives in my tests. It's inefficient because it compares every pair of files twice (Source->Target and then Target->Source) and because it iterates over all files regardless of size. Here's a quicker version that gathers the files into groups of identical size and executes FC.EXE only once per pair:

$Path = 'C:\Temp'
# Group files by length; only groups containing more than one file can hold duplicates
$SameSizeFiles = gci -Path $Path -File -Recurse |
    Select-Object FullName, Length |
    Group-Object Length |
    Where-Object { $_.Count -gt 1 }
$MatchingFiles = @()
$GroupNdx = 1
Foreach ($SizeGroup in $SameSizeFiles) {
    # Compare each pair inside a size group exactly once ($FromNdx is always less than $ToNdx)
    For ($FromNdx = 0; $FromNdx -lt $SizeGroup.Group.Count - 1; $FromNdx++) {
        For ($ToNdx = $FromNdx + 1; $ToNdx -lt $SizeGroup.Group.Count; $ToNdx++) {
            If ((fc.exe /A $SizeGroup.Group[$FromNdx].FullName $SizeGroup.Group[$ToNdx].FullName) -contains "FC: no differences encountered") {
                $MatchingFiles += [pscustomobject]@{
                    File  = $SizeGroup.Group[$FromNdx].FullName
                    Match = $SizeGroup.Group[$ToNdx].FullName
                }
            }
        }
    }
    Write-Progress -Activity "Finding Duplicates" -Status "Processing group $GroupNdx of $($SameSizeFiles.Count)" -PercentComplete ($GroupNdx / $SameSizeFiles.Count * 100)
    $GroupNdx += 1
}
$MatchingFiles
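
The output is one object per duplicate pair, so it pipes straight into the usual cmdlets. For example, to keep the results for later review (a minimal sketch; the CSV path is just an illustration):

# Persist the duplicate pairs; the output path here is only an example value
$MatchingFiles | Export-Csv -Path 'C:\Temp\Duplicates.csv' -NoTypeInformation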

Efficiency matters even more if you're running this over the network; you may find it quicker to execute the script on the server itself rather than against the share. There is some discussion here about the fastest way to compare files in .NET.
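
If shelling out to fc.exe once per pair is still too slow, hashing each candidate file once is a common alternative. The following is a minimal sketch, assuming PowerShell 4.0 or later for Get-FileHash; it reuses the same size pre-filter and then groups by SHA256 hash instead of comparing pairs, so every file is read exactly once:

$Path = 'C:\Temp'
gci -Path $Path -File -Recurse |
    Group-Object Length | Where-Object { $_.Count -gt 1 } |  # same size pre-filter as above
    ForEach-Object { $_.Group } |
    Group-Object { (Get-FileHash -Path $_.FullName -Algorithm SHA256).Hash } |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { [pscustomobject]@{ Hash = $_.Name; Files = $_.Group.FullName } }

Each output object lists every file sharing a hash, so a three-way duplicate shows up as one group rather than as several pairs.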

Rich Moss