How to loop through a directory recursively to delete files with certain extensions

Question

I need to loop through a directory recursively and remove all files with the extensions .pdf and .doc. I can loop through the directory recursively, but I can't manage to filter the files by those extensions.

My code so far

#!/bin/sh

SEARCH_FOLDER="/tmp/*"

for f in $SEARCH_FOLDER
do
    if [ -d "$f" ]
    then
        for ff in $f/*
        do      
            echo "Processing $ff"
        done
    else
        echo "Processing file $f"
    fi
done

I need help to complete the code, since I'm not getting anywhere.

I know it's bad form to execute code without understanding it, but a lot of people come to this site to learn bash scripting. I got here by googling "bash scripting files recursively", and _almost_ ran one of these answers (just to test the recursion) without realizing it would delete files. I know `rm` is a part of OP's code, but it's not actually relevant to the question asked. I think it'd be safer if answers were phrased using a harmless command like `echo`. — Keith, Apr 05 '16 at 03:26
Similar question here: http://stackoverflow.com/questions/41799938/how-to-recursively-traverse-a-directory-tree-and-find-only-files — codeforester, Jan 23 '17 at 06:22
@Keith had similar experience, completely agree and changed the title — 463035818_is_not_a_number, Jan 24 '17 at 15:48

score 237 · Answer 1 · edited Jul 13 '17 at 15:49

As a followup to mouviciel's answer, you could also do this as a for loop, instead of using xargs. I often find xargs cumbersome, especially if I need to do something more complicated in each iteration.

for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm $f; done

As a number of people have commented, this will fail if there are spaces in filenames. You can work around this by temporarily setting IFS (the internal field separator) to the newline character. It also fails if there are wildcard characters ([, ? or *) in the file names. You can work around that by temporarily disabling wildcard expansion (globbing).

IFS=$'\n'; set -f
for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm "$f"; done
unset IFS; set +f

If you have newlines in your filenames, then that won't work either. You're better off with an xargs based solution:

find /tmp \( -name '*.pdf' -or -name '*.doc' \) -print0 | xargs -0 rm

(The escaped parentheses are required here so that -print0 applies to both -or clauses.)

GNU and *BSD find also have a -delete action, which would look like this:

find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete
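
If each file needs more than a single rm, a loop is still possible without word-splitting the output of find. The following is a minimal sketch, assuming a POSIX find and sh; the function name remove_docs is made up for illustration. find passes the file names to a small inline script as arguments, so spaces, wildcards and newlines in names are never re-parsed by the shell.

```shell
# Hypothetical helper: delete *.pdf and *.doc files under the directory "$1",
# running an arbitrary loop body for each file.
remove_docs() {
    find "$1" \( -name '*.pdf' -o -name '*.doc' \) -type f -exec sh -c '
        for f do                      # iterates over the arguments passed by find
            printf "removing %s\n" "$f"
            rm -- "$f"
        done
    ' sh {} +
}
```

Calling it as remove_docs /tmp would mirror the commands above. To try the loop without deleting anything, replace the rm line with a harmless command such as echo.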

edited Jul 13 '17 at 15:49

Gilles 'SO- stop being evil'

answered Mar 09 '11 at 15:21

James Scriven

27

This does not work as expected if there is a space in the file name (the for loop splits the results of find on whitespace). – trev Apr 28 '13 at 12:54
3

How do you avoid splitting on whitespace? I'm trying a similar thing and I have a lot of directories with whitespaces that screw up this loop. – Christian Aug 08 '13 at 13:07
3

because it's a very helpful answer? – zenperttu Aug 02 '14 at 12:57
1

@Christian Fix the whitespace splitting by using quotes like this: "$(find...)". I've edited James' answer to show. – Matthew Nov 06 '14 at 19:22
2

@Matthew your edit didn't fix anything at all: it actually made the command _only work if there's a unique found file_. At least this version _works_ if there are no spaces, tabs, etc. in filenames. I rolled back to the old version. Nothing sensible can really fix a `for f in $(find ...)`. **Just don't use this method.** – gniourf_gniourf Nov 06 '14 at 20:14
@gniourf_gniourf I tested using echo and not rm, which hid the '\n' characters so it looked like it worked. Thanks for checking and fixing. – Matthew Nov 06 '14 at 20:39
1

@DrewDormann my testing also shows that "$(find...)" makes things worse. I've undone your edit, along with making a long-overdue update of my own. – James Scriven Apr 03 '15 at 22:10
`find` has a `-delete` flag. Why aren't we using it? – Zak Feb 20 '19 at 22:39

score 159 · Accepted Answer · answered Jan 09 '11 at 11:33

find is just made for that.

find /tmp -name '*.pdf' -or -name '*.doc' | xargs rm

answered Jan 09 '11 at 11:33

mouviciel

20

Or find's `-delete` option. – Matthew Flaschen Jan 09 '11 at 11:45
31

One should always use `find ... -print0 | xargs -0 ...`, not raw find | xargs to avoid problems with filenames containing newlines. – Grumbel Oct 22 '11 at 15:51
8

Using `xargs` with no options is almost always bad advice and this is no exception. Use `find … -exec` instead. – Gilles 'SO- stop being evil' Jul 13 '17 at 15:54
@Gilles'SO-stopbeingevil': Why is that bad advice? – Carl Winbäck Jan 17 '21 at 10:23
1

@CarlWinbäck Because the syntax of the input to `xargs` is not the syntax that `find` (or any other common command) prints. `xargs` expects a particular kind of quote-delimited input. – Gilles 'SO- stop being evil' Jan 17 '21 at 15:33
Can I `sed` on the found files? I think not because I need a file from `find`.. – Timo May 16 '21 at 19:36
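
Regarding the comments above on `find | xargs`: the difference between the plain pipe and the null-delimited one can be seen without deleting anything. This is a small sketch, assuming a find that supports -print0 and an xargs that supports -0 (the GNU and BSD versions both do), and assuming the scratch directory path itself contains no whitespace. The file name is invented for the demonstration.

```shell
# Create a scratch directory holding one file whose name contains a space.
demo=$(mktemp -d)
touch "$demo/my report.pdf"

# Plain pipe: xargs splits its input on whitespace, so the single file name
# reaches the command as two separate arguments.
plain_count=$(find "$demo" -name '*.pdf' | xargs -n 1 printf '%s\n' | wc -l | tr -d ' ')

# Null-delimited pipe: the file name arrives as one argument.
safe_count=$(find "$demo" -name '*.pdf' -print0 | xargs -0 -n 1 printf '%s\n' | wc -l | tr -d ' ')

echo "plain pipe: $plain_count argument(s), null-delimited: $safe_count argument(s)"
rm -rf "$demo"
```

With rm in place of printf, the plain pipe would try to remove two paths that do not exist and leave the real file untouched.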

score 78 · Answer 3 · edited Jul 13 '17 at 15:51

Without find:

for f in /tmp/* /tmp/**/* ; do
  ...
done;

/tmp/* matches the files directly in the directory and /tmp/**/* matches the files in subfolders. You may have to enable the globstar option first (shopt -s globstar). So for the question the code should look like this:

shopt -s globstar
for f in /tmp/*.pdf /tmp/*.doc /tmp/**/*.pdf /tmp/**/*.doc ; do
  rm "$f"
done

Note that this requires bash ≥4.0 (or zsh without shopt -s globstar, or ksh with set -o globstar instead of shopt -s globstar). Furthermore, in bash <4.3, this traverses symbolic links to directories as well as directories, which is usually not desirable.
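
A slightly more defensive form of the same idea is sketched below, assuming bash 4.0 or newer. The function name list_docs_glob is hypothetical. It prints the matches instead of deleting them, enables nullglob so that a pattern without matches expands to nothing, and tests for regular files so that a directory named like x.pdf is skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every regular *.pdf / *.doc file below "$1".
list_docs_glob() (
    # The function body is a subshell, so the shopt settings do not leak out.
    shopt -s globstar nullglob
    for f in "$1"/**/*.pdf "$1"/**/*.doc; do
        # Skip anything that is not a regular file (e.g. a directory named x.pdf).
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
)
```

With globstar enabled, "$1"/**/*.pdf also covers files directly in "$1", so the separate non-recursive patterns are not needed here. Hidden files and directories are still skipped unless dotglob is enabled as well.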

edited Jul 13 '17 at 15:51

Gilles 'SO- stop being evil'

answered Feb 26 '13 at 11:54

Tomek

1

This method worked for me, even with filenames containing spaces on OSX – ideasasylum Feb 28 '15 at 10:21
2

Worth noting that globstar is only available in Bash 4.0 or newer.. which is not the default version on many machines. – Troy Howard Jan 08 '16 at 22:02
1

I dont think you need to specify the first argument. (At least as of today,) `for f in /tmp/**` will be enough. Includes the files from /tmp dir. – phil294 Jun 13 '17 at 14:25
1

Wouldn't it be better like this ? `for f in /tmp/*.{pdf,doc} tmp/**/*.{,pdf,doc} ; do` – Ice-Blaze Sep 10 '17 at 07:52
Also, `shopt -s dotglob` if hidden files should be found; and if you have an empty directory with one of the extensions, your command would delete that, too - maybe test if `f` is a file, `[[ -f $f ]] && rm "$f"`? – Benjamin W. Aug 02 '18 at 13:49
1

`**` is a nice extension but not portable to POSIX `sh`. (This question is tagged [tag:bash] but it would be nice to point out that unlike several of the solutions here, this really is Bash-only. Or, well, it works in several other extended shells, too.) – tripleee Aug 03 '18 at 04:57
misses hidden folders – mmm Apr 23 '20 at 14:38

falstro · Answer 4 · 2015-11-13T14:59:26.980

30

If you want to do something recursively, I suggest you use recursion (yes, you can do it using stacks and so on, but hey).

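recursiverm() {
  # Descend into each subdirectory first (in a subshell, so the cd is undone).
  for d in *; do
    if [ -d "$d" ]; then
      (cd -- "$d" && recursiverm)
    fi
  done
  # Then remove the matching files in the current directory, once per directory.
  rm -f -- *.pdf *.doc
}

# Only recurse if the cd succeeded; otherwise files in the current directory would be removed.
(cd /tmp && recursiverm)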

That said, find is probably a better choice as has already been suggested.

edited Nov 13 '15 at 14:59

answered Jan 09 '11 at 11:35

falstro

user218867 · Answer 5 · 2015-12-06T03:52:07.143

Here is an example using shell (bash):

#!/bin/bash

# loop over a folder recursively and print its contents
print_folder_recurse() {
    for i in "$1"/*;do
        if [ -d "$i" ];then
            echo "dir: $i"
            print_folder_recurse "$i"
        elif [ -f "$i" ]; then
            echo "file: $i"
        fi
    done
}


# try get path from param
path=""
if [ -d "$1" ]; then
    path=$1;
else
    path="/tmp"
fi

echo "base path: $path"
print_folder_recurse "$path"
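
To come back to the original question, the same recursion can filter on the extension with a case pattern. The following is a minimal sketch in POSIX sh, not a definitive implementation. The function name list_docs_recurse is made up, and it prints the matching files instead of deleting them.

```shell
# Hypothetical variant of print_folder_recurse: walk "$1" recursively and
# print only the regular files ending in .pdf or .doc.
list_docs_recurse() {
    for i in "$1"/*; do
        if [ -d "$i" ] && [ ! -L "$i" ]; then
            # Descend into real directories, not into symlinks to directories.
            list_docs_recurse "$i"
        elif [ -f "$i" ]; then
            case "$i" in
                *.pdf|*.doc) printf '%s\n' "$i" ;;
            esac
        fi
    done
}
```

Once the printed list looks right, the printf line could be swapped for rm -- "$i". Like the function above, this skips hidden files, because * does not match names starting with a dot.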

score 15 · Answer 6 · edited Jul 13 '17 at 15:57

This doesn't answer your question directly, but you can solve your problem with a one-liner:

find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -exec rm {} +

Some versions of find (GNU, BSD) have a -delete action which you can use instead of calling rm:

find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -delete
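
Since a deletion cannot be undone, it can help to preview the matches first. Here is a sketch assuming only POSIX find. The function name purge_docs and its second argument are invented for this example.

```shell
# Hypothetical wrapper: list the matching files, and delete them only when
# the second argument is the literal word "delete".
purge_docs() {
    if [ "${2:-}" = delete ]; then
        find "$1" \( -name '*.pdf' -o -name '*.doc' \) -type f -exec rm -- {} +
    else
        find "$1" \( -name '*.pdf' -o -name '*.doc' \) -type f -print
    fi
}
```

Running purge_docs /tmp shows what would be removed, and purge_docs /tmp delete then removes it.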

edited Jul 13 '17 at 15:57

Gilles 'SO- stop being evil'

answered Jan 09 '11 at 11:32

Oliver Charlesworth

TJR · Answer 7 · 2017-06-13T20:39:51.120

8

This method handles spaces well.

files="$(find -L "$dir" -type f)"
echo "Count: $(echo -n "$files" | wc -l)"
echo "$files" | while read file; do
  echo "$file"
done

Edit: this version fixes the off-by-one error in the count:

function count() {
    files="$(find -L "$1" -type f)";
    if [[ "$files" == "" ]]; then
        echo "No files";
        return 0;
    fi
    file_count=$(echo "$files" | wc -l)
    echo "Count: $file_count"
    echo "$files" | while read file; do
        echo "$file"
    done
}

edited Jun 13 '17 at 20:39

answered Nov 09 '12 at 04:09

TJR

I think "-n" flag after echo not needed. Just test it yourself: with "-n" your script gives wrong number of files. For exactly one file in directory it outputs "Count: 0" – Lopa Jun 12 '17 at 18:35
1

This doesn't work with all file names: it fails with spaces at the end of the name, with file names containing newlines and with some file names containing backslashes. These defects could be fixed but the whole approach is needlessly complex so it isn't worth bothering. – Gilles 'SO- stop being evil' Jul 13 '17 at 15:53
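
For reference, both the count and the loop can be done without storing the file list in a variable, which avoids the problems listed in the comment above. This is a sketch assuming a POSIX find and an external printf command. The function names count_files and print_files are hypothetical.

```shell
# Hypothetical helper: count regular files under "$1" (following symlinks,
# like the -L in the answer). One dot is printed per file and the dots are
# counted, so unusual characters in file names cannot skew the result.
count_files() {
    find -L "$1" -type f -exec printf '%.0s.' {} + | wc -c | tr -d ' '
}

# Hypothetical helper: run a loop body once per file, with the names passed
# as arguments rather than read back from text output.
print_files() {
    find -L "$1" -type f -exec sh -c 'for f do printf "%s\n" "$f"; done' sh {} +
}
```

The format '%.0s.' consumes each file name while printing none of it, followed by a single dot, so wc -c yields the number of files, and an empty directory yields 0.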

score 6 · Answer 8 · edited Jul 13 '17 at 15:54

For bash (since version 4.0):

shopt -s globstar nullglob dotglob
echo **/*".ext"

That's all.
The trailing extension ".ext" is there to select files (or dirs) with that extension.

Option globstar activates ** (recursive search).
Option nullglob removes a pattern when it matches no file/dir, instead of leaving it as a literal string.
Option dotglob includes files that start with a dot (hidden files).

Beware that before bash 4.3, **/ also traverses symbolic links to directories which is not desirable.

score 1 · Answer 9 · answered Oct 08 '16 at 18:09

The following function recursively iterates through all the directories under /home/ubuntu (the whole directory structure under ubuntu) and applies the necessary checks in the else block.

function check {
        for file in "$1"/*
        do
        if [ -d "$file" ]
        then
                check "$file"
        else
               ## check whether the file starts with the PDF magic bytes
               if [ "$(head -c 4 "$file")" = "%PDF" ]; then
                         rm -- "$file"
               fi
        fi
        done
}
domain=/home/ubuntu
check "$domain"

score 1 · Answer 10 · answered Feb 20 '19 at 22:37

There is no reason to pipe the output of find into another utility. find has a -delete flag built into it.

find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete
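
One caveat on operator precedence: in find, two expressions written side by side are joined by an implicit -and, which binds more tightly than -or. An action placed after the last test therefore belongs only to that test unless the tests are grouped with escaped parentheses. The following harmless sketch uses -print in place of -delete to show the effect. The file names are invented for the demonstration.

```shell
demo=$(mktemp -d)
touch "$demo/a.pdf" "$demo/b.doc"

# Without grouping, -print is attached to the '*.doc' test only.
ungrouped=$(find "$demo" -name '*.pdf' -o -name '*.doc' -print)

# With escaped parentheses, -print applies to both patterns.
grouped=$(find "$demo" \( -name '*.pdf' -o -name '*.doc' \) -print | sort)

printf 'ungrouped:\n%s\ngrouped:\n%s\n' "$ungrouped" "$grouped"
rm -rf "$demo"
```

The ungrouped form lists only b.doc, so the same command with -delete would leave the .pdf files in place.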

answered Feb 20 '19 at 22:37

Zak

score 1 · Answer 11 · answered Feb 05 '20 at 01:40

This is the simplest way I know to do this: rm **/@(*.doc|*.pdf)

** makes this work recursively (in bash, this needs shopt -s globstar)

@(*.doc|*.pdf) matches a file name ending in doc OR pdf (in bash, this needs shopt -s extglob)

Easy to test safely by replacing rm with ls

answered Feb 05 '20 at 01:40

ecotechie

TrevTheDev · Answer 12 · 2020-04-13T07:20:06.650

The other answers provided will not include files or directories that start with a dot (.). The following worked for me:

#!/bin/sh
getAll()
{
  local fl1="$1"/*;
  local fl2="$1"/.[!.]*; 
  local fl3="$1"/..?*;
  for inpath in "$1"/* "$1"/.[!.]* "$1"/..?*; do
    if [ "$inpath" != "$fl1" -a "$inpath" != "$fl2" -a "$inpath" != "$fl3" ]; then 
      stat --printf="%F\0%n\0\n" -- "$inpath";
      if [ -d "$inpath" ]; then
        getAll "$inpath"
      #elif [ -f $inpath ]; then
      fi;
    fi;
  done;
}

score -1 · Answer 13 · edited Feb 20 '13 at 09:45

Just do

find . -name '*.pdf'|xargs rm

edited Feb 20 '13 at 09:45

Veger

answered Jan 09 '11 at 11:32

Navi

5

No, don't do this. This breaks if you have filenames with spaces or other funny symbols. – gniourf_gniourf Nov 06 '14 at 20:20

S.K. Venkat · Answer 14 · 2014-10-07T12:03:22.083

-2

The following will loop through the given directory recursively and list all the contents :

for d in /home/ubuntu/*; do echo "listing contents of dir: $d"; ls -l $d/; done

edited Oct 07 '14 at 12:03

answered Nov 24 '13 at 19:02

S.K. Venkat

No, this function does not traverse anything recursively. It only lists the content of the subdirectories. It's just fluff around `ls -l /home/ubuntu/*/`, so it's pretty useless. – Gilles 'SO- stop being evil' Jul 13 '17 at 15:56

score -2 · Answer 15 · answered Jan 27 '20 at 13:16

If you can change the shell used to run the command, you can use ZSH to do the job.

#!/usr/bin/zsh

for file in /tmp/**/*
do
    echo $file
done

This will recursively loop through all files/folders.

answered Jan 27 '20 at 13:16

Amin NAIRI
