1

I want to delete files in the current folder with the following pattern.

0_something.sql.tar

I have a string provided which contains numbers

number_string="0,1,2,3,4"

How can I delete any files not included in the number_string while also keeping to the x_x.sql.tar pattern?

For example, I have these files:

0_something.sql.tar
2_something.sql.tar
4_something.sql.tar
15_something.sql.tar

Based on this logic, and the numbers in the number string - I should only remove 15 because:

  1. It follows the pattern _.sql.tar
  2. It doesn't have a number in the number string
Oliver Kucharzewski
  • The description of the problem is underspecified. Does each tar filename in the directory consist of a single decimal digit (followed by .tar extension) ? – M. Nejat Aydin Jul 27 '20 at 09:10
  • Thanks for asking @M.NejatAydin - sorry should've made that clearer. I'll update the description – Oliver Kucharzewski Jul 27 '20 at 10:22
  • Still vague. Given `number_string="0,1,2,3,4"` and the files `2_x.sql.tar`, `5_x.sql.tar`, `12_x.sql.tar`, `15_x.sql.tar`, `51_x.sql.tar`, `56_x.sql.tar`, `a_x.sql.tar`, `aa.sql.tar`, `2.sql.tar`, `5.sql.tar`, `12.sql.tar`, `15.sql.tar`, `51.sql.tar`, `56.sql.tar`, what files should be deleted and what files should remain? – M. Nejat Aydin Jul 27 '20 at 11:05
  • So the files beginning with numbers in the number string should remain, and all else should be removed. Thanks for your patience! – Oliver Kucharzewski Jul 27 '20 at 12:30
  • Then a simple one-liner could do it: `echo [^${number_string//,}]*.tar`. This echoes the files to be removed. Replace the `echo` with the `rm` after making sure it will work as intended. – M. Nejat Aydin Jul 27 '20 at 12:37

6 Answers

1

This might help you out:

s="0,1,2,3,4"
s=",${s},"
for f in *.sql.tar; do
   n="${f%_*}"
   [ "${n//[0-9]}" ] && continue
   [ "$s" == "${s/,${n},/}" ] && echo rm -- "$f"
done

Remove the echo if this answer pleases you

What this is doing is the following:

  • convert your number_string s into a string which is fully comma-separated and also starts and ends with a comma (s=",0,1,2,3,4,"). This allows us to search for entries like ,5,
  • loop over all files matched by the glob *.sql.tar
  • n="${f%_*}": extract the substring before the last underscore
  • [ "${n//[0-9]}" ] && continue: check that the substring consists only of digits; if not, skip the file and move to the next one.
  • remove ,n, from the comma-wrapped number_string; if the string does not change, the number is not in the list and the file should be deleted
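
The membership test described in these steps can be tried on its own, without any files. A minimal sketch, assuming a hypothetical helper name in_list (it is not part of the answer) and the number_string from the question:

```shell
#!/usr/bin/env bash
# in_list is a hypothetical helper name used only for this illustration.
# It succeeds when the number $1 is a whole entry of the comma-separated list $2.
in_list() {
    local n=$1 s=",$2,"          # wrap the list so every entry is delimited by commas
    [ "$s" != "${s/,${n},/}" ]   # removing ",n," changes the string only if n is present
}

number_string="0,1,2,3,4"
for n in 0 4 5 15; do
    if in_list "$n" "$number_string"; then
        echo "$n: keep"
    else
        echo "$n: delete"
    fi
done
```

Because the comparison looks for ,15, rather than just 15, a file numbered 15 is not mistaken for entry 1.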
kvantour
  • Hi @kvantour - I've updated my description for more clarity, but upon testing your solution with slight modifications it didn't completely work. It pulled data out but didn't filter out the items I didn't want. Thanks :) – Oliver Kucharzewski Jul 27 '20 at 22:11
  • @OliverKucharzewski I have updated my solution with the new specs of your question. I hope this works for you now. – kvantour Jul 28 '20 at 07:36
1

$IFS can help here.

( IFS=,; for n in $number_string; do echo rm $n\_something.sql.tar; done; )

The parens run the command in a subshell so the reassignment of IFS is scoped.
Setting it to a comma lets the command parser split the string into discrete numbers for you and loop over them.
If that gives you the right list of commands you want to execute, just take out the echo. :)
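
The splitting step on its own could also be sketched with read, which limits the IFS change to a single command instead of a subshell. The array name numbers is made up for this illustration, and the loop only prints file names:

```shell
#!/usr/bin/env bash
number_string="0,1,2,3,4"

# IFS=, applies only to this read; -a stores the fields in an array.
IFS=, read -r -a numbers <<< "$number_string"

for n in "${numbers[@]}"; do
    echo "${n}_something.sql.tar"
done
```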

UPDATE

OH! I see that now. Sorry, my bad, lol...

Well then, let's try a totally different approach. :)
Extended Globbing is likely what you need.

shopt -s extglob # turn extended globbing on
echo rm !(${number_string//,/\|})_something.sql.tar

That'll show you the command that would be executed. If you're satisfied, take the echo off. :)

This skips the need for a brute-force loop.

Explanation -

Once extglob is on, !(...) means "anything that does NOT match any of these patterns."

${number_string//,/\|} replaces all commas in the string with pipe separators, creating a match pattern for the extended glob.

Thus, !(${number_string//,/\|}) means anything NOT matching one of those patterns; !(${number_string//,/\|})_something.sql.tar then means "anything that starts with something NOT one of these patterns, followed by this string."

I created these:

$: printf "%s\n" *_something.sql.tar
0_something.sql.tar
1_something.sql.tar
2_something.sql.tar
3_something.sql.tar
4_something.sql.tar
5_something.sql.tar
6_something.sql.tar
7_something.sql.tar
8_something.sql.tar
9_something.sql.tar

then after setting extglob and using the above value for $number_string, I get this:

$: echo !(${number_string//,/\|})_something.sql.tar
5_something.sql.tar 6_something.sql.tar 7_something.sql.tar 8_something.sql.tar 9_something.sql.tar

Be careful about quoting, though. You can quote it to see the pattern itself, but then it matches nothing.

$: echo "!(${number_string//,/\|})_something.sql.tar"
!(0|1|2|3|4)_something.sql.tar
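
To try the pattern without touching real files, something along these lines could be run in a scratch directory. This is a sketch, not part of the original answer: the directory from mktemp -d and the names pattern and to_remove are assumptions for the demonstration. The pattern is kept in a variable so that it is expanded only after extglob has been enabled.

```shell
#!/usr/bin/env bash
shopt -s extglob nullglob           # nullglob: a pattern with no matches expands to nothing
number_string="0,1,2,3,4"

workdir=$(mktemp -d)                # scratch directory for the demonstration
cd "$workdir" || exit 1
touch 0_something.sql.tar 2_something.sql.tar 4_something.sql.tar 15_something.sql.tar

# Build !(0|1|2|3|4)_something.sql.tar and let the shell expand it.
pattern="!(${number_string//,/|})_something.sql.tar"
to_remove=( $pattern )              # intentionally unquoted so the glob is expanded

printf 'would remove: %s\n' "${to_remove[@]}"

cd / && rm -rf "$workdir"           # clean up the scratch directory
```

With the four files from the question, only 15_something.sql.tar ends up in the array.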

if you prefer the loop...

for f in *_something.sql.tar            # iterating over all these
do case ",$number_string," in           # search the comma-wrapped list
   *",${f%_something.sql.tar},"*) continue ;;  # skip matches (suffix removed)
                    *) rm "$f"  ;;      # delete nonmatches
   esac
done
   
Paul Hodges
  • Hi Paul, thanks for your submission - close, but actually the other way around - I need to keep these files, not remove them :) Hope that makes sense. The other files not in the number string should be included. – Oliver Kucharzewski Jul 27 '20 at 22:03
1
# Get the unmatched numbers from the second stream
# ie. files to be removed
join -v2 -o2.2 <(
        # output sorted numbers on separate lines
        sort <<<${number_string//,/$'\n'}
) <(
        # find all files named in such a way
        # and print filename, tab and path separated by newlines
        find . -name '[0-9]*_something.sql.tar' -printf "%f\t%p\n" |
        # extract numbers from filenames only
        sed 's/\([0-9]*\)[^\t]*/\1/' |
        # sort for join
        sort
) |
# pass the input to xargs
# remove echo to really remove files
xargs -d '\n' echo rm

Tested on repl

KamilCuk
0

Write a script to do the matching, and remove those names that do not match. For example:

$ rm -rf foo
$ mkdir foo
$ cd foo
$ touch {2,4,6,8}.tar 
$ echo "$number_string" | tr , \\n | sed 's/$/.tar/' > match-list
$ find . -type f -exec sh -c 'echo $1 | grep -f match-list -v -q' _ {} \;  -print
./6.tar
./8.tar
./match-list

Replace -print with -delete to actually unlink the names. Note that this will cause problems since match-list will probably get deleted midway through and no longer exist for future matches, so you'll want to modify it a bit. Perhaps:

find . -type f -not -name match-list -name '*.tar' -exec sh -c 'echo $1 | grep -f match-list -v -q' _ {} \;  -delete

In this case, there's no need to explicitly exclude 'match-list' since it will not match the -name '*.tar' primitive, but is included here for completeness.
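
Since grep -f treats each line of match-list as a regular expression that may match anywhere in the name, a stricter variant could compare whole names as fixed strings. The following is a sketch under the assumption that the files are named like 2.tar, as in the example above; match_list and candidates are names invented here, and nothing is deleted:

```shell
#!/usr/bin/env bash
number_string="0,1,2,3,4"

# One name to keep per line: 0.tar, 1.tar, ...
match_list=$(printf '%s\n' "$number_string" | tr , '\n' | sed 's/$/.tar/')

# candidates NAME...: print the names that are NOT in the keep list.
# -F fixed strings, -x whole-line match, -v invert the selection.
candidates() {
    printf '%s\n' "$@" | grep -F -x -v -e "$match_list"
}

candidates 2.tar 4.tar 6.tar 8.tar 12.tar
```

With -x, a name such as 12.tar is reported as a candidate even though it contains 2.tar as a substring.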

William Pursell
  • Hi William, thanks for your contribution. I added some details to the question as I don't think it was descriptive enough. There may be a few changes necessary - though I appreciate your code so far! – Oliver Kucharzewski Jul 27 '20 at 22:15
0

I have borrowed from some previous answers, but credit is given and the resulting script is nice:

$ ls -l
total 4
-rwxr-xr-x 1 boffi boffi 355 Jul 27 10:58 rm_tars_except
$ cat rm_tars_except 
#!/usr/bin/env bash

dont_rm="$1"
# https://stackoverflow.com/a/10586169/2749397
IFS=',' read -r -a dont_rm_a <<< "$dont_rm"
for tarfile in ?.tar ; do
    digit=$( basename "$tarfile" .tar )
    # https://stackoverflow.com/a/15394738/2749397
    [[ " ${dont_rm_a[@]} " =~ " ${digit} " ]] && \
            echo "# Keep $tarfile" || \
            echo "rm $tarfile"
done
$ touch 1.tar 3.tar 5.tar 7.tar
$ ./rm_tars_except 3,5
rm 1.tar
# Keep 3.tar
# Keep 5.tar
rm 7.tar
$ ./rm_tars_except 3,5 | sh
$ ls -l
total 4
-rw-r--r-- 1 boffi boffi   0 Jul 27 11:00 3.tar
-rw-r--r-- 1 boffi boffi   0 Jul 27 11:00 5.tar
-rwxr-xr-x 1 boffi boffi 355 Jul 27 10:58 rm_tars_except
$ 

If we can drop the requirement that the "keep info" is passed as a comma-separated string, then the script can be significantly simplified:

#!/usr/bin/env bash

for tarfile in ?.tar ; do
    digit=$( basename "$tarfile" .tar )
    # https://stackoverflow.com/a/15394738/2749397
    [[ " ${@} " =~ " ${digit} " ]] && \
            echo "# Keep $tarfile" || \
            echo "rm $tarfile"
done

That, of course, should be called like this: ./rm_tars_except 3 5 | sh
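
For the updated question, where the names look like N_something.sql.tar and N may have several digits, the same keep-or-remove idea could be sketched with an associative array (bash 4 or later). This is an illustration rather than part of the script above: the names keep, wanted and plan are made up for it, and it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: prints the rm commands instead of running them.
number_string="0,1,2,3,4"

declare -A keep                      # set of numeric prefixes that must survive
IFS=, read -r -a wanted <<< "$number_string"
for n in "${wanted[@]}"; do keep[$n]=1; done

# plan FILE...: print what would happen to each file name given as an argument
plan() {
    local f prefix
    for f in "$@"; do
        prefix=${f%%_*}              # text before the first underscore
        if [[ -n $prefix && -n ${keep[$prefix]:-} ]]; then
            echo "# keep $f"
        else
            echo "rm -- $f"
        fi
    done
}

plan 0_something.sql.tar 15_something.sql.tar
```

In real use the function could be called as plan *_something.sql.tar and its output reviewed before piping it to sh.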

gboffi
0

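find . -maxdepth 1 -type f -name '*_something.sql.tar' | grep -Ev '^\./(<numbers to keep, separated by |>)_' | xargs rm -f

Example: find . -maxdepth 1 -type f -name '*_something.sql.tar' | grep -Ev '^\./(0|1|2|3|4)_' | xargs rm -f

The -v inverts the match, so only files whose leading number is not in the list reach rm. Anchoring the pattern at the start of the name keeps 15 from being matched by 1.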