Remove (not just unset) multiple strings from an array without knowing their positions

Question

Say I have arrays

a1=(cats,cats.in,catses,dogs,dogs.in,dogses)
a2=(cats.in,dogs.in)

I want to remove from a1 every element that matches a string in a2 exactly (including the ".in"), as well as every element that matches a string in a2 once its ".in" has been removed.

So from a1, I want to remove cats, cats.in, dogs, dogs.in, but not catses or dogses.

I think I'll have to do this in 2 steps. I found how to cut the ".in" away:

for elem in "${a2[@]}" ; do 
    var="${elem}"
    len="${#var}"
    pref=${var:0:len-3}
done

^ this gives me "cats" and "dogs"

What command do I need to add to the loop to remove each elem from a1?

Possible duplicate of: http://stackoverflow.com/questions/2696055/intersection-of-two-lists-in-bash — user2182349, Jan 25 '16 at 03:43
Just use `unset`. This would leave indices like `0, 2, 3, 4, 6` & remove indices `1, 5` from `a1`. Then just run `a1=("${a1[@]}")`, to set these indices as `0..4`. — anishsane, Jan 25 '16 at 03:59
Also, on another note, the default separator in array, is white space, not comma. — anishsane, Jan 25 '16 at 04:00
I'm unclear on why your question title says "**remove (not just unset)**". What do you think is the difference, when dealing with elements of an array? — ghoti, Jan 25 '16 at 06:45

Rany Albeg Wein · Answer 1 · 2016-01-26T00:00:27.160

A naive approach would be:

#!/bin/bash

# Checks whether a value is in an array.
# Usage: inarray "$value" "${array[@]}"
inarray () { 
    local n=$1 h
    shift
    for h in "$@";do
        [[ $n = "$h" ]] && return
    done
    return 1
}

a1=(cats cats.in catses dogs dogs.in dogses)
a2=(cats.in dogs.in)
result=()

for i in "${a1[@]}";do
    if ! inarray "$i" "${a2[@]}" && ! inarray "$i" "${a2[@]%.in}"; then
        result+=("$i")
    fi
done

# Checking.
printf '%s\n' "${result[@]}"

If you only want to print the values to stdout, you might instead want to use comm:

comm -23 <(printf '%s\n' "${a1[@]}"|sort -u) <(printf '%s\n' "${a2[@]%.in}" "${a2[@]}"|sort -u)
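
If the result is needed back in an array rather than on stdout, one option is to read the comm output with mapfile. This is only a sketch: it assumes bash 4 or later (for mapfile) and elements that contain no newlines, and the array name result is just illustrative. The output comes back sorted and de-duplicated, so the original order of a1 is not preserved.

```shell
#!/usr/bin/env bash

a1=(cats cats.in catses dogs dogs.in dogses)
a2=(cats.in dogs.in)

# Read the lines that appear only in the first list into "result",
# one array element per line.
mapfile -t result < <(
  comm -23 <(printf '%s\n' "${a1[@]}" | sort -u) \
           <(printf '%s\n' "${a2[@]%.in}" "${a2[@]}" | sort -u)
)

declare -p result
```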

score 1 · Answer 2 · answered Jan 25 '16 at 06:17

Seems to me that the easiest way to solve this is with nested for loops:

#!/usr/bin/env bash

a1=(cats cats.in catses dogs dogs.in dogses)
a2=(cats.in dogs.in)

for x in "${!a1[@]}"; do                # step through a1 by index
  for y in "${a2[@]}"; do               # step through a2 by content
    if [[ "${a1[x]}" = "$y" || "${a1[x]}" = "${y%.in}" ]]; then
      unset 'a1[x]'                     # quoted so the shell cannot glob-expand a1[x]
    fi
  done
done

declare -p a1
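
Note that unset leaves gaps in the index sequence: with the sample data, only indices 2 and 5 remain after the loop above. If contiguous indices are needed, the array can be re-packed by reassigning it to its own expansion, as suggested in the comments on the question. A minimal sketch with the sample data, where the removals are hard-coded for illustration:

```shell
#!/usr/bin/env bash

a1=(cats cats.in catses dogs dogs.in dogses)

# Simulate the outcome of the nested loop: everything except
# catses and dogses has been unset.
unset 'a1[0]' 'a1[1]' 'a1[3]' 'a1[4]'
echo "${!a1[@]}"     # remaining indices are sparse: 2 5

# Re-pack the array so the indices start from 0 again.
a1=("${a1[@]}")
echo "${!a1[@]}"     # 0 1
declare -p a1
```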

But depending on your actual data, the following might be better, using two separate for loops instead of nesting.

#!/usr/bin/env bash

a1=(cats cats.in catses dogs dogs.in dogses)
a2=(cats.in dogs.in)

# Flip "a2" array to "b", stripping ".in" as we go...
declare -A b=()
for x in "${!a2[@]}"; do
  b[${a2[x]%.in}]="$x"
done

# Check for the existence of the stripped version of the array content
# as an index of the associative array we created above.
for x in "${!a1[@]}"; do
  [[ -n "${b[${a1[x]%.in}]}" ]] && unset 'a1[x]'
done

declare -p a1

The advantage here is that instead of looping through all of a2 for each item in a1, you loop only once over each array. The downsides depend on your data. For example, if a2 is very large, the associative array built from it might hit memory limits. Of course, I can't know that from what you included in your question; this solution works with the data you provided.

NOTE: this solution also depends on an associative array, which is a feature introduced to bash in version 4. If you're running an old version of bash, now might be a good time to upgrade. :)
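
A variation on the same idea, sketched under the assumptions that bash 4+ is available and that no element is an empty string: store both the full and the stripped form of each a2 entry as keys, then keep only the a1 elements that are not keys. This compares a1 elements exactly instead of stripping ".in" from them, and it builds a densely indexed array rather than leaving gaps. The names drop and kept are illustrative, not taken from the answer.

```shell
#!/usr/bin/env bash

a1=(cats cats.in catses dogs dogs.in dogses)
a2=(cats.in dogs.in)

# Use an associative array as a set of strings to remove.
declare -A drop=()
for x in "${a2[@]}"; do
  drop[$x]=1           # full form, e.g. cats.in
  drop[${x%.in}]=1     # stripped form, e.g. cats
done

# Keep every element of a1 that is not in the set.
kept=()
for x in "${a1[@]}"; do
  [[ -n ${drop[$x]:-} ]] || kept+=("$x")
done

declare -p kept
```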

score 1 · Accepted Answer · answered Jan 25 '16 at 18:07

This is the solution I went with:

for elem in "${a2[@]}" ; do 
    var="${elem}"
    len="${#var}"
    pref=${var:0:len-3}

    # Set 'cats' and 'dogs' to ''
    for i in "${!a1[@]}" ; do
        if [ "${a1[$i]}" = "$pref" ] ; then
            a1[$i]=''
        fi

        # Set 'cats.in' and 'dogs.in' to ''
        if [ "${a1[$i]}" = "$var" ] ; then
            a1[$i]=''
        fi
    done
done

Then I created a new array from a1 without the empty elements:

a1new=( )
for filename in "${a1[@]}" ; do
    if [[ $filename != '' ]] ; then
        a1new+=("${filename}")
    fi
done
