174

I'm trying to write a simple script that will list the contents found in two lists. To simplify, let's use ls as an example. Imagine "one" and "two" are directories.

one=`ls one`
two=`ls two`
intersection $one $two

I'm still quite green in bash, so feel free to correct how I am doing this. I just need some command that will print out all files in "one" and "two". They must exist in both. You might call this the "intersection" between "one" and "two".

User1

6 Answers

306
comm -12  <(ls 1) <(ls 2)
ghostdog74
  • 37
    Can't believe I had no knowledge of `comm` until today. This just made my whole week :) – Darragh Enright Aug 19 '14 at 17:49
  • 26
    `comm` requires the inputs to be sorted. In this case, `ls` automatically sorts its output, but other uses may need to do this: `comm -12 – Alexander Bird Jan 15 '15 at 21:11
  • 12
    DO NOT USE ls' output for anything. ls is a tool for interactively looking at directory metadata. Any attempts at parsing ls' output with code are broken. Globs are much simpler AND correct: `for file in *.txt`. Read http://mywiki.wooledge.org/ParsingLs – Rany Albeg Wein Jan 25 '16 at 03:49
  • 2
    I just used this in an effort to find usages of a `public` method `error()` provided by a trait, in combination with `git grep`, and it was awesome! I ran `$ comm -12 error(" -- "*.php") – localheinz Apr 07 '17 at 15:45
  • 3
    This is hilarious. I was trying to do some crazy stuff with awk. – Rolf May 08 '17 at 23:36
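
Following up on the comment about not parsing ls: a minimal glob-based sketch, assuming both arguments are plain directories. The function name `intersect_dirs` and the sample file names are hypothetical, chosen only for illustration. Like ls without -a, the glob skips hidden files.

```shell
# Hypothetical helper: print names that exist in both directories, one per line.
# It uses a glob instead of ls output, so names containing spaces stay intact.
intersect_dirs() {
    for path in "$1"/*; do
        name=${path##*/}              # strip the leading directory part
        if [ -e "$2/$name" ]; then    # does the same name exist in the second directory?
            printf '%s\n' "$name"
        fi
    done
}

# Usage with throwaway directories
demo=$(mktemp -d)
mkdir "$demo/one" "$demo/two"
touch "$demo/one/a.txt" "$demo/one/b c.txt" "$demo/one/only1"
touch "$demo/two/a.txt" "$demo/two/b c.txt" "$demo/two/only2"

intersect_dirs "$demo/one" "$demo/two"
```

This should print `a.txt` and `b c.txt`, the two names present in both directories.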
64

Solution with comm

comm is great, but it needs sorted input. Fortunately, ls sorts its output by default. From the ls man page:

Sort entries alphabetically if none of -cftuSUX nor --sort.

comm -12  <(ls one) <(ls two)
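
For context, comm prints three columns (lines only in the first input, lines only in the second, lines in both), and each digit after the dash suppresses one column. A small sketch with made-up directory contents (the names `a`, `b`, `c` and the variable names are assumptions for illustration):

```shell
# Throwaway directories: "a" only in one, "c" only in two, "b" in both
demo=$(mktemp -d)
mkdir "$demo/one" "$demo/two"
touch "$demo/one/a" "$demo/one/b" "$demo/two/b" "$demo/two/c"

# -12 hides columns 1 and 2, leaving the names found in both
in_both=$(comm -12 <(ls "$demo/one") <(ls "$demo/two"))
# -23 hides columns 2 and 3, leaving the names found only in one
only_one=$(comm -23 <(ls "$demo/one") <(ls "$demo/two"))
# -13 hides columns 1 and 3, leaving the names found only in two
only_two=$(comm -13 <(ls "$demo/one") <(ls "$demo/two"))

echo "both: $in_both, only one: $only_one, only two: $only_two"
```

With these sample files, the line printed should be `both: b, only one: a, only two: c`.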

Alternative with sort

Intersection of two lists:

sort <(ls one) <(ls two) | uniq -d

Symmetric difference of two lists:

sort <(ls one) <(ls two) | uniq -u

Bonus

Play with it ;)

cd $(mktemp -d) && mkdir {one,two} && touch {one,two}/file_{1,2}{0..9} && touch two/file_3{0..9}
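
As a rough check, here is the same playground run through both pipelines. This is only a sketch: it assumes bash (for brace expansion and process substitution), and the variable names `both` and `either` are made up for the example.

```shell
# Recreate the playground in a temporary directory
cd "$(mktemp -d)" || exit 1
mkdir one two
touch {one,two}/file_{1,2}{0..9}   # file_10 .. file_29 in both directories
touch two/file_3{0..9}             # file_30 .. file_39 only in "two"

# Names listed twice in the merged, sorted stream exist in both directories
both=$(sort <(ls one) <(ls two) | uniq -d)
# Names listed once exist in exactly one directory
either=$(sort <(ls one) <(ls two) | uniq -u)

printf '%s\n' "$both" | wc -l      # number of common names
printf '%s\n' "$either" | wc -l    # number of names unique to one side
```

The counts should come out as 20 common names (file_10 to file_29) and 10 names on one side only (file_30 to file_39).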
Jean-Christophe Meillaud
30

Use the comm command:

ls one | sort > /tmp/one_list
ls two | sort > /tmp/two_list
comm -12 /tmp/one_list /tmp/two_list

"sort" is not really needed but I always include it before using "comm" just in case.

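One reason an explicit sort can still matter: comm expects its inputs in the collation order of the current locale, so sort and comm should run under the same locale. A hedged sketch, assuming byte-wise C collation is acceptable for your names. The sample entries and variable names are invented, and mktemp replaces the fixed /tmp file names only to avoid clobbering existing files.

```shell
# Use the same byte-wise collation for both sort and comm
export LC_ALL=C

list_one=$(mktemp)
list_two=$(mktemp)

# Invented sample entries standing in for two directory listings
printf '%s\n' b A a C | sort > "$list_one"
printf '%s\n' a c A   | sort > "$list_two"

# Lines common to both sorted lists
common=$(comm -12 "$list_one" "$list_two")
printf '%s\n' "$common"

rm -f "$list_one" "$list_two"
```

In the C locale uppercase letters sort before lowercase ones, so this should print `A` followed by `a`.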
dplante
DVK
3

A less efficient (than comm) alternative:

cat <(ls 1 | sort -u) <(ls 2 | sort -u) | sort | uniq -d
Benubird
  • 1
    If you are using Debian's /bin/dash or some other non-Bash shell in your scripts, you can chain commands' output using parentheses: `(ls 1; ls 2) | sort -u | uniq -d`. – nitrogen Oct 08 '14 at 20:19
  • 1
    @MikaëlMayer You should flag the name of the person you are replying to, otherwise it is assumed you mean me. – Benubird Feb 23 '15 at 08:34
  • 1
    @nitrogen MikaëlMayer is correct - chaining `sort -u | uniq -d` does nothing, because the sort has removed the duplicates before uniq starts to look for them. I think you have not understood what my command is doing. – Benubird Feb 23 '15 at 08:36
  • @Benubird I was not able to get your command `cat – nitrogen Feb 24 '15 at 09:21
  • @nitrogen The reason why I'm using cat, is because I want this to be a generalizable solution, so that you can replace `ls` with something else, e.g. `find`. Your solution does not allow this, because if one of the commands returns two lines the same, it picks it up as a duplicate. Mine works even if the user wants to do `ls 1/*` and compare all files across subdirectories. Otherwise, yes, it works as well. It's possible mine is bash-specific. – Benubird Feb 24 '15 at 09:50
  • If anyone is interested you can try my version of "comm" which I called "common". It does not need sorting and supports "-123" switches just like "comm". https://github.com/toni-rmc/common – toni rmc May 29 '17 at 15:31
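
The point about duplicates inside a single list can be illustrated with a small sketch. The lists below are invented sample data, and the variable names are hypothetical. It uses grouped commands rather than process substitution, so it should also run in non-Bash shells such as dash.

```shell
# Invented sample lists: "x" appears twice in list_a and never in list_b
list_a=$(printf '%s\n' x x y)
list_b=$(printf '%s\n' y z)

# Merging without per-list de-duplication wrongly reports "x" as common
naive=$(printf '%s\n%s\n' "$list_a" "$list_b" | sort | uniq -d)

# De-duplicate each list first, then merge, sort, and keep names seen twice
dedup=$(
    {
        printf '%s\n' "$list_a" | sort -u
        printf '%s\n' "$list_b" | sort -u
    } | sort | uniq -d
)

printf 'naive: %s\n' "$(printf '%s' "$naive" | tr '\n' ' ')"
printf 'dedup: %s\n' "$dedup"
```

Here the naive pipeline should report `x y`, while the de-duplicated one reports only `y`, which is the real intersection.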
2

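join is another good option, depending on the input and desired output. The command below prints only the names present in both lists; adding -a1 would also print the unpaired lines from the first list. Note that join compares the first whitespace-separated field, so names containing spaces will not match as whole lines.

join -j1 <(ls 1) <(ls 2)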
frogstarr78
-1

There is another Stack Overflow question, "Array intersection in bash," which is marked as a duplicate of this one. It's not quite the same, in my opinion, as that question is about comparing two bash arrays, while this question focuses on lists of files. A one-line answer to the other question, which is now closed, is as follows:

# List1=( 0 1 2 3 4   6 7 8 9 10 11 12)
# List2=(   1 2 3   5 6   8 9    11 )
# List3=($(comm -12 <(echo ${List1[*]}| tr " " "\n"| sort) <(echo ${List2[*]} | tr " " "\n"| sort)| sort -g))
# echo ${List3[*]}
1 2 3 6 8 9 11

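The comm utility does not sort anything itself; it expects both inputs to be in lexical order, whereas the "Array intersection in bash" answers use numbers. Hence the plain "sort" before comm, and "sort -g" afterwards to put the result back in numeric order.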
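
A variant of the same idea that avoids the echo/tr step, offered as a sketch rather than a drop-in replacement. It assumes bash 4 or later for `mapfile`, and it uses `sort -n` in place of `sort -g`, which should be equivalent for plain integers.

```shell
List1=(0 1 2 3 4 6 7 8 9 10 11 12)
List2=(1 2 3 5 6 8 9 11)

# printf '%s\n' prints one element per line, so tr is not needed;
# quoting "${List1[@]}" keeps each element intact.
# The inner sorts give comm the lexical order it expects;
# the final sort -n restores numeric order.
mapfile -t List3 < <(
    comm -12 <(printf '%s\n' "${List1[@]}" | sort) \
             <(printf '%s\n' "${List2[@]}" | sort) | sort -n
)

echo "${List3[*]}"
```

For these two arrays the output should match the original answer: `1 2 3 6 8 9 11`.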