
I have a file.txt with multiple lines, and I would like to remove the duplicate lines without sorting the file. What command can I use in Unix bash?

sample of file.txt

orangejuice;orange;juice_apple
pineapplejuice;pineapple;juice_pineapple
orangejuice;orange;juice_apple

sample of output:

orangejuice;orange;juice_apple
pineapplejuice;pineapple;juice_pineapple
choroba
t28292
  • I'd like to see this closed as duplicate, too, but I hope there is a better question to link to. – tripleee Aug 11 '13 at 10:00
  • [Linux Bash commands to remove duplicates from a CSV file](https://stackoverflow.com/q/25393281/608639). Change the delimiter. – jww Jul 13 '18 at 09:39

2 Answers


One way using awk:

```shell
awk '!a[$0]++' file.txt
```
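Here `a[$0]` counts how many times each whole line has been seen, and `!a[$0]++` is true only on a line's first occurrence, so only first occurrences are printed and the original order is preserved. A quick sketch against the sample file from the question:

```shell
# Recreate the sample file from the question
printf '%s\n' \
  'orangejuice;orange;juice_apple' \
  'pineapplejuice;pineapple;juice_pineapple' \
  'orangejuice;orange;juice_apple' > file.txt

# Print each line only the first time it appears; order is preserved
awk '!a[$0]++' file.txt
# orangejuice;orange;juice_apple
# pineapplejuice;pineapple;juice_pineapple
```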
Steve
  • You can't write this to a file via an alias sourced from .bashrc? With `> output.txt` it has only one line? – Master James Oct 06 '17 at 10:45
  • `alias RDL="awk '!a[\$0]++' cleanList.txt > cleanList2.txt"` gives `bash: !a[\$0]++': event not found`; `alias RDL="awk '\!a[$0]++' cleanList.txt > cleanList2.txt"` followed by `RDL` gives `awk: cmd. line:1: \!a[bash]++ awk: cmd. line:1: ^ backslash not last character on line`; `alias RDL="awk '\\!a[$0]++' cleanList.txt > cleanList2.txt"` ??? – Master James Oct 06 '17 at 10:53
  • Found this `cat -n file_name | sort -uk2 | sort -nk1 | cut -f2-` at https://stackoverflow.com/questions/11532157/unix-removing-duplicate-lines-without-sorting – Master James Oct 06 '17 at 10:59
  • Better yet, the `uniq` command even works in an alias: http://man7.org/linux/man-pages/man1/uniq.1.html – Master James Oct 06 '17 at 11:05
  • @MasterJames: You'll need to single quote that expression, then escape the single quotes like: `alias RDL='awk '\''!a[$0]++'\'' cleanList.txt > cleanList2.txt'`. See: https://stackoverflow.com/a/9899594/751863. Alternatively, just use a function. – Steve Oct 07 '17 at 13:55
  • Thanks for that. Working! [as well... better] – Master James Oct 21 '17 at 10:17
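A sketch of the function alternative Steve mentions, using the `RDL` name and file names from the comments above; a function sidesteps the alias-quoting and history-expansion problems entirely:

```shell
# Defining this in .bashrc avoids escaping '!' inside alias quotes,
# because function bodies are not subject to history expansion on definition
RDL() {
    awk '!a[$0]++' cleanList.txt > cleanList2.txt
}
```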

You can use Perl for this:

```shell
perl -ne 'print unless $seen{$_}++' file.txt
```

The `-n` switch makes Perl process the file line by line. Each line (`$_`) is used as a key in the `%seen` hash; because the post-increment `++` returns the old value (0 on the first encounter), the condition is false only the first time a line is met, so each line is printed exactly once.

choroba
  • This in an alias, when output to a file with `> output.txt`, creates an empty file? `alias RDL="perl -ne 'print unless $seen{$_}++' cleanList.txt > cleanList2.txt"` then `RDL` gives `Can't modify anonymous hash ({}) in postincrement (++) at -e line 1, near "}++" Execution of -e aborted due to compilation errors.` – Master James Oct 06 '17 at 10:46
  • Found this `cat -n file_name | sort -uk2 | sort -nk1 | cut -f2-` at https://stackoverflow.com/questions/11532157/unix-removing-duplicate-lines-without-sorting – Master James Oct 06 '17 at 10:59
  • The `uniq` command even works in an alias: http://man7.org/linux/man-pages/man1/uniq.1.html – Master James Oct 06 '17 at 11:05
  • @MasterJames: The OP wanted to process the file "without sorting", which `uniq` can't do. – choroba Oct 06 '17 at 12:19
  • I see now that `uniq` only removes repeated adjacent lines, not all duplicates, from the input. It only skips a line when it is the same as the line before it (repeats, not dups). That happens to be fine for my situation, which is why I didn't notice that `sort` makes duplicates adjacent so that `uniq` can skip them; without `sort`, non-adjacent duplicates are not removed. Thanks for the clarity. – Master James Oct 10 '17 at 04:43
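The difference is easy to see with a tiny example (a sketch; `a b a` stands in for any file with a non-adjacent duplicate):

```shell
printf 'a\nb\na\n' | uniq            # a b a : non-adjacent dup survives
printf 'a\nb\na\n' | sort | uniq     # a b   : dup removed, but order is sorted
printf 'a\nb\na\n' | awk '!x[$0]++'  # a b   : dup removed, original order kept
```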