
I have found many ways to do this (awk, sed, uniq), but none of them work on my file.

I want to delete duplicate lines. Here is an example of part of my file:

KTBX
KFSO
KCLK
KTBX
KFSO
KCLK
PAJZ
PAJZ

NOTE: I had to manually add line feeds when I cut and pasted from the file... for some reason it was putting all the variables on one line. That makes me think my 44,000-line text file actually has only "1" line? Is there a way to modify it so I can delete the dups?

Birei
Corepuncher
  • If your file does not have more than one line, it's going to be tough for us to recommend a way to delete duplicate lines. – erewok Sep 26 '13 at 21:30
  • If I VI my file, it has 44,000 lines. – Corepuncher Sep 26 '13 at 21:31
  • 2
    donnot know if you care about the order, if not. simply 'sort your.file | uniq' should do it. – Dyno Fu Sep 26 '13 at 21:33
  • That worked...thanks! Not sure why those others think there are no lines. – Corepuncher Sep 26 '13 at 21:34
  • @Corepuncher did you try `awk '!a[$0]++' file` ? – Kent Sep 26 '13 at 21:36
  • @Kent yes, I tried that one. I typed that, then > newfile , and the newfile was same as old with duplicate entries. – Corepuncher Sep 26 '13 at 21:50
  • 1
    You don't need `sort file | uniq` when `sort -u file` works just as well but clearly you have some issues with your input file format so how can you KNOW that what you're getting spit out of your command is what you want? What does `wc -l file` tell you? How about `head -10 file | cat -v`? – Ed Morton Sep 27 '13 at 01:54
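The diagnostics suggested in these comments can be tried on a throwaway sample. For instance, a file with Windows line endings (one possible cause of the symptoms described) would look like this (`sample.txt` is a hypothetical file name):

```shell
# A small sample with CRLF line endings and one duplicate.
printf 'KTBX\r\nKFSO\r\nKTBX\r\n' > sample.txt
wc -l sample.txt              # counts newline characters
head -10 sample.txt | cat -v  # carriage returns show up as ^M
sort -u sample.txt            # sorted output with duplicates removed
```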

2 Answers


You can see all non-printing characters with this command:

od -c oldfile
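For illustration, here is `od -c` run on a tiny made-up sample (`sample.txt` is a hypothetical file name): newlines print as `\n`, and Windows carriage returns, if present, would print as `\r`.

```shell
# Create a two-line sample and dump every byte as a character.
printf 'KTBX\nKFSO\n' > sample.txt
od -c sample.txt
```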

If all your records are on one line, you can use sed (GNU sed here; `\s`, `\+`, and `\n` in the replacement are GNU extensions) to replace each run of whitespace (spaces, tabs) with a newline:

sed -e 's/\s\+/\n/g' oldfile > oldfile.1
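As a sanity check, this is the behavior on a hypothetical one-line sample (file names are made up):

```shell
# Everything on one line, separated by spaces and tabs.
printf 'KTBX KFSO\tKCLK KTBX' > oneline.txt
# Turn each run of whitespace into a newline (GNU sed).
sed -e 's/\s\+/\n/g' oneline.txt > oneline.split
cat oneline.split
```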

Once you have multiple lines, this awk one-liner removes the duplicates:

awk '!x[$0]++' oldfile.1 > newfile

My output file:

KTBX
KFSO
KCLK
PAJZ
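Reproducing the awk step on the sample data from the question (file names are made up): the array `x` counts how many times each whole line (`$0`) has been seen, and a line is printed only when that count is still zero, so the first occurrence survives and order is preserved.

```shell
# The duplicated sample from the question.
printf 'KTBX\nKFSO\nKCLK\nKTBX\nKFSO\nKCLK\nPAJZ\nPAJZ\n' > oldfile.1
# Print each line only the first time it is seen.
awk '!x[$0]++' oldfile.1 > newfile
cat newfile
```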
philshem
  • 1
    Sorry for resurrecting. Interestingly, this didn't work for me on OSX when the source file contained windows line endings ( \r\n ). awk was unable to recognize the duplicates. Converting it via dos2unix made it work. – srm Jul 23 '18 at 10:20

A Perl one-liner:

perl -nle 'unless($hash{$_}++){print $_}' file

Purandaran