
I have found many ways to do this (awk, sed, uniq), but none of them work on my file.

I want to delete duplicate lines. Here is an example of part of my file:

KTBX
KFSO
KCLK
KTBX
KFSO
KCLK
PAJZ
PAJZ

NOTE: I had to manually add line feeds when I cut and pasted from the file... for some reason it was putting all the variables on one line. That makes me think my 44,000-line text file actually has only "1" line? Is there a way to modify it so I can delete the dups?

Birei
Corepuncher
  • If your file does not have more than one line, it's going to be tough for us to recommend a way to delete duplicate lines. – erewok Sep 26 '13 at 21:30
  • If I VI my file, it has 44,000 lines. – Corepuncher Sep 26 '13 at 21:31
  • 2
    donnot know if you care about the order, if not. simply 'sort your.file | uniq' should do it. – Dyno Fu Sep 26 '13 at 21:33
  • That worked...thanks! Not sure why those others think there are no lines. – Corepuncher Sep 26 '13 at 21:34
  • @Corepuncher did you try `awk '!a[$0]++' file` ? – Kent Sep 26 '13 at 21:36
  • @Kent yes, I tried that one. I typed that, then > newfile , and the newfile was same as old with duplicate entries. – Corepuncher Sep 26 '13 at 21:50
  • 1
    You don't need `sort file | uniq` when `sort -u file` works just as well but clearly you have some issues with your input file format so how can you KNOW that what you're getting spit out of your command is what you want? What does `wc -l file` tell you? How about `head -10 file | cat -v`? – Ed Morton Sep 27 '13 at 01:54
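The diagnostics suggested in these comments can be tried on a throwaway sample. For instance, a file with Windows line endings (one possible cause of the symptoms described) would look like this (`sample.txt` is a hypothetical file name):

```shell
# A small sample with CRLF line endings and one duplicate.
printf 'KTBX\r\nKFSO\r\nKTBX\r\n' > sample.txt
wc -l sample.txt              # counts newline characters
head -10 sample.txt | cat -v  # carriage returns show up as ^M
sort -u sample.txt            # sorted output with duplicates removed
```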

2 Answers


You can see all non-printing characters with this command:

od -c oldfile
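For illustration, here is `od -c` run on a tiny made-up sample (`sample.txt` is a hypothetical file name): newlines print as `\n`, and Windows carriage returns, if present, would print as `\r`.

```shell
# Create a two-line sample and dump every byte as a character.
printf 'KTBX\nKFSO\n' > sample.txt
od -c sample.txt
```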

If all your records are on one line, you can use sed (GNU sed here; `\s`, `\+`, and `\n` in the replacement are GNU extensions) to replace each run of whitespace (spaces, tabs) with a newline:

sed -e 's/\s\+/\n/g' oldfile > oldfile.1
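As a sanity check, this is the behavior on a hypothetical one-line sample (file names are made up):

```shell
# Everything on one line, separated by spaces and tabs.
printf 'KTBX KFSO\tKCLK KTBX' > oneline.txt
# Turn each run of whitespace into a newline (GNU sed).
sed -e 's/\s\+/\n/g' oneline.txt > oneline.split
cat oneline.split
```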

Once you have multiple lines, this awk one-liner removes the duplicates:

awk '!x[$0]++' oldfile.1 > newfile

My output file:

KTBX
KFSO
KCLK
PAJZ
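Reproducing the awk step on the sample data from the question (file names are made up): the array `x` counts how many times each whole line (`$0`) has been seen, and a line is printed only when that count is still zero, so the first occurrence survives and order is preserved.

```shell
# The duplicated sample from the question.
printf 'KTBX\nKFSO\nKCLK\nKTBX\nKFSO\nKCLK\nPAJZ\nPAJZ\n' > oldfile.1
# Print each line only the first time it is seen.
awk '!x[$0]++' oldfile.1 > newfile
cat newfile
```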
philshem
  • 1
    Sorry for resurrecting. Interestingly, this didn't work for me on OSX when the source file contained windows line endings ( \r\n ). awk was unable to recognize the duplicates. Converting it via dos2unix made it work. – srm Jul 23 '18 at 10:20

A Perl one-liner:

perl -nle 'unless($hash{$_}++){print $_}' file

Purandaran