
This question is not the same as How to print only the unique lines in BASH?, because that one suggests removing all copies of the duplicated lines, while this one is about eliminating only their duplicates, i.e., changing 1, 2, 3, 3 into 1, 2, 3 instead of just 1, 2.

This question is hard to put into words, but the example is straightforward. If I have a file like this:

1
2
2
3
4

After parsing the file and erasing the duplicated lines, it should become this:

1
3
4

I know some Python, so here is a Python script I wrote to do it. Create a file called clean_duplicates.py and run it as shown in the header comment:

import sys

#
# To run it use:
# python clean_duplicates.py < input.txt > clean.txt
#
def main():

    lines = sys.stdin.readlines()
    clean_duplicates( lines )

#
# This only removes adjacent duplicated lines, so you need to sort the
# input (case-sensitively) before running it.
#
def clean_duplicates( lines ):

    linesCount = len( lines )

    # Nothing to do for an empty file
    if linesCount == 0:
        return

    # If it is a one-line file, print it and stop
    if linesCount == 1:
        sys.stdout.write( lines[ 0 ] )
        return

    # Print the first line only if it differs from the second
    if lines[ 0 ] != lines[ 1 ]:
        sys.stdout.write( lines[ 0 ] )

    # Print each middle line only if it differs from both of its neighbours
    lastLine = lines[ 0 ]

    for index in range( 1, linesCount - 1 ):

        currentLine = lines[ index ]
        nextLine    = lines[ index + 1 ]

        if currentLine == lastLine:
            continue

        lastLine = currentLine

        if currentLine == nextLine:
            continue

        sys.stdout.write( currentLine )

    # Print the last line only if it differs from the one before it
    if lines[ linesCount - 2 ] != lines[ linesCount - 1 ]:
        sys.stdout.write( lines[ linesCount - 1 ] )

if __name__ == "__main__":
    main()
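If the goal is simply to drop every line that occurs more than once, a much shorter version is possible. Here is a minimal sketch using collections.Counter; it needs no pre-sorting and it preserves the input order, assuming the whole file fits in memory (one caveat: a final line missing its trailing newline would be counted as distinct from an identical line that has one):

import sys
from collections import Counter

# Count every line, then print only the lines that occur exactly once.
lines = sys.stdin.readlines()
counts = Counter( lines )

for line in lines:
    if counts[ line ] == 1:
        sys.stdout.write( line )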

However, when searching for how to remove duplicate lines, it seems easier to use tools such as grep, sort, sed, and uniq:

  1. How to remove duplicate lines inside a text file?
  2. removing line from list using sort, grep LINUX
  3. Find duplicate lines in a file and count how many time each line was duplicated?
  4. Remove duplicate entries in a Bash script
  5. How to delete duplicate lines in a file without sorting it in Unix?
  6. How to delete duplicate lines in a file...AWK, SED, UNIQ not working on my file
– user
  • Are the duplicate lines always adjacent? Suppose the input was 1, 2, 2, 3, 4, 2 — should the 2 after the 4 appear in the output? – Jonathan Leffler Dec 01 '16 at 17:44
  • Yes, I sort them beforehand to make the code easier to write. Anyway, the best option would have been to use `uniq -u` right away. – user Dec 01 '16 at 17:46
  • 3
    Be aware that given input 1, 2, 2, 3, 4, 2, `uniq -u` will print the second 2; it works only on adjacent duplicate lines. Pre-sorting is therefore a good idea. Also note that `uniq` takes zero or one input files, and if there's an input file, it can take an output file: `uniq [-c|-d|-u] [-f fields] [-s char] [input_file [output_file]]` according to POSIX. It is not a general file filter (a general file filter takes zero or more file names and processes either standard input or each file name in turn, writing to standard output). – Jonathan Leffler Dec 01 '16 at 17:52
  • Thanks! The `uniq` doc was misleading. I tested it here; it only removes adjacent lines. – user Dec 01 '16 at 18:01
  • Does this answer your question? [How to print only the unique lines in BASH?](https://stackoverflow.com/questions/23740545/how-to-print-only-the-unique-lines-in-bash) – c-x-berger May 25 '21 at 21:10

4 Answers


You may use uniq with the -u/--unique option. As per the uniq man page:

-u / --unique

Don't output lines that are repeated in the input.
Print only lines that are unique in the INPUT.

For example:

cat /tmp/uniques.txt | uniq -u

Or, as mentioned in UUOC: Useless use of cat, a better way is:

uniq -u /tmp/uniques.txt

Both of these commands will return:

1
3
4

where /tmp/uniques.txt holds the numbers mentioned in the question, i.e.

1
2
2
3
4

Note: uniq requires the content of the file to be sorted. As mentioned in the doc:

By default, uniq prints the unique lines in a sorted file; it discards all but one of identical successive input lines, so that the OUTPUT contains unique lines.

If the file is not sorted, you need to sort the content first and then use uniq over the sorted content:

sort /tmp/uniques.txt | uniq -u
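To see why the sorting matters, here is a minimal Python sketch of the `sort | uniq -u` behaviour, assuming the whole input fits in memory: sorting brings the duplicates together, and only runs of adjacent identical lines of length 1 are printed.

import sys
from itertools import groupby

# Sorting brings duplicate lines together; groupby then clusters runs of
# identical adjacent lines, and only runs of length 1 are printed,
# mirroring `sort | uniq -u`.
for line, run in groupby( sorted( sys.stdin ) ):
    if len( list( run ) ) == 1:
        sys.stdout.write( line )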
– Anonymous
  • @JonathanLeffler: Thanks for sharing the info. Updated the answer – Anonymous Dec 01 '16 at 17:47
  • 1
    @EdMorton: Yes, I guess that was one very important piece of information that I missed, as the content of file in question were already sorted. Updated the answer – Anonymous Dec 01 '16 at 18:07

No sorting required and output order will be the same as input order:

$ awk 'NR==FNR{c[$0]++;next} c[$0]==1' file file
1
3
4
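This works by reading the file twice (it is named twice on the command line). On the first pass, while NR==FNR holds, c[$0]++ counts how many times each line occurs; on the second pass, c[$0]==1 selects the lines that occurred exactly once, and awk's default action prints them.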
– Ed Morton
If you have lines like these:

Europe Finland Office Supplies Online H 5/21/2015 193508565 7/3/2015 2339 651.21 524.96 1523180.19 1227881.44 295298.75
Europe Greece Household Online L 9/11/2015 895509612 9/26/2015 49 668.27 502.54 32745.23 24624.46 8120.77
Europe Hungary Beverages Online C 8/21/2012 722931563 8/25/2012 370 47.45 31.79 17556.50 11762.30 5794.20
Europe Hungary Beverages Online C 8/21/2012 722931563 8/25/2012 370 47.45 31.79 17556.50 11762.30 5794.20

you can use this command:

[isuru@192 ~]$ sort duplines.txt | sed 's/\ /\-/g' | uniq | sed 's/\-/\ /g'

But keep special characters in mind: if your lines already contain dashes, make sure to use a different substitute symbol. (Here the sed patterns use an escaped space, `\ `, between the backslash and the forward slash.)
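Note, too, that this pipeline uses plain uniq, which keeps one copy of each duplicated line; to remove all copies of the duplicated lines, as the question asks, use uniq -u instead.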

(Screenshots in the original answer show the file before and after applying the command.)

– Ed91

Use the sort command with the -u argument to list the unique values of any command's output.

    cat file_name | sort -u
1
2
3
4
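Note that sort -u keeps one copy of each distinct line (hence the 2 in the output above), so unlike uniq -u it does not remove duplicated lines entirely, which is what the question asked for.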
– linux.cnf
  • This other answer (https://stackoverflow.com/a/40916868/4934640) on this page already covers this. – user Mar 02 '21 at 12:12