This question is not the same as How to print only the unique lines in BASH?, because that one suggests removing all copies of the duplicated lines, while this one is about eliminating only their duplicates, i.e., changing 1, 2, 3, 3 into 1, 2, 3 instead of just 1, 2.
This question is hard to put into words, but the example is straightforward. If I have a file like this:
1
2
2
3
4
After parsing the file and erasing the duplicated lines, it should become:
1
3
4
I know some Python, so I wrote this script to do it. Save it as clean_duplicates.py and run it as shown in the comment at the top:
import sys
#
# To run it use:
# python clean_duplicates.py < input.txt > clean.txt
#
def main():
    lines = sys.stdin.readlines()
    clean_duplicates( lines )
#
# It only removes adjacent duplicated lines, so you need to sort them
# (case-sensitively) before running it. Lines are compared including their
# line ending, so the last line should end with a newline too.
#
def clean_duplicates( lines ):
    linesCount = len( lines )
    # If the input is empty, there is nothing to print
    if linesCount == 0:
        return
    # If it is a one-line file, print it and stop the algorithm
    if linesCount == 1:
        sys.stdout.write( lines[ 0 ] )
        return
    # Print the first line if it differs from the second one
    if lines[ 0 ] != lines[ 1 ]:
        sys.stdout.write( lines[ 0 ] )
    # Print each middle line that differs from both of its neighbors;
    # range( 1, linesCount - 1 ) skips the first and the last index
    for index in range( 1, linesCount - 1 ):
        currentLine = lines[ index ]
        if currentLine == lines[ index - 1 ]:
            continue
        if currentLine == lines[ index + 1 ]:
            continue
        sys.stdout.write( currentLine )
    # Print the last line if it differs from the one before it
    if lines[ linesCount - 2 ] != lines[ linesCount - 1 ]:
        sys.stdout.write( lines[ linesCount - 1 ] )
if __name__ == "__main__":
    main()
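For comparison, the same adjacent-only behavior might be written more compactly with itertools.groupby from the standard library. This is only a minimal sketch, not a replacement for the script above: the function name drop_adjacent_duplicates and the sample list are made up for illustration, and it still assumes the input has been sorted first.

```python
from itertools import groupby


def drop_adjacent_duplicates(lines):
    """Keep only lines whose run of adjacent equal lines has length 1.

    Hypothetical helper: like the script above, it only detects
    duplicates that sit next to each other.
    """
    result = []
    for _, run in groupby(lines):
        run = list(run)  # materialize the run of equal adjacent lines
        if len(run) == 1:
            result.append(run[0])
    return result


# Sample data taken from the example file in the question
sample = ["1\n", "2\n", "2\n", "3\n", "4\n"]
print("".join(drop_adjacent_duplicates(sample)), end="")
```

With the sample input from the question, this prints the lines 1, 3 and 4.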
However, while searching for how to remove duplicate lines, it seems easier to use tools such as grep, sort, sed, or uniq:
- How to remove duplicate lines inside a text file?
- removing line from list using sort, grep LINUX
- Find duplicate lines in a file and count how many time each line was duplicated?
- Remove duplicate entries in a Bash script
- How to delete duplicate lines in a file without sorting it in Unix?
- How to delete duplicate lines in a file...AWK, SED, UNIQ not working on my file
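If sorting the input first is not desirable, one possible pure-Python approach is to count every line and keep only those that occur exactly once, preserving the original order. This is a rough sketch under assumptions: the function name keep_lines_seen_once and the sample data are hypothetical, and the whole input is held in memory.

```python
from collections import Counter


def keep_lines_seen_once(lines):
    """Return the lines that occur exactly once, in their original order.

    Hypothetical helper: unlike an adjacent-only comparison, this does
    not require the input to be sorted.
    """
    counts = Counter(lines)  # maps each distinct line to its count
    return [line for line in lines if counts[line] == 1]


# Unsorted sample: the two copies of "2" are not adjacent
sample = ["2\n", "1\n", "3\n", "2\n", "4\n"]
print("".join(keep_lines_seen_once(sample)), end="")
```

Here both copies of 2 are dropped even though they are not adjacent, and the lines 1, 3 and 4 are printed in input order.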