I have a text file in which some lines are exact duplicates of each other and a few others are only partly similar to other lines in the file.
Input.txt
I would like to play: Volleyball
I would like to play: Volleyball
I would like to play: TableTennis
I would like to play: Baseball
I do not know how to play: Volleyball
She would like to play: TableTennis
I want to learn how to play: Baseball
They like to play: all the three
First, I want to remove the repeated lines from the input file, as shown:
I would like to play: Volleyball
I would like to play: TableTennis
I would like to play: Baseball
I do not know how to play: Volleyball
She would like to play: TableTennis
I want to learn how to play: Baseball
They like to play: all the three
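One possible way to do this first step is sketched below in Python (the post does not name a language, so Python is an assumption, and `remove_duplicate_lines` is a hypothetical helper name). It keeps the first occurrence of each line and preserves the original order.

```python
def remove_duplicate_lines(lines):
    """Return the lines with exact repeats dropped, keeping first-seen order."""
    seen = set()
    unique = []
    for line in lines:
        text = line.strip()  # ignore trailing newlines and stray spaces
        if text and text not in seen:
            seen.add(text)
            unique.append(text)
    return unique


# Sample data copied from Input.txt in the post.
sample = [
    "I would like to play: Volleyball",
    "I would like to play: Volleyball",
    "I would like to play: TableTennis",
    "I would like to play: Baseball",
    "I do not know how to play: Volleyball",
    "She would like to play: TableTennis",
    "I want to learn how to play: Baseball",
    "They like to play: all the three",
]

unique_lines = remove_duplicate_lines(sample)
print("\n".join(unique_lines))
```

To run it on the real file, the same function can be given the open file object instead of the list, for example `with open("Input.txt") as f: unique_lines = remove_duplicate_lines(f)`.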
In the next step, I want to print only these statements:
I would like to play
They like to play
A brief explanation of the output: the statement "I would like to play" covers several different sports, so I want it printed. The last line, "They like to play", is a different case, so I want that line printed as well. (Could the results also be written to a .csv file, with the statements that cover the maximum number of sports in one column and all the unique sports in another?)
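For the .csv part, a minimal sketch using Python's standard `csv` module might look like the following. The two-column layout, the column headers, and the helper name `write_results` are assumptions, since the post does not fix an exact format; the two lists are simply the results described above, typed in by hand.

```python
import csv
import io
from itertools import zip_longest


def write_results(handle, statements, sports):
    """Write statements and sports side by side as two CSV columns."""
    writer = csv.writer(handle)
    writer.writerow(["statement", "sport"])  # assumed header names
    # zip_longest pads the shorter column with empty cells.
    writer.writerows(zip_longest(statements, sports, fillvalue=""))


# Results from the example in the post, entered by hand for illustration.
statements = ["I would like to play", "They like to play"]
sports = ["Volleyball", "TableTennis", "Baseball", "all the three"]

buffer = io.StringIO()  # in-memory stand-in for a real file
write_results(buffer, statements, sports)
csv_text = buffer.getvalue()
print(csv_text)
```

To write an actual file, the same function can be called with `open("output.csv", "w", newline="")` in place of the in-memory buffer (the file name is only an example).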
Note: I don't want to print the following lines, because their three sports are already covered:
I do not know how to play: Volleyball
She would like to play: TableTennis
I want to learn how to play: Baseball
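The selection rule described here could be sketched in Python as follows. This is only one interpretation of the rule: each line is split at the first colon, the sports are grouped per statement, the statements with the most distinct sports are visited first, and a statement is kept only if it adds something not yet covered. The name `select_statements` is hypothetical, and "all the three" is treated as its own value, as in the post.

```python
def select_statements(lines):
    """Keep statements that add at least one sport not covered by an earlier pick."""
    sports_by_statement = {}  # dicts keep insertion order (Python 3.7+)
    for line in lines:
        statement, sep, sport = line.partition(":")
        if not sep:
            continue  # no colon, so not a "statement: sport" line
        sports_by_statement.setdefault(statement.strip(), set()).add(sport.strip())

    # Statements with the most distinct sports come first. sorted() is stable,
    # so ties keep their order of first appearance in the file.
    ranked = sorted(
        sports_by_statement.items(),
        key=lambda item: len(item[1]),
        reverse=True,
    )

    covered = set()
    selected = []
    for statement, sports in ranked:
        if sports - covered:  # at least one sport is new
            selected.append(statement)
            covered |= sports
    return selected


lines = [
    "I would like to play: Volleyball",
    "I would like to play: Volleyball",
    "I would like to play: TableTennis",
    "I would like to play: Baseball",
    "I do not know how to play: Volleyball",
    "She would like to play: TableTennis",
    "I want to learn how to play: Baseball",
    "They like to play: all the three",
]

selected = select_statements(lines)
print(selected)  # ['I would like to play', 'They like to play']
```

Because the sports are collected in a set per statement, the repeated Volleyball line does no harm here, so this step works with or without removing duplicates first.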
I am confused about how to compare one line with another in the same text file. Any help would be appreciated. Thank you.