dupes is a list of duplicate items found in a list. clipb is the original list.

I now search clipb for items that contain any of the substrings in dupes. The end goal is to append the word "duplicate" to every item in the original list that matches.

dupes = ['0138', '0243']
clipb = ['ABC2b_0243D_K6_LOPA-PAST', 'ABC2b_0016G_M1_LOPA-PABR', 'ABC2b_0138H_M1_LOBR-BRMU', 'ABC2b_0138G_J1_LOPA-PAST', 'ABC2b_0243A_O§_STMA-MACV']

def Filter(clipb, dupes):
    return [str for str in clipb if
            any(sub in str for sub in dupes)]
            #index = clipb.index(clipb)  <<--- no idea how to add it in here 
    
rs_found = (Filter(clipb, dupes))
print ("found dupicates from the original list are: ","\n", rs_found)

The current output is only the list of the duplicates that were found:

['ABC2b_0243D_K6_LOPA-PAST', 'ABC2b_0138H_M1_LOBR-BRMU', 'ABC2b_0138G_J1_LOPA-PAST', 'ABC2b_0243A_O§_STMA-MACV']

My problem is that I don't know how to change Filter so that it also gives me the index of each duplicate found, which I need in order to actually change the items in the original list.

Tomerikoo
Paul
  • I don't see an attempt at that in your code. Can you post a [mre] with less noise? It seems like all the `tkinter` stuff is not really relevant to the specific problem you're asking about. A general idea: you already know what the duplicates are. Now just iterate over the data and add the suffix to items that are in the `dupes` list... – Tomerikoo Apr 25 '21 at 13:37
  • @Tomerikoo, that is exactly my problem, I have no idea HOW to get the index. added a minimal code block as requested – Paul Apr 25 '21 at 13:52
  • [Accessing the index in 'for' loops?](https://stackoverflow.com/q/522563/6045800) – Tomerikoo Apr 25 '21 at 13:54
  • You took a good step towards a [mre]. If the last block recreates your problem, then please remove the first block completely to make your question clearer. Second, make it clear what your input is, what output you expect, and what output you get. I see you provided some hard-coded inputs (`dupes` and `clipb`), which is great! Now just post exactly what output you get and what you expect it to be. This will help to clear up any doubts – Tomerikoo Apr 25 '21 at 13:57

2 Answers

Since you want each duplicate item to have a tab and 'DUPLICATE' appended to it, do that at the moment you find a duplicate, instead of filtering the duplicates out:

clipb = ['ABC2b_0243D_K6_LOPA-PAST', 'ABC2b_0016G_M1_LOPA-PABR', 'ABC2b_0138H_M1_LOBR-BRMU',
         'ABC2b_0138G_J1_LOPA-PAST', 'ABC2b_0243A_O§_STMA-MACV']

seen = set()
final = []
for item in clipb:
    tag = item[6:10]  # assuming tags are always at this index
    if tag in seen:
        item += '\tDUPLICATE'  # or '<space>DUPLICATE', as needed
    else:
        seen.add(tag)
    final.append(item)

print(final)
# Output:
['ABC2b_0243D_K6_LOPA-PAST',
 'ABC2b_0016G_M1_LOPA-PABR',
 'ABC2b_0138H_M1_LOBR-BRMU',
 'ABC2b_0138G_J1_LOPA-PAST\tDUPLICATE',
 'ABC2b_0243A_O§_STMA-MACV\tDUPLICATE']

Note that you don't need to pre-create a list of the duplicate tags; that's done in the code. The approach is loosely adapted from the unique_everseen recipe at https://docs.python.org/3/library/itertools.html.
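The loop above leaves the first occurrence of each repeated tag unmarked. If every occurrence should be marked, including the first one, one option is to count the tags in a first pass. The following is only a minimal sketch under stated assumptions, not the method of the answer above: the helper names `get_tag` and `mark_all_duplicates` are hypothetical, and it assumes the tag is the first four characters of the second underscore-separated field rather than a fixed slice.

```python
from collections import Counter


def get_tag(item):
    # Assumption: the tag is the first four characters of the second
    # underscore-separated field, e.g. 'ABC2b_0243D_...' -> '0243'.
    return item.split('_')[1][:4]


def mark_all_duplicates(items, suffix='\tDUPLICATE'):
    # First pass: count how often each tag occurs.
    counts = Counter(get_tag(item) for item in items)
    # Second pass: append the suffix to every item whose tag repeats.
    return [item + suffix if counts[get_tag(item)] > 1 else item
            for item in items]


clipb = ['ABC2b_0243D_K6_LOPA-PAST', 'ABC2b_0016G_M1_LOPA-PABR', 'ABC2b_0138H_M1_LOBR-BRMU',
         'ABC2b_0138G_J1_LOPA-PAST', 'ABC2b_0243A_O§_STMA-MACV']

marked = mark_all_duplicates(clipb)
print(marked)
```

With the sample data, only the item tagged 0016 stays unchanged. The cost is a second pass over the list, in exchange for not depending on the order in which duplicates appear.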

aneroid
  • Oh wow, now that is brilliant, did not know one can do this. Thank you very much indeed – Paul Apr 25 '21 at 14:38

Your current direction is quite good. You don't really need the index here at all! You are using any(sub in str for sub in dupes) to check whether any of the duplicated patterns is in the string, which is good. You only need a small logical refinement.

What should happen when the condition above is true? You want to add the "duplicate" string. What happens if it is not true? Add the original string as is. So just modify the list comprehension to be:

def Filter(clipb, dupes):
    return [s + " duplicate" if any(sub in s for sub in dupes) 
            else s
            for s in clipb]

* Note that I renamed the str variable to s, because str is the name of the built-in type and reusing it as a variable name shadows that type.

The output with your sample data is:

found dupicates from the original list are:  
 ['ABC2b_0243D_K6_LOPA-PAST duplicate', 'ABC2b_0016G_M1_LOPA-PABR', 'ABC2b_0138H_M1_LOBR-BRMU duplicate', 'ABC2b_0138G_J1_LOPA-PAST duplicate', 'ABC2b_0243A_O§_STMA-MACV duplicate']

If you want to change the original list in-place, you can use the built-in enumerate() function to iterate over index and item:

for i, s in enumerate(clipb):
    if any(sub in s for sub in dupes):
        clipb[i] = s + " duplicate"
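Put together with the sample data from the question, the in-place variant could look like the sketch below. The function name `mark_in_place`, the `SUFFIX` constant and the `endswith` guard are additions for illustration (assumptions, not part of the code above); the guard keeps the suffix from being appended a second time if the loop is run again on the same list.

```python
dupes = ['0138', '0243']
clipb = ['ABC2b_0243D_K6_LOPA-PAST', 'ABC2b_0016G_M1_LOPA-PABR', 'ABC2b_0138H_M1_LOBR-BRMU',
         'ABC2b_0138G_J1_LOPA-PAST', 'ABC2b_0243A_O§_STMA-MACV']

SUFFIX = " duplicate"  # hypothetical constant, added for illustration


def mark_in_place(items, patterns):
    # enumerate() yields (index, item) pairs, so the matching slot in the
    # original list can be overwritten directly.
    for i, s in enumerate(items):
        if any(sub in s for sub in patterns) and not s.endswith(SUFFIX):
            items[i] = s + SUFFIX


mark_in_place(clipb, dupes)
mark_in_place(clipb, dupes)  # a second call changes nothing, thanks to the guard
print(clipb)
```

Assigning to `items[i]` while iterating is safe here because the list never changes length; only existing slots are replaced.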
Tomerikoo
  • @Paul Happy to help. Please do note, though, that aneroid's proposal is more efficient than your direction, which I followed in my answer. Make sure you understand it as well and use it in the future. Lastly, given the latest fashion of complaining about SO being harsh, downvotes and all that, I just wanted to say: please also look at the difference between your original question and its current state. The original was too messy, with a lot of noise, while now it is indeed a fine question. Try to learn from that how to post questions in the future. I'm happy I could help you get it there – Tomerikoo Apr 25 '21 at 14:54
  • Yup, it is difficult for me to figure out how to exactly ask a question, some like more info, some like less, some want to see the steps taken, some only the problem. My thinking was, the more info, the better informed you all are, but.. I see from this, that I can indeed reduce it. Point taken. Thank you – Paul Apr 25 '21 at 16:27
  • @Paul I completely relate and understand. Always try to distill your problem to its core, and then recreate a snippet of code demonstrating it. The best example is your current question. Maybe your code is part of a much bigger project involving tkinter and maybe more libraries. But in the end, your problem is with finding duplicates in a list and handling them. All the rest is not important for us. Anyway, it's nice to see that you listened to the criticism and cared to improve your question. Many people don't. You can get even more help at [ask] and [mre] – Tomerikoo Apr 25 '21 at 16:53