This is my dataframe:
C1 | C2 |... email_id | subject | sender | recipient
| | | congrats | x | y
| | | congrats | z | y
| | | congrats | x | y
| | | meeting | x | y
Desired output:
C1 | C2 |... email_id | subject | sender | recipient
| | 0 | congrats | x | y
| | 1 | congrats | z | y
| | 0 | congrats | x | y
| | 2 | meeting | x | y
For every unique combination of subject, sender and recipient, I want to assign an email_id. I have gotten the unique triples like this:
df1 = df.drop_duplicates(subset=['sender','recipient','subject'])
To assign the values, this is what I am doing:
sender = df1.sender
recipient = df1.recipient
subject = df1.subject
n = 0
for i in sender:
    df.loc[df["subject"] == str(i) and df["subject"] == str(i), "email_id"] = n
    n = n + 1
This isn't the correct way to go about it. How do I add an "and" condition here? Edit: In short, I want to assign the same number to identical triplets.
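For reference, here is a minimal sketch of two ways this might be done, assuming the three key columns hold plain strings with no missing values. The sample data is hypothetical and only mirrors the table above, and the `email_id_loop` column name is made up for the comparison. The first option numbers each distinct triple with `groupby(...).ngroup()`. The second keeps the loop idea but combines the comparisons element-wise with `&`, each wrapped in parentheses, since Python's `and` cannot be applied to whole Series.

```python
import pandas as pd

# Hypothetical sample data mirroring the table in the question.
df = pd.DataFrame({
    "subject":   ["congrats", "congrats", "congrats", "meeting"],
    "sender":    ["x", "z", "x", "x"],
    "recipient": ["y", "y", "y", "y"],
})

keys = ["subject", "sender", "recipient"]

# Option 1: number each distinct triple.
# sort=False numbers the groups in order of first appearance.
df["email_id"] = df.groupby(keys, sort=False).ngroup()

# Option 2: the loop, with element-wise & between parenthesized comparisons.
df1 = df.drop_duplicates(subset=keys)
df["email_id_loop"] = -1  # placeholder, overwritten for every row below
triples = zip(df1["subject"], df1["sender"], df1["recipient"])
for n, (sub, snd, rcp) in enumerate(triples):
    mask = (df["subject"] == sub) & (df["sender"] == snd) & (df["recipient"] == rcp)
    df.loc[mask, "email_id_loop"] = n

print(df)
```

Both columns should end up with the same ids on this sample. The `groupby` version avoids the explicit loop, which would probably matter on a large dataframe.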