Python Pandas: If item in list of lists matches column row in pandas, add new columns

Question

I have a pandas dataframe, and a list of lists (each sublist has 3 items [name, seq, qual]). I want to check if the first item in the list of lists matches the name column of the df, and if it does I want to create new columns with item[1] and item[2] added to the dataframe.

To set it up:

reads = [['read1', 'ACTG', 'FFFF'], ['read2', 'TTTT', 'FF:F'], ['read3', 'ATGC', 'F:FF']]
df = pd.DataFrame(reads, columns=['ReadName', 'Sequence1', 'Qual1'])
reads2 = [['read3', 'CGCG', 'F::F'], ['read1', 'TGTG', 'F:FF'], ['read2', 'AAAA', 'FFFF']]

What I've tried:

for item in reads2:
    if item[0] in df['ReadName']:
        df['Sequence2'] = item[1]
        df['Qual2'] = item[2]

but the resultant df looks like:

  ReadName Sequence1 Qual1 Sequence2 Qual2
0    read1      ACTG  FFFF      CGCG  F::F
1    read2      TTTT  FF:F      CGCG  F::F
2    read3      ATGC  F:FF      CGCG  F::F

So it is only adding the first item from the list of lists to all the rows in the df. I would expect it to look like:

  ReadName Sequence1 Qual1 Sequence2 Qual2
0    read1      ACTG  FFFF      TGTG  F:FF
1    read2      TTTT  FF:F      AAAA  FFFF
2    read3      ATGC  F:FF      CGCG  F::F

Each time the condition `if` is met, you assign the values to the whole columns with e.g. `df['Sequence2'] = item[1]`. what you want to do looks like a merge. try `df.merge(pd.DataFrame(reads2, columns=['ReadName', 'Sequence2', 'Qual2']))` — Ben.T, Jun 20 '19 at 19:05
Does `df.merge(pd.DataFrame(reads2, columns=df.columns), on='ReadName')` not do what you want? — cs95, Jun 20 '19 at 19:14
@cs95 It's been a long day at the office. I need to go home. I like the dup target! — d_kennetz, Jun 20 '19 at 19:49

score 1 · Answer 1 · answered Jun 20 '19 at 19:11

1

use .loc to update row

for item in reads2:

    if item[0] in list(df['ReadName']):

        df.loc[df['ReadName']==item[0],'Sequence2'] = item[1]
        df.loc[df['ReadName']==item[0],'Qual2'] = item[2]

answered Jun 20 '19 at 19:11

tawab_shakeel

3,345
5
23

score 1 · Answer 2 · edited Jun 20 '19 at 19:18

1

Use a left join:

df2 = pd.DataFrame(reads2, columns = ['ReadName', 'Sequence2', 'Qual2'])
df.merge(df2, on='ReadName')

Looks like:

    ReadName    Sequence1   Qual1   Sequence2   Qual2
0   read1   ACTG    FFFF    TGTG    F:FF
1   read2   TTTT    FF:F    AAAA    FFFF
2   read3   ATGC    F:FF    CGCG    F::F

edited Jun 20 '19 at 19:18

Brian Tompsett - 汤莱恩

5,195
62
50
120

answered Jun 20 '19 at 19:13

ludan

26
5

Python Pandas: If item in list of lists matches column row in pandas, add new columns

2 Answers2