I have a pandas DataFrame that looks like this:

  Col1   Col2
0    A  code1
1    B  code1
2    C  code2
3    A  code1

I want to add a column to the DataFrame for each unique code value in Col2, with a 1 in the column that matches the row's code value and a 0 in all the other newly created columns. In other words, I want a DataFrame that looks like this:

  Col1   Col2  Code1  Code2
0    A  code1      1      0
1    B  code1      1      0
2    C  code2      0      1
3    A  code1      1      0

The following code works for this small data sample:

def assign_code1(row):
    if row['Col2'] == 'code1':
        return 1
    else:
        return 0

def assign_code2(row):
    if row['Col2'] == 'code2':
        return 1
    else:
        return 0

df['Code1'] = df.apply(assign_code1, axis=1)
df['Code2'] = df.apply(assign_code2, axis=1)

However, I really hate this code because:

  • I have to know all the values in Col2;
  • I have to write a separate function for each unique value in Col2; and
  • I have read in other posts that using apply() to create new columns is frowned upon (though I didn't fully understand why; I have only been working with pandas for a couple of weeks).

I'd like code that obtains a list of all the unique values in Col2, creates a column for each, and correctly populates those columns with 1s and 0s. Is there an elegant way to do this?
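For reference, here is a minimal sketch of the kind of thing being asked for, not a definitive answer. It rebuilds the sample data, then shows two possible routes: a loop over df['Col2'].unique() with a vectorized comparison, and pandas' built-in pd.get_dummies. The names looped, dummies and result are made up for illustration, and capitalizing the column names (code1 to Code1) is only an assumption made to match the desired output shown above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": ["A", "B", "C", "A"],
    "Col2": ["code1", "code1", "code2", "code1"],
})

# Route 1: loop over the unique values; each comparison is vectorized,
# so no per-row apply() and no per-value function is needed.
looped = df.copy()
for code in looped["Col2"].unique():
    looped[code.capitalize()] = (looped["Col2"] == code).astype(int)

# Route 2: let pandas build the indicator columns in one call.
dummies = pd.get_dummies(df["Col2"], dtype=int)
dummies.columns = [c.capitalize() for c in dummies.columns]  # code1 -> Code1
result = df.join(dummies)

print(result)
```

Neither route requires knowing the values in Col2 in advance. Note that get_dummies orders the new columns by sorted value, while the loop follows order of first appearance; the two orders happen to coincide for this sample.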
