I have a pandas DataFrame that looks like this:
  Col1   Col2
0    A  code1
1    B  code1
2    C  code2
3    A  code1
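For anyone who wants to reproduce this, the sample frame can be built roughly as follows. This is a minimal sketch; the variable name df matches the code further down, and the values are the four rows shown above.

```python
import pandas as pd

# Rebuild the four-row sample shown above.
df = pd.DataFrame({
    "Col1": ["A", "B", "C", "A"],
    "Col2": ["code1", "code1", "code2", "code1"],
})

print(df)
```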
I want to add one column to the DataFrame for each unique code value in Col2. In each row, the column that matches that row's code value should contain a 1, and all the other newly created columns should contain a 0. In other words, I want a DataFrame that looks like this:
  Col1   Col2  Code1  Code2
0    A  code1      1      0
1    B  code1      1      0
2    C  code2      0      1
3    A  code1      1      0
The following code works for this small data sample:
def assign_code1(row):
    if row['Col2'] == 'code1':
        return 1
    else:
        return 0

def assign_code2(row):
    if row['Col2'] == 'code2':
        return 1
    else:
        return 0

df['Code1'] = df.apply(assign_code1, axis=1)
df['Code2'] = df.apply(assign_code2, axis=1)
However, I really hate this code because:
- I have to know all the values in Col2;
- I have to write a separate function for each unique value in Col2; and
- I have read in other posts that using apply() to create new columns is frowned upon (though I don't fully understand why; I have only been working with pandas for a couple of weeks).
I'd like to have code that obtains a list of all the unique values in Col2, creates a column for each, and correctly populates the 1s and 0s in those columns. Is there an elegant way to do this?
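To make the goal concrete, here is a rough sketch of the behavior described above, not a vetted solution. It rebuilds the sample data so it runs on its own. The helper name add_indicator_columns is hypothetical, and capitalizing each code value to get the column name (code1 becomes Code1) is an assumption based on the desired output. pandas also provides pd.get_dummies, which may cover the same need in one call, so the second half shows that variant.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["A", "B", "C", "A"],
    "Col2": ["code1", "code1", "code2", "code1"],
})


def add_indicator_columns(frame, source_col):
    """Hypothetical helper: add one 0/1 column per unique value in source_col."""
    result = frame.copy()
    for value in result[source_col].unique():
        # Vectorized comparison over the whole column, no apply() needed.
        # Assumption: the new column name is the capitalized code value.
        result[str(value).capitalize()] = (result[source_col] == value).astype(int)
    return result


result = add_indicator_columns(df, "Col2")
print(result)

# Variant using pandas' built-in one-hot encoder.
dummies = pd.get_dummies(df["Col2"], dtype=int)
dummies.columns = [str(name).capitalize() for name in dummies.columns]
alt = df.join(dummies)
print(alt)
```

Neither variant needs the values of Col2 to be known in advance or a separate function per value.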