I have a DataFrame in this format:

    User_id skill
0   1       python
1   1       java
2   4       java

Running

df = pd.concat([df, pd.get_dummies(df['skill'], prefix='skill')], axis=1)
df

outputs:

   User_id skill_python skill_java
0  1       1            0
1  1       0            1
2  4       0            1

I want the output in this format, with one row per User_id:

   User_id skill_python skill_java
0  1       1            1
1  4       0            1

How can I do that using pandas?

  • Possible duplicate of [How can I one hot encode in Python?](https://stackoverflow.com/questions/37292872/how-can-i-one-hot-encode-in-python) – erip Aug 07 '17 at 21:56
  • Those answers don't address the case of grouping by an ID column. I am looking for the Python version of this: https://stackoverflow.com/questions/38679911/one-hot-encoding-from-multiple-rows-in-r – Harsh Gupta Aug 07 '17 at 21:59
  • Side note: once you have more than one "hot" line you no longer have "_one hot_ encoding". – Andras Deak Jan 09 '18 at 23:32

1 Answer

This is answered here:

https://datascience.stackexchange.com/questions/8253/how-to-binary-encode-multi-valued-categorical-variable-from-pandas-dataframe

Basically, declare User_id as the index and pivot the table so that each skill becomes its own column.
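For the example data in the question, the index-and-pivot idea might look roughly like the sketch below. This is my own reading of the approach, not code copied from the linked answer. The helper column `present` is a made-up name used only to give `pivot_table` something to aggregate, and the cast to `int` is there because the filled table may otherwise come back as floats.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "User_id": [1, 1, 4],
    "skill": ["python", "java", "java"],
})

result = (
    df.assign(present=1)                  # hypothetical helper column: 1 = user has this skill
      .pivot_table(index="User_id",       # one row per user
                   columns="skill",       # one column per skill
                   values="present",
                   aggfunc="max",         # repeated (user, skill) pairs still give 1
                   fill_value=0)          # skills the user lacks become 0
      .astype(int)
      .add_prefix("skill_")               # java -> skill_java, python -> skill_python
      .rename_axis(None, axis=1)          # drop the leftover "skill" label on the columns
      .reset_index()                      # turn User_id back into a regular column
)

print(result)
```

This gives one row per `User_id` with a 0/1 column per skill. The skill columns come out in alphabetical order (`skill_java` before `skill_python`), so reorder them afterwards if the order matters.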