1

I am currently having the dataset like-

    customerID  store_code  mode
    BBID_204100102  2655    a
    BBID_204100102  2906    b
    BBID_204100102  2906    d 
    BBID_204100150  4986    c      
    BBID_204100150  4986    a
    BBID_204100277  4986    d                                                 
    BBID_204100310  4986    d

and i want something like

    customerID  store_code  a   b  c  d
0   BBID_204100102  2655    1   0   0  0
1   BBID_204100102  2906    0   1   0  0
2   BBID_204100150  4986    1   0   1  0 
3   BBID_204100277  4986    0   0   0  1
4   BBID_204100310  4986    0   0   0  1

first of it makes pivot on customer ud and student id and then it will encode mode in above manner.

kanav anand
  • 319
  • 1
  • 3
  • 12

2 Answers2

1

Use get_dummies with set_index and max by both levels:

df = (pd.get_dummies(df.set_index(['customerID','store_code']), prefix='', prefix_sep='')
        .max(level=[0,1])
        .reset_index())
print (df)
       customerID  store_code  a  b  c  d
0  BBID_204100102        2655  1  0  0  0
1  BBID_204100102        2906  0  1  0  1
2  BBID_204100150        4986  1  0  1  0
3  BBID_204100277        4986  0  0  0  1
4  BBID_204100310        4986  0  0  0  1
jezrael
  • 629,482
  • 62
  • 918
  • 895
1

Option 1 Using pivot_table

In [3781]: df.pivot_table(index=['customerID','store_code'], columns='mode',
                          aggfunc=len, fill_value=0).reset_index()
Out[3781]:
mode      customerID  store_code  a  b  c  d
0     BBID_204100102        2655  1  0  0  0
1     BBID_204100102        2906  0  1  0  1
2     BBID_204100150        4986  1  0  1  0
3     BBID_204100277        4986  0  0  0  1
4     BBID_204100310        4986  0  0  0  1

Option 2 Using groupby

In [3793]: (df.groupby(['customerID', 'store_code', 'mode']).size()
              .unstack(fill_value=0).reset_index())
Out[3793]:
mode      customerID  store_code  a  b  c  d
0     BBID_204100102        2655  1  0  0  0
1     BBID_204100102        2906  0  1  0  1
2     BBID_204100150        4986  1  0  1  0
3     BBID_204100277        4986  0  0  0  1
4     BBID_204100310        4986  0  0  0  1

Option 3 Using set_index and unstack

In [3771]: (df.assign(v=1).set_index(['customerID', 'store_code', 'mode'])['v']
              .unstack(fill_value=0).reset_index())
Out[3771]:
mode      customerID  store_code  a  b  c  d
0     BBID_204100102        2655  1  0  0  0
1     BBID_204100102        2906  0  1  0  1
2     BBID_204100150        4986  1  0  1  0
3     BBID_204100277        4986  0  0  0  1
4     BBID_204100310        4986  0  0  0  1
Zero
  • 59,580
  • 11
  • 126
  • 139