0

I have a data frame like this

Value  Name
6   jameel
8   jameel
1   sarah
8   jameel
1   sarah
10  jameel
1   sarah
10  jameel
1   sarah
10  jameel
2   sarah
11  jameel
1   sarah
12  jameel
2   sarah
14  jameel
3   sarah
14  jameel

I want to covert this data frame so that the first occurrence for sarah is one plus the previous max value of Jameel so according to this data frame the first occurrence for sarah is in the third row which will become 9 as the max previous value for jameel is 8. After this wherever the value of sarah is 1 it should be replaced by 9 and it should not change. Now that when we have done this we can notice jameel's value increases and becomes max at 10 at row 8 as it shouldn't change the subsequent sarah value as it is 1 but instead it should change sarah's value at row 11 to 10+1 = 11 as it is not 1 but 2 and this procedure should follow and the resultant frame should look like this.

Value  Name
6   jameel
8   jameel
9   sarah
8   jameel
9   sarah
10  jameel
9   sarah
10  jameel
9   sarah
10  jameel
11  sarah
11  jameel
9   sarah
12  jameel
11  sarah
14  jameel
15  sarah
14  jameel

Notice that the jameel's values and sarah's values individually are always in ascending order

values for jameel :6,8,10,11,12,14

values for sarah :1,2,3

Is there a way to this in pandas or is there any other pythonic way to this?

Mohsin
  • 303
  • 4
  • 12

1 Answers1

1

The following should do the trick:

import pandas as pd

data = {'Value'     :[6,8,1,8,1,10,1,10,1,10,2,11,1,12,2,14,3,14],
        'Name'      :['J','J','S','J','S','J','S','J','S','J','S','J','S','J','S','J','S','J'],
        'Address'   :['rd','rd','rd','st','rd','st','rd','st','rd','rd','rd','rd','st','st','av','rd','av','av']}   

df = pd.DataFrame(data)

S_vals = df['Value'][df['Name'] == 'S']
J_vals = df['Value'][df['Name'] == 'J']

# Get locs of unique values
unique_vals = set(S_vals)
locs = [S_vals[df['Value'] == v].index[0] for v in unique_vals]

df_new = df.copy()

# Set values
for l in locs:
    for j in S_vals.index[S_vals==S_vals[l]]:
        df_new.at[j, 'Value'] = max(J_vals[J_vals.index<l])+1

        # This is deprecated
        #df_new.set_value(j, 'Value', max(J_vals[J_vals.index<l])+1)

Edit: extra column added to show that this works for data structures with more columns than just 'Value' and 'Name'.

The values of 'Sarah' and 'Jameel' are stored in S_vals and J_vals. The locations of the unique values of 'Sarah' are then stored in locs. Then, the values are set:

  • Iterate through the locations of the unique values
  • For each unique value, iterate through all of the indices corresponding to matches with Sarah having that value (first we do all of the 1s, then all of the 2s, etc.)
  • Then replace each one of the values corresponding with Sarah with the maximal value corresponding to Jameel preceeding this, +1

This gives:

>>> print(df)
    Value Name Address
0       6    J      rd
1       8    J      rd
2       1    S      rd
3       8    J      st
4       1    S      rd
5      10    J      st
6       1    S      rd
7      10    J      st
8       1    S      rd
9      10    J      rd
10      2    S      rd
11     11    J      rd
12      1    S      st
13     12    J      st
14      2    S      av
15     14    J      rd
16      3    S      av
17     14    J      av

>>> print(df_new)
    Value Name Address
0       6    J      rd
1       8    J      rd
2       9    S      rd
3       8    J      st
4       9    S      rd
5      10    J      st
6       9    S      rd
7      10    J      st
8       9    S      rd
9      10    J      rd
10     11    S      rd
11     11    J      rd
12      9    S      st
13     12    J      st
14     11    S      av
15     14    J      rd
16     15    S      av
17     14    J      av

I've left in a commented-out line with .set_value: this is deprecated, but is quicker (see this analysis here).

Enjoy!

soyapencil
  • 391
  • 3
  • 8
  • Nice, would this work if I have more than two columns in the datasets for example Value,Name,Address but the address doesn't play any role in the conversions but should be there are it is J's and S's address. – Mohsin Oct 11 '19 at 11:19
  • Works fine for me. I'll edit the above to show that this is indeed the case. – soyapencil Oct 11 '19 at 11:41