0

I have a dataframe that looks like this.

Dataframe

I want to add a column 'Correct_entry' that returns 'True' or 'False' based on the combination of values in the current columns. In total I have 24 possible combinations that are correct. I give two examples:

1.

If df['Hazard_type'] == 'Drought' & df['Card_type'] == 'Red block' & If['Round'] == 1 & df['Scenario'] == 'scenario A' & df['Payment_type'] == 'One payment (lump sum)' 
If df['Hazard_type'] == 'Drought' & df['Card_type'] == 'Green block' & If['Round'] == 1 & df['Scenario'] == 'scenario A' & df['Payment_type'] == 'One payment (lump sum)' 

I have 24 different combinations of values that are all correct and should equal 'True'. All other combinations should equal to 'False'.

What would be the best way to go through this set of data? How can I combine all these different statements?

I hope it is clear.

edit: As requested, the data in text format.

    Hazard_type  Card_type   Round  Scenario    Payment_type
244 Drought      Green block    2   scenario B  Two payments (two consecutive sums)
643 Drought      Red block      4   scenario A  Two payments (two consecutive sums)
584 Drought      Red block      4   scenario A  One payment (lump sum)
242 Drought      Red block      2   scenario B  Two payments (two consecutive sums)
1039 Drought     Green block    6   scenario A  Two payments (two consecutive sums)
101 Flood        Red block      1   scenario A  Two payments (two consecutive sums)
  • Can you provide some of your data in text/code format it will be helpful as from screenshot it is tedious to recreate data. – k33da_the_bug Feb 12 '21 at 15:19
  • https://stackoverflow.com/questions/39109045/numpy-where-with-multiple-conditions use of `np.select()` should perform well for this use case – Rob Raymond Feb 12 '21 at 15:33

2 Answers2

1
  • generate data to match your DF
  • use np.select() to return the matched condition or np.nan for transparency
  • then simple to get to bool
s = 200
df = pd.DataFrame({"Hazard_type":np.random.choice(["Drought","Flood"],s),
             "Card_type":np.random.choice(["Red block","Green block"],s),
             "Round":np.random.randint(1,7,s),
             "Scenario":np.random.choice(["scenario A","scenario B"],s),
             "Payment_type":np.random.choice(["One payment (lump sum)","Two payments"],s)})

conditions = [
    # condition 0
((df['Hazard_type'] == 'Drought') & (df['Card_type'] == 'Red block') & (df['Round'] == 1) 
 & (df['Scenario'] == 'scenario A') & (df['Payment_type'] == 'One payment (lump sum)')
),    
    # condition 1
((df['Hazard_type'] == 'Drought') & (df['Card_type'] == 'Green block') & (df['Round'] == 1) & 
 (df['Scenario'] == 'scenario A') & (df['Payment_type'] == 'One payment (lump sum)')
)]

df = df.assign(Correct_case=np.select(conditions, [c for c in range(len(conditions))], np.nan),
          Correct_entry=lambda dfa: ~dfa.Correct_case.isna())

sample output of Correct_entry==True

Hazard_type Card_type Round Scenario Payment_type Correct_case Correct_entry
47 Drought Red block 1 scenario A One payment (lump sum) 0 True
50 Drought Red block 1 scenario A One payment (lump sum) 0 True
72 Drought Red block 1 scenario A One payment (lump sum) 0 True
104 Drought Red block 1 scenario A One payment (lump sum) 0 True
120 Drought Red block 1 scenario A One payment (lump sum) 0 True
170 Drought Red block 1 scenario A One payment (lump sum) 0 True
186 Drought Green block 1 scenario A One payment (lump sum) 1 True
Rob Raymond
  • 8,924
  • 2
  • 6
  • 17
0

I would approach this by creating a series of boolean masks depending on the scenerio

# boolean masks
drought_mask = df['Hazard_type'] == 'Drought'
red_block_mask = df['Card_type'] == 'Red block'
green_block_mask = df['Card_type'] == 'Green block'
round_1_mask = df['Round'] == 1
scenario_a_mask = df['Scenario'] == 'scenario A'
one_payment_mask = df['Payment_type'] == 'One payment (lump sum)'


# scenerio 1
scenario_1_mask = df.loc[(drought_mask & red_block_mask & scenario_a_mask) & (round_1_mask & one_payment_mask & one_payment_mask)]
# scenario 2
scenario_2_mask = df.loc[(drought_mask & green_block_mask & scenario_a_mask) & (round_1_mask & one_payment_mask & one_payment_mask)]

# combining scenerios
df['Correct_entry'] = df.loc[scenario_1_mask | scenario_2_mask]

You can create a mask for every scenario and then combine them in the final correct_entry column using an OR operator (|). This should return a True value for every row which is matches one of the 24 scenarios.

Brendan McDonald
  • 203
  • 2
  • 11