Conditions based on multiple column values

Question

I have a dataframe that looks like this.

I want to add a column 'Correct_entry' that returns 'True' or 'False' based on the combination of values in the current columns. In total I have 24 possible combinations that are correct. I give two examples:

1.

If df['Hazard_type'] == 'Drought' & df['Card_type'] == 'Red block' & If['Round'] == 1 & df['Scenario'] == 'scenario A' & df['Payment_type'] == 'One payment (lump sum)'

If df['Hazard_type'] == 'Drought' & df['Card_type'] == 'Green block' & If['Round'] == 1 & df['Scenario'] == 'scenario A' & df['Payment_type'] == 'One payment (lump sum)'

I have 24 different combinations of values that are all correct and should equal 'True'. All other combinations should equal to 'False'.

What would be the best way to go through this set of data? How can I combine all these different statements?

I hope it is clear.

edit: As requested, the data in text format.

    Hazard_type  Card_type   Round  Scenario    Payment_type
244 Drought      Green block    2   scenario B  Two payments (two consecutive sums)
643 Drought      Red block      4   scenario A  Two payments (two consecutive sums)
584 Drought      Red block      4   scenario A  One payment (lump sum)
242 Drought      Red block      2   scenario B  Two payments (two consecutive sums)
1039 Drought     Green block    6   scenario A  Two payments (two consecutive sums)
101 Flood        Red block      1   scenario A  Two payments (two consecutive sums)

Can you provide some of your data in text/code format it will be helpful as from screenshot it is tedious to recreate data. — k33da_the_bug, Feb 12 '21 at 15:19
https://stackoverflow.com/questions/39109045/numpy-where-with-multiple-conditions use of `np.select()` should perform well for this use case — Rob Raymond, Feb 12 '21 at 15:33

Rob Raymond · Accepted Answer · 2021-02-15T09:52:48.850

generate data to match your DF
use np.select() to return the matched condition or np.nan for transparency
then simple to get to bool

s = 200
df = pd.DataFrame({"Hazard_type":np.random.choice(["Drought","Flood"],s),
             "Card_type":np.random.choice(["Red block","Green block"],s),
             "Round":np.random.randint(1,7,s),
             "Scenario":np.random.choice(["scenario A","scenario B"],s),
             "Payment_type":np.random.choice(["One payment (lump sum)","Two payments"],s)})

conditions = [
    # condition 0
((df['Hazard_type'] == 'Drought') & (df['Card_type'] == 'Red block') & (df['Round'] == 1) 
 & (df['Scenario'] == 'scenario A') & (df['Payment_type'] == 'One payment (lump sum)')
),    
    # condition 1
((df['Hazard_type'] == 'Drought') & (df['Card_type'] == 'Green block') & (df['Round'] == 1) & 
 (df['Scenario'] == 'scenario A') & (df['Payment_type'] == 'One payment (lump sum)')
)]

df = df.assign(Correct_case=np.select(conditions, [c for c in range(len(conditions))], np.nan),
          Correct_entry=lambda dfa: ~dfa.Correct_case.isna())

sample output of Correct_entry==True

	Hazard_type	Card_type	Round	Scenario	Payment_type	Correct_case	Correct_entry
47	Drought	Red block	1	scenario A	One payment (lump sum)	0	True
50	Drought	Red block	1	scenario A	One payment (lump sum)	0	True
72	Drought	Red block	1	scenario A	One payment (lump sum)	0	True
104	Drought	Red block	1	scenario A	One payment (lump sum)	0	True
120	Drought	Red block	1	scenario A	One payment (lump sum)	0	True
170	Drought	Red block	1	scenario A	One payment (lump sum)	0	True
186	Drought	Green block	1	scenario A	One payment (lump sum)	1	True

I get an error message saying "list of cases must be same length as list of conditions". Any idea why that is? — Joep Hoeijmakers, Feb 15 '21 at 09:35
I've updated to dynamically build second list argument to `select()`. this list needs to be same length as **condition** list. I assume you have added more of your conditions — Rob Raymond, Feb 15 '21 at 09:53

score 0 · Answer 2 · answered Feb 12 '21 at 15:47

I would approach this by creating a series of boolean masks depending on the scenerio

# boolean masks
drought_mask = df['Hazard_type'] == 'Drought'
red_block_mask = df['Card_type'] == 'Red block'
green_block_mask = df['Card_type'] == 'Green block'
round_1_mask = df['Round'] == 1
scenario_a_mask = df['Scenario'] == 'scenario A'
one_payment_mask = df['Payment_type'] == 'One payment (lump sum)'


# scenerio 1
scenario_1_mask = df.loc[(drought_mask & red_block_mask & scenario_a_mask) & (round_1_mask & one_payment_mask & one_payment_mask)]
# scenario 2
scenario_2_mask = df.loc[(drought_mask & green_block_mask & scenario_a_mask) & (round_1_mask & one_payment_mask & one_payment_mask)]

# combining scenerios
df['Correct_entry'] = df.loc[scenario_1_mask | scenario_2_mask]

You can create a mask for every scenario and then combine them in the final correct_entry column using an OR operator (|). This should return a True value for every row which is matches one of the 24 scenarios.

I get this error when I do it like this "unsupported operand type(s) for |: 'float' and 'bool'" — Joep Hoeijmakers, Feb 15 '21 at 09:50

Conditions based on multiple column values

2 Answers2

sample output of Correct_entry==True