2

I'm an R programmer trying to get into Python. In R, when I want to mutate a column conditionally, I use:

col = dplyr::mutate(col, ifelse(condition, if_true(x), if_false(x)))

In Python, how does one mutate a column value conditionally? Here's my minimal reproducible example:

def act(cntnt):
    def do_thing(cntnt):
        return(cntnt + "has it")
    def do_other_thing(cntnt):
        return(cntnt + "nope")
    has_abc = cntnt.str.contains.contains("abc")
    if has_abc == T:
        cntnt[has_abc].apply(do_thing)
    else:
        cntnt[has_abc].apply(do_other_thing)
roganjosh
Christopher Costello
  • Please can you add a small example of your problem and the expected outcome? Also, I assume that `if has_abc == T:` is actually `if has_abc == True:` – roganjosh May 24 '18 at 22:04

2 Answers

8

I think what you're looking for is assign, which is essentially the pandas equivalent to mutate in dplyr. Your conditional statement can be written with a list comprehension, or using vectorized methods (see below).

Take an example dataframe, let's call it df:

> df
             a
1   0.50212013
2   1.01959213
3  -1.32490344
4  -0.82133375
5   0.23010548
6  -0.64410737
7  -0.46565442
8  -0.08943858
9   0.11489957
10 -0.21628132

R / dplyr:

In R, you can use mutate with ifelse to make a column based on a condition (in this example, it will be 'pos' when column a is greater than 0, and 'neg' otherwise):

df = dplyr::mutate(df, col = ifelse(df$a > 0, 'pos', 'neg'))

And the resulting df:

> df
             a col
1   0.50212013 pos
2   1.01959213 pos
3  -1.32490344 neg
4  -0.82133375 neg
5   0.23010548 pos
6  -0.64410737 neg
7  -0.46565442 neg
8  -0.08943858 neg
9   0.11489957 pos
10 -0.21628132 neg

Python / pandas:

In pandas, use assign with a list comprehension:

df = df.assign(col = ['pos' if a > 0 else 'neg' for a in df['a']])

The resulting df:

>>> df
          a  col
0  0.502120  pos
1  1.019592  pos
2 -1.324903  neg
3 -0.821334  neg
4  0.230105  pos
5 -0.644107  neg
6 -0.465654  neg
7 -0.089439  neg
8  0.114900  pos
9 -0.216281  neg

The ifelse you were using in R is replaced by a list comprehension.
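
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example dataframe from the values printed above and applies the same assign call. The variable name result is just for illustration; assign returns a new dataframe and leaves df untouched.

```python
import pandas as pd

# Values copied from the example dataframe shown above
df = pd.DataFrame({"a": [0.50212013, 1.01959213, -1.32490344, -0.82133375,
                         0.23010548, -0.64410737, -0.46565442, -0.08943858,
                         0.11489957, -0.21628132]})

# assign returns a new DataFrame with the extra column; df itself is unchanged
result = df.assign(col=["pos" if a > 0 else "neg" for a in df["a"]])
print(result)
```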

Variations on this:

You don't have to use assign: you can create a new column directly on the df without creating a copy if you want:

df['col'] = ['pos' if a > 0 else 'neg' for a in df['a']]

Also, instead of a list comprehension, you could use one of numpy's vectorized methods for conditional statements, for example, np.select:

import numpy as np
df['col'] = np.select([df['a'] > 0], ['pos'], 'neg')
# or
df = df.assign(col = np.select([df['a'] > 0], ['pos'], 'neg'))
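
For a single condition like this one, np.where is probably the closest match to R's ifelse, since it takes the condition, the value if true, and the value if false in the same order. A small sketch, using made-up values for column a and the illustrative names col2 and df2:

```python
import numpy as np
import pandas as pd

# Made-up values, for illustration only
df = pd.DataFrame({"a": [0.5, -1.3, 0.2, -0.6]})

# np.where(condition, value_if_true, value_if_false)
df["col"] = np.where(df["a"] > 0, "pos", "neg")

# Same idea inside assign; passing a callable lets it refer to the
# dataframe being assigned to, which is handy in a method chain
df2 = df.assign(col2=lambda d: np.where(d["a"] > 0, "pos", "neg"))
print(df2)
```

np.select is the better fit once there are several conditions, each with its own result.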
sacuL
1

You can use the condition (and its negation) for logical indexing:

has_abc = cntnt.str.contains("abc")
cntnt[ has_abc].apply(do_thing)
cntnt[~has_abc].apply(do_other_thing)
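
Note that apply returns a new Series rather than changing cntnt in place, so the results have to be stored somewhere. A runnable sketch of one way to do that, assuming cntnt is a pandas Series of strings (the question gave no data, so the sample values are hypothetical, and do_thing / do_other_thing are simplified top-level versions of the functions in the question):

```python
import pandas as pd

# Hypothetical sample data; the question did not include any
cntnt = pd.Series(["abc one", "xyz two", "has abc", "nothing"])

def do_thing(s):
    return s + " has it"

def do_other_thing(s):
    return s + " nope"

# Boolean mask: True where the string contains "abc"
has_abc = cntnt.str.contains("abc")

# Work on a copy and write each branch back through the mask
result = cntnt.copy()
result.loc[has_abc] = cntnt[has_abc].apply(do_thing)
result.loc[~has_abc] = cntnt[~has_abc].apply(do_other_thing)
print(result)
```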
DYZ
  • I can't help but think this has jumped the gun, there's no data at all given – roganjosh May 24 '18 at 21:48
  • @roganjosh Eh? Not sure what you mean. – DYZ May 24 '18 at 21:49
  • Your answer changed quite substantially from the very first one you posted, now it's more likely to be correct. But there's still `if has_abc == T:` in the question so it seems the question isn't complete, and you could potentially address the problem with `np.where` if example data was given, rather than a non-vectorised approach in two stages – roganjosh May 24 '18 at 21:52
  • @roganjosh I see your point. If the OP rephrases their question, then the answer may be optimized. But _first make it right, then make it fast_. – DYZ May 24 '18 at 21:55
  • Thank you! Seeing how to do the inverse of the series of booleans helps me a lot. – Christopher Costello May 25 '18 at 01:45
  • @DyZ why are there two contains in the first line? Is it a typo? – Bharath May 25 '18 at 04:43
  • @Dark I blindly copied it from the OP. A typo. – DYZ May 25 '18 at 05:35