1

This should be a very simple question to answer. I have two lines of code. The first one works. The second gives the following error:

SyntaxError: invalid syntax

Here are the two lines of code. The first line (which works fine) counts the rows where off0_on1 == 1. The second one trys to count the rows where off0_on1 == 0.

a1['on1'] = a1.groupby('del_month')['off0_on1'].transform(sum)
a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(lambda x: 1 if x == 0)

Here is the pandas dataframe:

a1 = pd.DataFrame({'del_month':[1,1,1,1,2,2,2,2], 'off0_on1':[0,0,1,1,0,1,1,1]})

Any suggestions to revise the second line of code above?

Edit: Two of the answers have suggested using a map function, which produces the following output. The "on1" column is correct for my purposes; the "off0" column is not correct. For the first "del_month", the "off0" column should have the same results as the "on1" column. For the second "del_month", the "off0" column should be all ones (i.e. 1, 1, 1, 1).

Here's what happens when I use the following map function (see image below):

a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(lambda series: map(lambda x: 1 if x == 0 else 0, series))

enter image description here

Edit 2 Not sure if this clarifies things, but ultimately I want pandas to do what the following SQL code does so easily:

select
    del_month
    , sum(case when off0_on1 = 1 then 1 else 0 end) as on1
    , sum(case when off0_on1 = 0 then 1 else 0 end) as off0
from a1
group by del_month
order by del_month

Edit 3 This new question contains the answer I need. Thanks everyone!

Sean McCarthy
  • 2,674
  • 5
  • 24
  • 33
  • 1
    `lambda x: 1 if x == 0 else 0`. See [this](https://stackoverflow.com/a/394814/4799172) for more general details – roganjosh Nov 01 '18 at 16:08
  • a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(lambda x: 1 if x == 0 else 0) gives the following error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Any suggestions? Thanks! – Sean McCarthy Nov 01 '18 at 16:16
  • I'm not sure what relevance the `groupby` has for this column? Why can't you just use `np.where()` on the un-grouped DF? – roganjosh Nov 01 '18 at 16:18
  • This isn't my actual dataframe, just a sample to make the question simpler and easier to answer. My actual dataframe has millions of rows, and I need to group it by delivery month. – Sean McCarthy Nov 01 '18 at 16:19
  • I can appreciate that, but the operation doesn't depend on what group a row belongs to. You've already performed the `sum` of the groups, now you're just hunting for `0` values. – roganjosh Nov 01 '18 at 16:27
  • 1
    The reason I'm challenging you on this is that `lambda x: 1 if x == 0 else 0` will run as a _Python_ `for` loop i.e. very slow - it won't be any faster than base python and you have millions of records. `numpy.where()` will be multiple orders of magnitude faster. You should really check to ensure that it's crucial to the output that you use `groupby` and pass a `lambda` for the second operation; I'm not convinced it is. – roganjosh Nov 01 '18 at 16:32
  • Interesting perspective. I definitely would like to vectorize to keep things speedy/efficient, but I do need to group the results by month. I need to count the off-peak hours each month, and count the on-peak hours. I guess I could also count the total and subtract the on-peak hours, but in this case I was just rewriting a colleague's code so I was trying to follow his lead. Good suggestion though. Could you show me how you would use np.where() on each monthly group? Thanks again. – Sean McCarthy Nov 01 '18 at 17:07
  • 1
    If the picture isn't what you wanted then I'm not sure. But look how the output that you say is incorrect is completely independent of the groups; I understood the question the same as the other answerers and could be achieved without any grouping at all. – roganjosh Nov 01 '18 at 17:44
  • Yes, the map() function suggested below is the same as simply a1['off0'] = numpy.where(a1.off0_on1 == 0, 1, 0), but much more complicated. I need to group the results and sum/count them by month; otherwise your numpy.where() suggestion is the simplest/fastest. – Sean McCarthy Nov 01 '18 at 18:03
  • I have asked a more general question here, if anyone wants to try it: https://stackoverflow.com/questions/53107112/use-pandas-to-group-by-column-and-then-create-a-new-column-based-on-a-condition – Sean McCarthy Nov 01 '18 at 18:20

3 Answers3

2

When you define the if statement (shorthand way) you need to define the else as well.

lambda x: 1 if x == 0 else 0 # For example.

EDIT:

In your first lambda it's a Series Actually, so you need to loop it (with map for example)

a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(lambda x: map(lambda value: 1 if value == 0 else 0, x))
omri_saadon
  • 8,823
  • 5
  • 25
  • 53
  • I tried this: a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(lambda x: 1 if x == 0 else 0) but got the following error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Any suggestions? Thanks! – Sean McCarthy Nov 01 '18 at 16:24
  • 1
    @Sean_Calgary, Added an Edit section for your issue. – omri_saadon Nov 01 '18 at 16:34
  • I added a picture to my question to show that the map() suggestion produces results that are usable, but not exactly what I need. Thanks again. – Sean McCarthy Nov 01 '18 at 17:22
  • Maybe I should accept one of these answers (because they solved my initial lambda syntax issue) and ask a second question about how to group and then sum if, as with the SQL "sum(case when...)", which is what I really want to do. – Sean McCarthy Nov 01 '18 at 17:34
  • The map() function is the same as simply a1['off0'] = numpy.where(a1.off0_on1 == 0, 1, 0), but much more complicated. I need to group the results and sum/count them by month; otherwise I'd just use numpy.where(). – Sean McCarthy Nov 01 '18 at 18:02
  • @Sean_Calgary, `np.where` will work better in this case as roganjosh wrote in the comments. The answer claim to overcome your error and to understand it. There are a lot of ways to improve the logic of our code, always. Good luck ! :) – omri_saadon Nov 01 '18 at 18:09
  • I have asked a more general question here, if anyone wants to try it: https://stackoverflow.com/questions/53107112/use-pandas-to-group-by-column-and-then-create-a-new-column-based-on-a-condition – Sean McCarthy Nov 01 '18 at 18:20
1

Your code: lambda x: 1 if x == 0 is not a valid expression. Try something like lambda x: 1 if x == 0 else 0

See below:

>>> example = lambda x: 1 if x == 0 else 0
>>> example(0)
1
>>> example(1)
0
  • I tried this: a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(lambda x: 1 if x == 0 else 0) but got the following error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Any suggestions? Thanks! – Sean McCarthy Nov 01 '18 at 16:25
  • 1
    I believe you actually want to map the series. Your lambda right now is comparing a row to 0 when you want to compare elements of the row to 0. I would try `a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(lambda series: map(lambda elem: 1 if x == 0 else 0, series))`. – Kevin Koehler Nov 01 '18 at 16:31
  • I've edited my question and added a picture. The map function is not what I want. I can use it, ultimately, but I'd like my "off0" column to be like my "on1" column. – Sean McCarthy Nov 01 '18 at 17:16
  • Maybe I should accept one of these answers (because they solved my initial lambda syntax issue) and ask a second question about how to group and then sum if, as with the SQL "sum(case when...)", which is what I really want to do. – Sean McCarthy Nov 01 '18 at 17:34
  • The map() function is the same as simply a1['off0'] = numpy.where(a1.off0_on1 == 0, 1, 0), but much more complicated. I need to group the results and sum/count them by month; otherwise I'd just use numpy.where(). – Sean McCarthy Nov 01 '18 at 18:02
  • I have asked a more general question here, if anyone wants to try it: https://stackoverflow.com/questions/53107112/use-pandas-to-group-by-column-and-then-create-a-new-column-based-on-a-condition – Sean McCarthy Nov 01 '18 at 18:20
0

The previous answers quickly fixed my lambda function error by adding "else 0" at the end. My ultimate question was answered here with the following line of code:

a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(lambda x: sum(x==0)) 
Sean McCarthy
  • 2,674
  • 5
  • 24
  • 33