I would like to assign a single tuple to a boolean-indexed slice of my dataframe, like this:

>>> import pandas as pd
>>> mydataframe = pd.DataFrame([1,2,3,4,5],columns=['colname'])
>>> mydataframe.loc[mydataframe['colname']>2,'colname'] = (1,2)

Desired output:

>>> mydataframe
     colname
0          1
1          2
2     (1, 2)
3     (1, 2)
4     (1, 2)

However, instead of assigning the tuple to each element, pandas tries to assign each element of the tuple to an element in the slice, and errors out because the shapes don't match.

Actual output:

ValueError: shape mismatch: value array of shape (2,) could not be broadcast 
to indexing result of shape (3,)

I've tried using the set_value function and get the same behavior:

>>> mydataframe.set_value(mydataframe['colname']>2,'colname', (1,2))
ValueError: shape mismatch: value array of shape (2,) could not be broadcast
to indexing result of shape (3,)

The approach in this question works for assigning to a single element in the dataframe, but not to a slice: Add a tuple to a specific cell of a pandas dataframe

Is there a way to do this assignment without resorting to looping over the elements in the slice?

Edit: I also tried the following as per EdChum's answer and it still isn't behaving as expected:

>>> import numpy as np
>>> mydataframe = pd.DataFrame([1,2,3,4,5],columns=['colname'])
>>> assignment_series = pd.Series([(1,2,3)]*np.sum(mydataframe['colname']>2))
>>> assignment_series
0    (1, 2, 3)
1    (1, 2, 3)
2    (1, 2, 3)
dtype: object
>>> mydataframe.loc[mydataframe['colname']>2,'colname'] = assignment_series
>>> mydataframe
     colname
0          1
1          2
2  (1, 2, 3)
3        NaN
4        NaN

Edit2: Sorry, I misunderstood EdChum's answer. The previous edit is not what he was suggesting: assignment_series should be the same length as mydataframe, so that its index lines up with the dataframe's, not the length of mydataframe.loc[mydataframe['colname']>2,'colname'] as I did above. See EdChum's answer below.
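
To make the NaN rows from the first edit easier to see, here is a minimal sketch of the label alignment involved. It assumes that `.loc` assignment with a Series on the right-hand side behaves roughly like reindexing that Series to the selected row labels; the names `mask`, `selected_labels` and `aligned` are only illustrative.

```python
import pandas as pd

mydataframe = pd.DataFrame([1, 2, 3, 4, 5], columns=['colname'])
mask = mydataframe['colname'] > 2

# Series from the first edit: it gets the default index 0, 1, 2
assignment_series = pd.Series([(1, 2, 3)] * int(mask.sum()))

# Row labels selected by the mask: 2, 3, 4
selected_labels = mydataframe.index[mask]

# Reindexing to those labels keeps only label 2;
# labels 3 and 4 have no match in assignment_series and become NaN
aligned = assignment_series.reindex(selected_labels)
print(aligned)
```

Only label 2 exists in both indexes, which matches the single tuple followed by NaN values seen above.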

Emma

1 Answer

You'll have to construct a Series with the tuple repeated once per row of your df, so that its index aligns with the df's index:

In [37]:
mydataframe = pd.DataFrame([1,2,3,4,5],columns=['colname'])
mydataframe.loc[mydataframe['colname']>2,'colname']=pd.Series([(1,2,3) for x in range(len(mydataframe))])
mydataframe

Out[37]:
     colname
0          1
1          2
2  (1, 2, 3)
3  (1, 2, 3)
4  (1, 2, 3)

The key point here is that you want to assign a tuple as a single element for each row, so the right-hand side needs to match the desired shape: here, a 5-row Series whose index matches the left-hand side. We use a list comprehension to repeat the tuple once per row:

[(1,2,3) for x in range(len(mydataframe))]

And pass this as the data arg for the Series to produce:

In [39]:
pd.Series([(1,2,3) for x in range(len(mydataframe))])

Out[39]:
0    (1, 2, 3)
1    (1, 2, 3)
2    (1, 2, 3)
3    (1, 2, 3)
4    (1, 2, 3)
dtype: object

Because you're masking on the left-hand side, only the rows where the condition is met are taken from the Series and assigned.
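
Since the assignment aligns on index labels, a shorter Series should also work as long as its index carries the labels of the masked rows. The following is a sketch under that assumption, not part of the original answer; `mask` and `values` are illustrative names, and the cast to object is an assumption added so that mixing ints and tuples does not rely on implicit upcasting in newer pandas versions.

```python
import pandas as pd

mydataframe = pd.DataFrame([1, 2, 3, 4, 5], columns=['colname'])
mask = mydataframe['colname'] > 2

# Assumption: make the column object dtype up front so it can hold tuples
mydataframe['colname'] = mydataframe['colname'].astype(object)

# One tuple per masked row, indexed by the labels of those rows (2, 3, 4)
values = pd.Series([(1, 2, 3)] * int(mask.sum()), index=mydataframe.index[mask])

# Labels on both sides match, so every masked row receives a tuple
mydataframe.loc[mask, 'colname'] = values
print(mydataframe)
```

The full-length Series above is simpler to write; this variant only avoids building values for rows that are never assigned.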

EdChum
  • Thanks @EdChum, but it still isn't doing what I want. Not sure what it's trying to do here, have a look at my edit above. – Emma Jun 23 '16 at 16:58
  • But you're not doing the same thing if you look at your generated series the index goes from 0 to 2 so only row 2 gets assigned, I'm generating a series that matches the non masked series so it aligns correctly – EdChum Jun 23 '16 at 17:01
  • Oooh, sorry I misunderstood what you were saying. I didn't realize the row indices were used in the assignment like that, I thought it would work more like numpy and assign element by element. Of course that makes sense now that I think about it. Thanks for the clarification! – Emma Jun 23 '16 at 18:30