
I tried to modify a dataframe inside a function by looping through its rows and then returning the modified dataframe. In the code below, I pass a dataframe 'ding' to the function 'test', create a new column 'C' by iterating through every row, and return the modified dataframe. I expected the test_ding df to have 3 columns, but it has only two. Any help is highly appreciated.

P.S. There are probably easier ways to accomplish this small task, but I specifically want to iterate over the rows and have the modifications made to the dataframe inside the function reflected outside of it.

import pandas as pd

s1 = pd.Series([1,3,5,6,8,10,1,1,1,1,1,1])
s2 = pd.Series([4,5,6,8,10,1,7,1,6,5,4,3])

ding=pd.DataFrame({'A':s1,'B':s2})

def test(ding):
    for index,row in ding.iterrows():
        row['C']=row.A+row.B
    return ding

test_ding=test(ding)

1 Answer


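You can set the value on the original data frame instead of on `row`. `.at` is fast if you want to set values cell by cell (it replaces `set_value`, which was deprecated in pandas 0.21 and removed in 1.0):

def test(ding):
    for index, row in ding.iterrows():
        # write through the DataFrame itself; writing to `row` does not reliably reach it
        ding.at[index, 'C'] = row.A + row.B
    return ding

test_ding=test(ding)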

test_ding
#   A   B   C
#0  1   4   5.0
#1  3   5   8.0
#2  5   6   11.0
# ...
Psidom
  • I'd also recommend `itertuples` to preserve varying dtypes. I can't say for sure right now, but I think it's quicker too. – piRSquared May 11 '17 at 22:53
  • Is there any specific reason why we cannot assign in the way I posted? `row['C']=row.A+row.B` – Praveen Gupta Sanka May 11 '17 at 23:04
  • Because `iterrows` returns a copy, not a view, so assigning a value to the row has no effect on the original data frame. From the docs: *You should **never modify** something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.* – Psidom May 11 '17 at 23:35