1

I have a dataframe column with numerical values separated by commas(,). I am trying to find the sum of the values in each cell of that column and create a new column.

Input: df:

       id Scores
       1  10,12
       2  11
       3  1,2,3

Output required:

      id Scores Sum
      1  10,12   22
      2  11      11 
      3  1,2,3   6

Code: I have tried writing a custom function, so that I can make use of apply function in python

def sum_of_number(listx):
    a=[]
    n=0
    if len(listx)==1:
        a=listx
        n=a
    else:
        a=str(listx).split(',')
        for i in range(0,len(a)):
            n=n+int(a[i])
    return n

df['Sum']=df.Scores.apply(sum_of_number)

But I am not able to get the intended result.


Edit: Thanks to DeepSpace for providing the solution

df['Sum'] = df['Scores'].apply(lambda x: sum(map(float, x.split(','))))

Can the same be applied to get median

      id Scores Median
      1  10,12   11
      2  11      11 
      3  1,2,3   2
Sam
  • 276
  • 3
  • 12

1 Answers1

2

You can use a much simpler code by combining sum, map and int (or float, based on your needs):

import pandas as pd

df = pd.DataFrame({'Scores': ['10,12', '11', '1,2,3']})
df['Sum'] = df['Scores'].apply(lambda x: sum(map(int, x.split(','))))
# or df['Sum'] = df['Scores'].apply(lambda x: sum(map(float, x.split(','))))
print(df)

#    Scores  Sum
#  0  10,12   22
#  1  11      11
#  2  1,2,3    6

The above assumes the values are strings. If the values are actual list of integers it is even easier:

import pandas as pd

df = pd.DataFrame({'Scores': [[10, 12], [11], [1, 2, 3]]})
df['Sum'] = df['Scores'].apply(sum)
print(df)

#        Scores  Sum
#  0   [10, 12]   22
#  1       [11]   11
#  2  [1, 2, 3]    6
DeepSpace
  • 65,330
  • 8
  • 79
  • 117
  • Thanks the code is working for the above case. But when my scores are in decimals it is throwing error (say 0.9,0.8 for id 1; 0.2 for id 2 and 0.3,0.2,0.5 for id 3) Error: ValueError: invalid literal for int() with base 10: '0.9' – Sam Nov 15 '17 at 15:42
  • @Sam so use `float` instead of `int` – DeepSpace Nov 15 '17 at 15:43
  • Thank you it helped :)...Suppose I want to calculate mean, median. Can you suggest how it can be done, since sum is a global function – Sam Nov 15 '17 at 15:46
  • @Sam I'm not sure what you mean. As you see you can provide any custom method to `apply`. How you implement it is up to you. You just need to make sure you understand your input. – DeepSpace Nov 15 '17 at 15:47
  • I have updated the question with the intended output. – Sam Nov 15 '17 at 15:58
  • @Sam https://stackoverflow.com/questions/24101524/finding-median-of-list-in-python – DeepSpace Nov 15 '17 at 16:08
  • Hi @DeepSpace. I have read a data frame from excel, but the above code is showing `AttributeError: 'int' object has no attribute 'split` error. If I remove the single-digit values e.g 11 @index 1 in your example then the code works fine. Can you please suggest what i am doing wrong? – Prateek Sharma Mar 06 '21 at 18:53