
I have the following data structure.

basket_series = df_train["basket"].head(10)
type(basket_series) 

Output: pandas.core.series.Series

0                [3]
1       [5, 3, 0, 3]
2       [3, 3, 1, 4]
3                [2]
4       [4, 4, 4, 4]
5       [4, 3, 4, 4]
6             [3, 4]
7    [4, 4, 1, 4, 4]
8       [1, 5, 2, 2]
9          [5, 5, 0]

I want to know how many numbers each "list" contains, but I think each "list" is actually interpreted as a string. My approaches were:

basket_series.size

output: 10

for x in basket_series: 
  print(len(x)) 

output: 3 12 12 3 12 12 6 15 12 9

This seems to be the same as the output of either of the following:

basket_series.str.len()
for x in basket_series:
    print(len(list(x)))

So is the problem that each element is seen as a string? Do you have any ideas?

Daniel
  • 1
    Is this a column of *lists*, or just a column of *strings* that "look" like lists? – Willem Van Onsem Dec 12 '20 at 17:23
  • Yes, it is represented as a string. The current approach now looks like this: ```for x in basket_series: x = ast.literal_eval(x); print(len(x))``` (see https://stackoverflow.com/questions/1894269/how-to-convert-string-representation-of-list-to-a-list) – Daniel Dec 12 '20 at 17:30
  • 1
    It definitely depends on the dataframe you specified. pd.Series objects can hold lists. Check your types: for x in basket_series: print(type(x)) – Marc Dec 12 '20 at 17:31
  • 1
    @Daniel You can use `basket_series.map(ast.literal_eval).str.len()` – Shubham Sharma Dec 12 '20 at 17:32
  • 1
    If it is a string, you could convert it into a list using json: import json; len(json.loads('[1,2,3,4]')) – Marc Dec 12 '20 at 17:32
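
To see why `len` reported 3 and 12 rather than 1 and 4, it can help to inspect the element types directly. The following is a minimal sketch, assuming the column holds strings like those shown in the question; the three sample values are copied from it, and the variable names other than `basket_series` are only illustrative.

```python
import ast
import json

import pandas as pd

# Sample values copied from the question, stored as strings (assumption).
basket_series = pd.Series(["[3]", "[5, 3, 0, 3]", "[3, 4]"])

# Each element is a str, so len() counts characters,
# including the brackets, commas and spaces.
element_types = basket_series.map(type).unique().tolist()
char_lengths = basket_series.str.len().tolist()

# Parsing each string yields real Python lists, whose length is the item count.
parsed = basket_series.map(ast.literal_eval)
item_counts = parsed.str.len().tolist()

# json.loads is an alternative for plain numeric lists like these.
json_counts = basket_series.map(json.loads).str.len().tolist()

print(element_types, char_lengths, item_counts, json_counts)
```

`ast.literal_eval` accepts any Python literal, while `json.loads` only accepts valid JSON, so the latter would fail on values such as `"['a', 'b']"` with single quotes.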

1 Answer


I guess part of the answer is already covered in the comments, but here is the code for completeness:

import pandas as pd
from ast import literal_eval

l = ['[3]',
     '[5, 3, 0, 3]',
     '[3, 3, 1, 4]',
     '[2]',
     '[4, 4, 4, 4]',
     '[4, 3, 4, 4]',
     '[3, 4]',
     '[4, 4, 1, 4, 4]',
     '[1, 5, 2, 2]',
     '[5, 5, 0]']

s = pd.Series(l)
print(s.map(literal_eval).apply(len))

Output:

0    1
1    4
2    4
3    1
4    4
5    4
6    2
7    5
8    4
9    3
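
If the data comes from a CSV file, the strings could also be parsed while loading, so the column holds real lists from the start. This is only a sketch: the question does not say how `df_train` was created, so reading it with `pd.read_csv` is an assumption, and the in-memory CSV below is a hypothetical stand-in for the real file.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV content standing in for the real training file.
csv_text = 'basket\n"[3]"\n"[5, 3, 0, 3]"\n"[3, 4]"\n'

# converters applies literal_eval to every cell of the "basket" column
# while the file is being read.
df_train = pd.read_csv(io.StringIO(csv_text), converters={"basket": literal_eval})

# The cells are now lists, so .str.len() returns the number of items.
basket_lengths = df_train["basket"].str.len()
print(basket_lengths.tolist())
```

With this approach no separate conversion step is needed afterwards, at the cost of a slower read for large files.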
sai