pandas initialize dataframe column cells as empty lists

Question

I need to initialize the cells in a column of a DataFrame to lists.

df['some_col'] = [[] for _ in range(no_of_rows)]

Is there a better way to do this in terms of time efficiency?

You have accepted an answer that offers a solution 3x slower than your starting point. — Stefan, May 24 '16 at 14:08
@Stefan it seems that you are correct, as `apply(list)` is indeed slightly slower than my old code. — daiyue, May 24 '16 at 14:15
So as you can see below you can get a tiny bit faster using `itertools`, but I think you're actually quite good already because I don't see a faster way to add the column than the standard method, but perhaps someone comes up with some magic.. — Stefan, May 24 '16 at 14:18

score 5 · Accepted Answer · answered May 24 '16 at 14:02

Since you are looking for time efficiency, here are some benchmarks. A list comprehension is already quite fast at creating the list of empty lists, but you can squeeze out a marginal improvement using itertools.repeat. For the column assignment, apply is about 3x slower because it first fills the column with empty strings and then loops over it, calling list on every element:

import numpy as np
import pandas as pd
from itertools import repeat
df = pd.DataFrame({"A": np.arange(100000)})

%timeit df['some_col'] = [[] for _ in range(len(df))]
100 loops, best of 3: 8.75 ms per loop

%timeit df['some_col'] = [[] for i in repeat(None, len(df))]
100 loops, best of 3: 8.02 ms per loop

%%timeit 
df['some_col'] = ''
df['some_col'] = df['some_col'].apply(list)
10 loops, best of 3: 25 ms per loop
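The `%timeit` lines above only run inside IPython or Jupyter. As a rough sketch, the same comparison can be reproduced in a plain script with the standard `timeit` module. The function names and the `benchmark` helper below are made up for illustration, and the absolute timings will differ by machine and pandas version:

```python
import timeit
from itertools import repeat

import numpy as np
import pandas as pd


def fill_comprehension(df):
    # Plain list comprehension: one new empty list per row.
    df["some_col"] = [[] for _ in range(len(df))]


def fill_repeat(df):
    # Same idea, but iterating over itertools.repeat instead of range.
    df["some_col"] = [[] for _ in repeat(None, len(df))]


def fill_apply(df):
    # Fill with empty strings first, then convert each one with list().
    df["some_col"] = ""
    df["some_col"] = df["some_col"].apply(list)


def benchmark(n_rows=100_000, number=10):
    # Hypothetical helper: returns the average milliseconds per call.
    df = pd.DataFrame({"A": np.arange(n_rows)})
    results = {}
    for func in (fill_comprehension, fill_repeat, fill_apply):
        total = timeit.timeit(lambda: func(df), number=number)
        results[func.__name__] = total / number * 1000
    return results


if __name__ == "__main__":
    for name, ms in benchmark().items():
        print(f"{name}: {ms:.2f} ms per loop")
```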

score 4 · Answer 2 · jezrael · 2016-05-24T13:52:00.933

Try apply:

df1['some_col'] = ''
df1['some_col'] = df1['some_col'].apply(list)

Sample:

import pandas as pd

df1 = pd.DataFrame({'a': pd.Series([1,2])})
print(df1)
   a
0  1
1  2

df1['some_col'] = ''
df1['some_col'] = df1['some_col'].apply(list)
print(df1)
   a some_col
0  1       []
1  2       []
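One thing worth checking with any of these approaches is that every cell gets its own list object. A minimal sketch, assuming the usual behavior that an object column stores references to the Python lists it is given; the column name `shared_col` is made up for the example. It contrasts a list comprehension with the tempting shortcut `[[]] * n`, which repeats a single list:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})

# One new list object per row: mutating one cell leaves the others alone.
df1["some_col"] = [[] for _ in range(len(df1))]
df1["some_col"].iloc[0].append("x")
independent = df1["some_col"].tolist()

# Pitfall: list multiplication repeats the SAME list object in every row,
# so appending through one cell shows up in all of them.
df1["shared_col"] = [[]] * len(df1)
df1["shared_col"].iloc[0].append("x")
shared = df1["shared_col"].tolist()

print(independent)  # [['x'], []]
print(shared)       # [['x'], ['x']]
```

This is why the comprehension (or `apply(list)`) is used here rather than list multiplication, even though multiplication would be faster to build.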

edited May 24 '16 at 13:52

answered May 24 '16 at 13:40

jezrael


How is this better in terms of time efficiency? – Stefan May 24 '16 at 14:11
Hmmm, I think it is not better in terms of time efficiency. But it is up to the OP which answer to mark as accepted. Maybe he preferred mine because I was first, or maybe because he liked it. But maybe in a few seconds he will change his opinion. I don't know. – jezrael May 24 '16 at 14:14
Also note that `lambda x: []` will be faster than `list`. – hilberts_drinking_problem May 24 '16 at 14:17
Just asking because the question was about time efficiency, so it's a good thing if the answer tries to do so as well. – Stefan May 24 '16 at 14:17
@Stefan And now maybe your solution will be accepted. – jezrael May 24 '16 at 14:18
@Stefan what about `xrange`? `%timeit df['some_col'] = [[] for _ in xrange(len(df))]`. Unfortunately I cannot test it, because it only works in Python 2. – jezrael May 24 '16 at 14:42
`xrange` has been renamed to `range` in python3 - http://stackoverflow.com/questions/15014310/why-is-there-no-xrange-function-in-python3 and GvR: https://docs.python.org/3/whatsnew/3.0.html#views-and-iterators-instead-of-lists – Stefan May 24 '16 at 19:52
