# import modules, set seed
import random
import numpy as np
import pandas as pd
random.seed(42)
The problem
I have a dataframe df. Its rows contain values that are passed to a function, which produces a variable number of outputs. The maximum number of outputs is not known a priori. The outputs should be placed in the same row as the input value, creating new columns if necessary. Unfilled cells should be filled with NaNs.
Reproducible setup
Let's create a dataframe:
df = pd.DataFrame(pd.Series([random.randint(1,10) for _ in range(5)]),columns=['randomnums'])
This looks like:
What I have done
Created a dataframe (auxiliarydf) holding the values that should fill the to-be-created columns of the original df, using from_dict(), apply(), a lambda function, and dict and list comprehensions:
auxiliarydf = pd.DataFrame.from_dict(
    {index: pd.Series(array) for index, array in zip(
        df.index,
        df['randomnums'].apply(
            lambda r:
                # here I apply some function on the row.
                # The output will be a list of variable length
                # for the sake of an example:
                np.array([x for x in range(r)])))},
    orient='index')
auxiliarydf will be:
Then concat() df with auxiliarydf:
pd.concat([df, auxiliarydf], axis=1)
Result:
Which is as expected.
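For comparison, here is a shorter variant that appears to give the same shape of result. It is only a sketch under some assumptions: the input values below are made up for illustration (they are not taken from the seeded run above), variable_outputs is a hypothetical stand-in for the real function, and it assumes the function can return plain Python lists. It relies on the DataFrame constructor padding ragged lists with NaN.

```python
import pandas as pd

# Illustrative input; these values are made up for this sketch.
df = pd.DataFrame({'randomnums': [2, 1, 5, 4, 4]})

def variable_outputs(r):
    # Hypothetical stand-in for the real function:
    # returns a plain list whose length depends on the input.
    return list(range(r))

# Build one row per input from the lists of unequal length;
# the DataFrame constructor pads the shorter rows with NaN.
expanded = pd.DataFrame(
    df['randomnums'].apply(variable_outputs).tolist(),
    index=df.index)

# Attach the new columns (labelled 0, 1, 2, ...) to the original rows.
result = df.join(expanded)
print(result)
```

This still builds an intermediate dataframe and joins it, so it is arguably the same idea with less ceremony rather than a dedicated built-in.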
The question
Is there an easier way, maybe a built-in pandas function, to do the process above? My approach works, but this seems like a problem that comes up often enough to expect a neater solution.
Colab notebook available here with the code above.