I am running PySpark on Spark 2.0 to aggregate data. Below is the raw DataFrame (`df`) as received in Spark:
DeviceID  TimeStamp         IL1  IL2  IL3   VL1  VL2  VL3
1001      2019-07-14 00:45  2.1  3.1  2.25  235  258  122
1002      2019-07-14 01:15  3.2  2.4  4.25  240  250  192
1003      2019-07-14 01:30  3.2  2.0  3.85  245  215  192
1003      2019-07-14 01:30  3.9  2.8  4.25  240  250  192
Now I want to apply a groupby by DeviceID and aggregate the other columns. There are several posts about this on StackOverflow; in particular, this and this link are of interest. With the help of those posts I created the following script:
from pyspark.sql import functions as F
groupby = ["DeviceID"]
agg_cv = ["IL1","IL2","IL3","VL1","VL2","VL3"]
func = [min, max]
expr_cv = [f(F.col(c)) for f in func for c in agg_cv]
df_final = df.groupby(*groupby).agg(*expr_cv)
The above code fails with the error
Column is not iterable
I am not able to understand why this error occurs. However, when I use the following code instead:
from pyspark.sql.functions import min, max, col
expr_cv = [f(col(c)) for f in func for c in agg_cv]
then the code runs fine.
My question is: how can I fix the above-mentioned error?
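For intuition on where the message comes from, here is a minimal sketch that needs no Spark at all. It uses a hypothetical `FakeColumn` stand-in for `pyspark.sql.Column`: the real class raises `TypeError("Column is not iterable")` when you try to iterate it, and the builtin `min()`/`max()` called with a single argument do exactly that, whereas `pyspark.sql.functions.min`/`max` accept a `Column` directly.

```python
# Minimal sketch, no Spark required: a hypothetical stand-in for
# pyspark.sql.Column, which (like the real class) refuses iteration.
class FakeColumn:
    def __iter__(self):
        # The real pyspark Column raises this same TypeError.
        raise TypeError("Column is not iterable")

# The builtin min() called with a single argument tries to iterate it,
# which is what triggers the error in the original script, where
# func = [min, max] picked up the Python builtins.
try:
    min(FakeColumn())
except TypeError as err:
    print(err)  # Column is not iterable
```

This is why importing `min` and `max` from `pyspark.sql.functions` makes the list comprehension work: those names then refer to Spark aggregate functions rather than the Python builtins.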