I'm looking to groupBy and agg on the Spark DataFrame below to get the mean, max, and min of each of the col1, col2, and col3 columns:
sp = spark.createDataFrame([['a',2,4,5], ['a',4,7,7], ['b',6,0,9], ['b', 2, 4, 4], ['c', 4, 4, 9]], ['id', 'col1', 'col2','col3'])
+---+----+----+----+
| id|col1|col2|col3|
+---+----+----+----+
| a| 2| 4| 5|
| a| 4| 7| 7|
| b| 6| 0| 9|
| b| 2| 4| 4|
| c| 4| 4| 9|
+---+----+----+----+
I've tried sp.groupBy('id').agg({'*':'max'}) just to get the max of every column, but it raises an error.
I've also tried sp.groupBy('id').agg({'col1':['max', 'min', 'mean']}), which is more the traditional Pandas way of doing it, but that doesn't work either.
The output I'm expecting:

id  max(col1)  max(col2)  max(col3)  min(col1)  min(col2)  min(col3)  mean(col1)  ..
a   4          7          7          2          4          5          3
b   6          4          9          2          0          4          4
c   4          4          9          4          4          9          4