I have a dataframe and I need to agregate it. I do it this way.
df
.groupBy($"key1", $"key2")
.agg(
sum($"total_in").as("total_in")
, sum($"val1_in").as("val1_in")
, sum($"val2_in").as("val2_in")
, sum($"val3_in").as("val3_in")
, sum($"val4_in").as("val4_in")
, sum($"total_out").as("total_out")
, sum($"val1_out").as("val1_out")
, sum($"val2_out").as("val2_out")
, sum($"val3_out").as("val3_out")
, sum($"val4_out").as("val4_out"))
It doesn't look all that bad, but if the number of operation in agg
becomes larger or a case when
or some other logic gets added to every sum
, this code will look really cumbersome.
I know it can be done. Maybe passing the columns to a function and iterating other them. Something like
.agg(
map(x => func())
)
but this is where I'm stuck. How do I pass only the columns I need? Is it possible to alias them in the same place?