
My question is quite simple, but I can't seem to find a proper solution. I can hack it together with ugly code, but I'd like to find something elegant.

Here is my line of code:

    val summedDF = dataFrame.groupBy(colsNamesGroupBy.head, colsNamesGroupBy.tail : _*).sum(colsNamesSum:_*)

It does a groupBy on an array of column names, and then sums a few columns.

Everything works fine, but the resulting columns are named sum(xxxx). I would like to rename them on the go, maybe with a map operation, so that I only keep the "xxxx" name.

Does anyone have an idea?

EDIT :

I'm trying something like this, but I get "cannot resolve symbol agg with this signature":

    val summedDF = dataFrame.groupBy(colsNamesGroupBy.head, colsNamesGroupBy.tail : _*).agg(colsNamesSum.map(c => sum(c).as(c)))

2 Answers


I would try something like this:

    import org.apache.spark.sql.functions.{sum, col}

    val aggregateExpr = colsNamesSum.map(c => sum(col(c)).as(c))

    val summedDF = dataFrame.groupBy(colsNamesGroupBy.head, colsNamesGroupBy.tail: _*).agg(aggregateExpr.head, aggregateExpr.tail: _*)
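For reference, here is a minimal self-contained sketch of how this plays out end to end. The sample data and the contents of `colsNamesGroupBy`/`colsNamesSum` are assumptions made up for illustration; the aliasing trick itself (`sum(col(c)).as(c)`) is what keeps the original column names:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{sum, col}

object SumRenameExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sum-rename").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data
    val dataFrame = Seq(
      ("a", 1, 10),
      ("a", 2, 20),
      ("b", 3, 30)
    ).toDF("key", "x", "y")

    // Hypothetical column-name lists, as in the question
    val colsNamesGroupBy = Seq("key")
    val colsNamesSum     = Seq("x", "y")

    // Alias each aggregate back to its source column name
    val aggregateExpr = colsNamesSum.map(c => sum(col(c)).as(c))

    val summedDF = dataFrame
      .groupBy(colsNamesGroupBy.head, colsNamesGroupBy.tail: _*)
      .agg(aggregateExpr.head, aggregateExpr.tail: _*)

    // Columns come out as "key", "x", "y" rather than "sum(x)", "sum(y)"
    summedDF.show()

    spark.stop()
  }
}
```

The `head`/`tail` splitting is needed because `agg` takes a `(Column, Column*)` varargs signature, not a `Seq[Column]` directly, which is also why the attempt in the question's EDIT fails to compile.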

You need to add

    import org.apache.spark.sql.functions._

so that you can use .agg.
