I have a dataframe of values that need to be summed together and then put into a Map[String,Long]
format to save into Cassandra.
The code below works, but I was wondering whether the map could be built from an arbitrary list of columns. (Looking at the source code for these functions only makes me more confused.)
val cols = Array("key", "v1", "v2")
val df = Seq(("a", 1, 0), ("b", 1, 0), ("a", 1, 1), ("b", 0, 0)).toDF(cols: _*)
val df1 = df.groupBy(col(cols(0)))
  .agg(map(lit(cols(1)), sum(col(cols(1))), lit(cols(2)), sum(col(cols(2)))) as "map")
This is my desired format for the dataframe, and it is also the current output of the code above:
scala> df1.show(false)
+---+---------------------+
|key|map |
+---+---------------------+
|b |Map(v1 -> 1, v2 -> 0)|
|a |Map(v1 -> 2, v2 -> 1)|
+---+---------------------+
I would like a function that returns the same result as above but accepts the column names programmatically, e.g.:
val columnNames = Array("v1", "v2")
df.groupBy(col(cols(0))).agg(create_sum_map(columnNames) as "map")
Is this even remotely possible in Spark?
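One possible approach (a sketch, not a definitive answer): since `map` takes a varargs of alternating key/value `Column`s, a hypothetical `create_sum_map` helper can build that argument list by flat-mapping over the column names. The helper name `create_sum_map` comes from the question; everything else assumes only the standard `org.apache.spark.sql.functions` API.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit, map, sum}

object SumMapExample {
  // Build map(lit(c1), sum(c1), lit(c2), sum(c2), ...) for any list of columns.
  def create_sum_map(columns: Seq[String]): Column =
    map(columns.flatMap(c => Seq(lit(c), sum(col(c)))): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[1]").appName("sum-map").getOrCreate()
    import spark.implicits._

    val cols = Array("key", "v1", "v2")
    val df = Seq(("a", 1, 0), ("b", 1, 0), ("a", 1, 1), ("b", 0, 0)).toDF(cols: _*)

    // Same aggregation as the hand-written version, but driven by a name list.
    val df1 = df.groupBy(col(cols(0))).agg(create_sum_map(Seq("v1", "v2")) as "map")
    df1.show(false)

    spark.stop()
  }
}
```

Because `sum(...)` is an aggregate expression, `map(...)` wrapping several of them is still a valid single aggregate inside `agg`, just as in the original two-column version.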