I am running Spark in local mode, initialized with 2 partitions.
When I call DataFrame.show(), the stage runs only 2 tasks, and the log looks like this:
INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 5) in 390 ms on localhost (2/2)
But when I call DataFrame.groupBy(), the stage runs 200 tasks, and the log looks like this:
INFO scheduler.TaskSetManager: Finished task 83.0 in stage 15.0 (TID 691) in 644 ms on localhost (84/200)
Here is my source code:
everyIResDF.show()
val resDF = everyIResDF
  .groupBy("dz_id", "dev_id", "dev_type", "time")
  .avg("IRes")
resDF.show()
I want to know why groupBy() produces so many tasks and how to reduce the number. Any help is appreciated.