6

I have weird observation about scalaz-streams sinks. They are working slow. Does anyone know why is that? And is there any way to improve the performance?

here are relevant parts of my code: version without sink

//p is parameter with type p: Process[Task, Pixel]

def printToImage(img: BufferedImage)(pixel: Pixel): Unit = {
  img.setRGB(pixel.x, pixel.y, 1, 1, Array(pixel.rgb), 0, 0)
}
val image = getBlankImage(2000, 4000)
val result = p.runLog.run
result.foreach(printToImage(image))

this takes ~7s to execute

version with sink

//p is the same as before

def printToImage(img: BufferedImage)(pixel: Pixel): Unit = {
  img.setRGB(pixel.x, pixel.y, 1, 1, Array(pixel.rgb), 0, 0)
}

//I've found that way of doing sink in some tutorial
def getImageSink(img: BufferedImage): Sink[Task, Pixel] = {
  //I've tried here Task.delay and Task.now with the same results
  def printToImageTask(img: BufferedImage)(pixel: Pixel): Task[Unit] = Task.delay {
    printToImage(img)(pixel)
  }
  Process.constant(printToImageTask(img))
}



val image = getBlankImage(2000, 4000)
val result = p.to(getImageSink(image)).run.run

this one takes 33 seconds to execute. I am totally confused here because of that significant difference.

Jacek Laskowski
  • 64,943
  • 20
  • 207
  • 364
user2963977
  • 562
  • 4
  • 16

1 Answers1

7

In second case you are allocating Task for each pixel, and instead of directly calling printToImage you do it through Task, and it's much more steps in a call-chain.

We use scalaz-stream a lot, but I strongly believe that it's overkill to use it for this type problems. Code running inside Process/Channel/Sink should much more complicated than simple variable assignment/update.

We use Sinks to write data from stream into databases (Cassandra) and we use batching, it's to high overhead to write individual rows. Process/Sinks is super convenient abstraction, but for more high level workflows. When it's easy to write for-loop I would suggest to write for-loop.

Eugene Zhulenev
  • 9,469
  • 2
  • 28
  • 38