11

Ok, this might be a rather silly question, but what is the benefit of using parallel collections within an actor framework? That is, if I'm only dealing with one message at a time from an actor's mailbox, is there even a need for a parallel collection? Are parallel collections and actors mutually exclusive? What is a use case that would involve both?

Bruce Ferguson

2 Answers

15

They solve different problems. Actors are good at solving task-parallel problems, while parallel collections are good at solving data-parallel problems. I don't think they are mutually exclusive: you can use parallel collections inside actors, and you can have parallel collections that contain actors.


Edit - quick test: Even something as simple as an actor notification loop benefits.

In the following code we register a million actors with an actor registry which has to notify them of an event.

The non-parallel notification loop ( registry foreach {} ) takes an average of 2.8 seconds on my machine (a 4-core 2.5 GHz notebook). With the parallel collection loop ( registry.par.foreach {} ) it takes 1.2 seconds and uses all four cores.

import scala.actors.Actor

case class Register(actor: Actor)
case class Unregister(actor: Actor)
case class Message( contents: String )

object ActorRegistry extends Actor{
  var registry: Set[Actor] = Set.empty

  def act() {
    loop{
      react{
        case reg: Register => register( reg.actor )
        case unreg: Unregister => unregister( unreg.actor )
        case message: Message => fire( message )
      }
    }
  }

  def register(reg: Actor) { registry += reg }

  def unregister(unreg: Actor) { registry -= unreg }

  def fire(msg: Message){
    val starttime = System.currentTimeMillis()

    registry.par.foreach { client => client ! msg } //swap registry foreach for single th

    val endtime = System.currentTimeMillis()
    println("elapsed: " + (endtime - starttime) + " ms")
  }
}

class Client(id: Long) extends Actor{
  var lastmsg = ""
  def act() {
    loop{
      react{
        case msg: Message => got(msg.contents)
      }
    }
  }
  def got(msg: String) {
    lastmsg = msg
  }
}

object Main extends App {

  ActorRegistry.start
  for (i <- 1 to 1000000) {
    val client = new Client(i)
    client.start
    ActorRegistry ! Register( client )
  }

  ActorRegistry ! Message("One")

  Thread.sleep(6000)

  ActorRegistry ! Message("Two")

  Thread.sleep(6000)

  ActorRegistry ! Message("Three")

}
Johan Prinsloo
  • Thanks for your answer. I can see that one *CAN* use parallel collections within actors, but it doesn't seem like it would be any more advantageous than using a regular collection in this case. The thought of using a parallel collection that contains actors, on the other hand, seems like a useful thing. I like that idea. Thanks for giving me food for thought... – Bruce Ferguson Apr 10 '11 at 18:10
  • Thanks for creating some test code, that's great! I really appreciate it. – Bruce Ferguson Apr 11 '11 at 15:17
  • @Bruce. Notice there is another opportunity to use parallel collections here - the code above takes about 12 seconds to do the init loop. If we parallelized that it takes about 4 seconds ( (1 to 1000000).par.foreach{ i => ... } ). – Johan Prinsloo Apr 11 '11 at 17:07
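The parallel init loop from the last comment can be sketched without the actor library. This is only an illustration under stated assumptions: `Worker` and `mailbox` are hypothetical stand-ins for the `Client` actor and the registry's mailbox, and a `ConcurrentLinkedQueue` is used because several threads add to it at once. It assumes Scala 2.12 or earlier, where `.par` is built in; on 2.13+ the separate scala-parallel-collections module and the commented import are needed.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
// On Scala 2.13+ add the scala-parallel-collections module and uncomment:
// import scala.collection.parallel.CollectionConverters._

// Hypothetical stand-in for the Client actor in the answer above.
final case class Worker(id: Long)

object ParallelInit {
  // Thread-safe queue standing in for the registry's mailbox (an assumption).
  val mailbox = new ConcurrentLinkedQueue[Worker]()

  // Each iteration is independent of the others, so the range can be
  // traversed in parallel; the queue handles concurrent additions.
  def registerAll(n: Int): Unit =
    (1 to n).par.foreach { i => mailbox.add(Worker(i)) }
}
```

Calling `ParallelInit.registerAll(1000000)` fills the queue from several threads, so the order in which workers are registered is not deterministic. That is fine for a registry that is a set, as in the answer, but it would matter if registration order carried meaning.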
2

The actors library in Scala is just one of many approaches to concurrency (threads and locks, STM, futures/promises), and it is not meant to be used for every kind of problem or to be combinable with everything (though actors and STM could work well together). In some cases, setting up a group of actors (workers plus a supervisor), or explicitly splitting a task into portions to feed to the fork-join pool, is too cumbersome. It is much handier to call .par on a collection you are already using and simply traverse it in parallel, gaining a performance benefit almost for free (in terms of setup).
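As a minimal sketch of what "calling .par on an existing collection" looks like: the names `ParDemo`, `isPrime`, `countSequential` and `countParallel` are made up for illustration, and the code assumes Scala 2.12 or earlier (on 2.13+ the scala-parallel-collections module and the commented import are required).

```scala
// On Scala 2.13+ add the scala-parallel-collections module and uncomment:
// import scala.collection.parallel.CollectionConverters._

object ParDemo {
  // Hypothetical CPU-bound predicate, used only for illustration.
  def isPrime(n: Int): Boolean =
    n > 1 && (2 to math.sqrt(n.toDouble).toInt).forall(n % _ != 0)

  // Ordinary sequential traversal.
  def countSequential(xs: Vector[Int]): Int = xs.count(isPrime)

  // The same traversal, spread over the available cores by adding .par.
  def countParallel(xs: Vector[Int]): Int = xs.par.count(isPrime)
}
```

Both methods return the same count for the same input, for example `ParDemo.countParallel((1 to 100000).toVector)`. The only change is the `.par` call, and it is safe here because `isPrime` has no side effects. The parallel traversal is kept in a method rather than in a `val` in the object body, since running a parallel operation during object initialization is a known way to deadlock.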

All in all, actors and parallel collections address different dimensions of the problem: actors are a concurrency paradigm, whilst parallel collections are just a useful tool that should be viewed not as a concurrency alternative, but rather as an augmentation of the collections toolset.

Vasil Remeniuk