
What is the key difference between Fork/Join and Map/Reduce?

Do they differ in the kind of decomposition and distribution (data vs. computation)?

S.L. Barth
hotzen

2 Answers


One key difference is that Fork/Join (F-J) seems to be designed to work on a single Java VM, while MapReduce (M-R) is explicitly designed to work on a large cluster of machines. These are very different scenarios.

F-J offers facilities to partition a task into several subtasks recursively: there can be many tiers, forked subtasks can communicate with each other at this stage, and the programming model is much more traditional. It does not extend (at least in the paper) beyond a single machine. Great for taking advantage of your eight-core.
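To make the recursive splitting concrete, here is a minimal sketch using `RecursiveTask` and `ForkJoinPool` from `java.util.concurrent`. The class name `ForkJoinSum`, the helper `parallelSum`, and the `THRESHOLD` value are made up for illustration; a real cutoff would have to be tuned.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class: sums an array by recursive splitting.
class ForkJoinSum extends RecursiveTask<Long> {
    // Illustrative cutoff (an assumption): ranges this small are summed sequentially.
    private static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Split in two; each half may split again, giving many tiers.
        int mid = (from + to) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                      // queue the left half for any idle worker
        long rightSum = right.compute();  // work on the right half in this thread
        return left.join() + rightSum;    // wait for the left half and combine
    }

    // Hypothetical helper: runs the task on the JVM-wide common pool.
    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data));
    }
}
```

All subtasks read the same `data` array, which only works because they share one JVM's memory.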

M-R does only one big split, with the mapped splits not talking to each other at all, and then reduces everything together. A single tier, no inter-split communication until reduce, and massively scalable. Great for taking advantage of your share of the cloud.
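The shape of that single tier can be sketched in plain Java. This is only an in-process imitation, not the API of Hadoop or any other MapReduce framework: the class `MapReduceSketch` and its methods are made-up names, and local threads stand in for cluster nodes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical example class: word count in a map-then-reduce shape.
class MapReduceSketch {

    // Map phase: one split in, one partial result out.
    // It touches no shared state, so splits never talk to each other.
    static Map<String, Integer> map(String split) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce phase: the only point where partial results meet.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    // One flat tier: every split is mapped independently, then everything is reduced.
    static Map<String, Integer> wordCount(List<String> splits) {
        List<Map<String, Integer>> partials = splits.parallelStream()
                .map(MapReduceSketch::map)
                .collect(Collectors.toList());
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("a b a", "b c");
        System.out.println(wordCount(splits));
    }
}
```

Because `map` needs nothing but its own split, each call could in principle run on a different machine, which is where the scalability comes from.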

d4nyll
tucuxi
  • 10
    More specifically, F-J allows workers to steal subtasks from each other's queues. This is not possible if the worker threads are on different machines (and thus do not have shared memory). – finnw Jan 21 '11 at 12:24
  • 2
    According to the [MapReduce Wikipedia entry](http://en.wikipedia.org/wiki/MapReduce), M-R is not necessarily restricted to a single tier of forked tasks. – Tom Crockett Mar 07 '13 at 01:35
  • what's the difference between fork/join & mapreduce outside the context of Java? – user2001850 Jan 15 '17 at 22:09

There is a whole scientific paper on the subject, Comparing Fork/Join and MapReduce.

The paper compares the performance, scalability and programmability of three parallel paradigms: fork/join, MapReduce, and a hybrid approach.

What they find is basically that Java fork/join has low startup latency and scales well for small inputs (<5MB), but it cannot process larger inputs because of the memory limits of shared-memory, single-node architectures. On the other hand, MapReduce has significant startup latency (tens of seconds), but scales well for much larger inputs (>100MB) on a compute cluster.

But there is a lot more to read there if you're up for it.

Per Quested Aronsson