Difference between Fork/Join and Map/Reduce

Question

What is the key difference between Fork/Join and Map/Reduce?

Do they differ in the kind of decomposition and distribution (data vs. computation)?

score 42 · Accepted Answer · edited Oct 24 '19 at 03:44

42

One key difference is that F-J seems to be designed to work on a single Java VM, while M-R is explicitly designed to work on a large cluster of machines. These are very different scenarios.

F-J offers facilities to partition a task into several subtasks, in a recursive-looking fashion; more tiers, possibility of 'inter-fork' communication at this stage, much more traditional programming. Does not extend (at least in the paper) beyond a single machine. Great for taking advantage of your eight-core.

M-R only does one big split, with the mapped splits not talking between each other at all, and then reduces everything together. A single tier, no inter-split communication until reduce, and massively scalable. Great for taking advantage of your share of the cloud.

edited Oct 24 '19 at 03:44

d4nyll

9,170
5
43
59

answered Mar 29 '10 at 14:28

tucuxi

15,614
2
36
70

10

More specifically, F-J allows workers to steal subtasks from each others' queues. This is not possible if the worker threads are on different machines (and thus do not have shared memory.) – finnw Jan 21 '11 at 12:24
2

According to the [MapReduce Wikipedia entry](http://en.wikipedia.org/wiki/MapReduce), M-R is not necessarily restricted to a single tier of forked tasks. – Tom Crockett Mar 07 '13 at 01:35
what's the difference between fork/join & mapreduce outside the context of Java? – user2001850 Jan 15 '17 at 22:09

score 17 · Answer 2 · answered Sep 09 '13 at 20:22

There is a whole scientific paper on the subject, Comparing Fork/Join and MapReduce.

The paper compares the performance, scalability and programmability of three parallel paradigms: fork/join, MapReduce, and a hybrid approach.

What they find is basically that Java fork/join has low startup latency and scales well for small inputs (<5MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single node architectures. On the other hand, MapReduce has significant startup latency (tens of seconds), but scales well for much larger inputs (>100MB) on a compute cluster.

But there is a lot more to read there if you're up for it.

Difference between Fork/Join and Map/Reduce

2 Answers2