15

As I have understood so far: JavaScript is single-threaded. If you defer the execution of some procedure, you just schedule it (queue it) to run the next time the thread is free. But Async.js defines two methods, Async::parallel & Async::parallelLimit, and I quote:

  • parallel(tasks, [callback])

Run an array of functions in parallel, without waiting until the previous function has completed. If any of the functions pass an error to its callback...

  • parallelLimit(tasks, limit, [callback])

The same as parallel only the tasks are executed in parallel with a maximum of "limit" tasks executing at any time.

As far as my understanding of English goes, "doing tasks in parallel" means doing them at the same time - simultaneously.

How may Async.js execute tasks in parallel in a single thread? Am I missing something?

tikider
  • How do operating systems simulate multitasking on single-processor machines? The answer is the same: time-slicing. – Frédéric Hamidi Sep 26 '13 at 09:13
  • I am not too familiar with OS internals, but JavaScript, which runs in a single thread, has an event loop that constantly monitors for new events and executes any procedures bound to them ONE BY ONE. Nothing is done simultaneously. Correct me if I am wrong. – tikider Sep 26 '13 at 09:17
  • You are right. There is only the illusion that things happen simultaneously, because short pieces of code running sequentially and yielding to one another are very similar to parallelism (from our point of view). – Frédéric Hamidi Sep 26 '13 at 09:19
  • All async does is let each function spawn processes/workers that *may* be run in parallel. If you just run synchronous code in those functions that's your fault, not async's ;) – Andreas Hultgren Sep 26 '13 at 09:20
  • @FrédéricHamidi so the naming of those methods is not totally descriptive of what they really do? – tikider Sep 26 '13 at 09:25
  • @tikider, indeed, these names are only representative of the visible behavior of what the methods actually do. – Frédéric Hamidi Sep 26 '13 at 09:26
  • @tikider I gathered the bits of our conversation in an answer and I'm going to delete all my comments afterwards, because they're making your question messy. – Leonid Beschastny Sep 26 '13 at 09:56
  • @LeonidBeschastny no problem! and thanks a lot. – tikider Sep 26 '13 at 09:59
  • @FrédéricHamidi you said that it spawns child processes, and that's exactly how you achieve parallelism in node. So it does not sound like the name is misleading. – eran otzap May 15 '18 at 06:23

5 Answers

14

How may Async.js execute tasks in parallel in a single thread? Am I missing something?

parallel starts all its tasks at once. So if your tasks consist of I/O calls (e.g. querying a DB), they will effectively be processed in parallel.

How is this enabled in a single thread?! That is what I could not make sense of.

Node.js is non-blocking. Instead of handling all tasks truly in parallel, it switches from one task to another. So when the first task makes an I/O call and becomes idle, Node.js simply switches to processing another one.

I/O tasks spend most of their processing time waiting for the result of the I/O call. In blocking languages like Java, such a task blocks its thread while it waits for the result. But Node.js uses that time to process other tasks instead of waiting.

So that means that if the inner processing of each task is asynchronous, the thread is granted to each bit of these tasks, regardless of whether any of them has finished, until all have finished their bits?

Yes, it's almost as you said. Node.js starts processing the first task until it pauses to do an I/O call. At that moment, Node.js leaves it and grants its main thread to another task. So you may say that the thread is granted to each active task in turn.
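
For illustration, a minimal sketch of that behaviour (assuming the async module is installed; setTimeout stands in for a real I/O call):

const async = require('async');

const taskA = cb => setTimeout(() => cb(null, 'A'), 1000); // fake I/O taking ~1s
const taskB = cb => setTimeout(() => cb(null, 'B'), 1000); // fake I/O taking ~1s

console.time('both');
async.parallel([taskA, taskB], (err, results) => {
  console.timeEnd('both'); // ~1s, not ~2s: the waits overlap on the single thread
  console.log(results);    // [ 'A', 'B' ]
});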

Leonid Beschastny
  • This explains a lot for me. I was using async's each, but, according to my console logs, it wasn't reordering anything (as in one thing finishing before another in the same order in the array). **There is no true "parallel."** Only one thing ever happens at once. It is only when one pauses that another one steps in and waits for the pause to end. So, simply running `console.log` isn't enough to make it pause. You could say that it is better time management, but I wouldn't call it parallel. – DaAwesomeP Mar 14 '15 at 04:17
  • As a side note, is there any way to achieve true parallelism in node, maybe with child processes? – eran otzap May 15 '18 at 06:32
  • @eranotzap yes, it's possible. You could either use several independent node.js workers, or green threads provided by [`fibers` module](https://www.npmjs.com/package/fibers). You could spawn workers using either [child_process.fork()](https://nodejs.org/api/child_process.html#child_process_child_process_fork_modulepath_args_options) or [cluster.fork()](https://nodejs.org/api/cluster.html#cluster_cluster_fork_env). – Leonid Beschastny May 15 '18 at 07:46
  • @LeonidBeschastny, if for example you need to process a large data set, would that require you to implement some sort of sharding mechanism above those node or child processes? – eran otzap May 15 '18 at 10:47
  • @eranotzap it depends on the actual task you're trying to perform. If this processing requires a lot of CPU work, then you could spawn an independent node.js worker using `child_process.fork()` to perform the whole operation in a separate process. Though it's not a very good idea to perform CPU-bound processing using node.js; other instruments may better suit your needs. But if this processing consists mostly of I/O operations (API calls, DB queries), then a single Node.js process will handle it well. – Leonid Beschastny May 15 '18 at 11:36
  • @eranotzap the most common use case for spawning multiple Node.js processes is to take advantage of multiple CPU cores. Usually you only need as many workers as the number of CPU cores you want to use. As for long blocking CPU-bound operations, they should be avoided when working with Node.js wherever possible. – Leonid Beschastny May 15 '18 at 11:41
  • @LeonidBeschastny yes that is the use case. But I need to process a large data set in Node, because the entire system is written in node and I don't want to diverge from our dev stack. Now I have a large data set which needs to be processed in parallel. How would that be done in node? – eran otzap May 15 '18 at 12:26
4

Async.Parallel is well documented here: https://github.com/caolan/async#parallel

Async.Parallel is about kicking off I/O tasks in parallel, not about parallel execution of code. If your tasks do not use any timers or perform any I/O, they will actually be executed in series. Any synchronous setup sections for each task will happen one after the other. JavaScript remains single-threaded.
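
A rough sketch of that point (assuming the async module is installed): two purely CPU-bound tasks handed to async.parallel still run back to back, because neither of them ever yields the thread:

const async = require('async');

function busy(cb) {
  const start = Date.now();
  while (Date.now() - start < 1000) {} // blocks the only thread for ~1s
  cb(null, Date.now());
}

console.time('total');
async.parallel([busy, busy], (err, results) => {
  console.timeEnd('total'); // ~2s: the tasks ran one after the other, not in parallel
});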

Ashish
2

The functions are not executed simultaneously, but when the first function has handed off to an asynchronous task (e.g. setTimeout, a network request, ...), the second will start, even if the first function hasn't yet called the callback it was given.

As for the number of parallel tasks: that depends on the limit you pick (parallel starts them all at once, parallelLimit caps how many are in flight at any time).
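
For example, a small sketch of what that choice looks like (assuming the async module is installed):

const async = require('async');

// four fake I/O tasks of 500 ms each
const tasks = [1, 2, 3, 4].map(n =>
  cb => setTimeout(() => cb(null, n), 500)
);

async.parallelLimit(tasks, 2, (err, results) => {
  console.log(results); // [ 1, 2, 3, 4 ] after ~1s: two waves of two tasks each
});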

David
1

As far as my understanding of English goes, "doing tasks in parallel" means doing them at the same time - simultaneously.

Correct. And "simultaneously" means "there is at least one moment in time when two or more tasks are already started, but not yet finished".

How may Async.js execute tasks in parallel in a single thread? Am I missing something?

When some task stops for some reason (e.g. I/O), async.js executes another task and continues the first one later.

alex
1

Your doubts make perfect sense. It's been a few years since you asked this question, but I think it's worth adding a few things to the existing answers.

Run an array of functions in parallel, without waiting until the previous function has completed. If any of the functions pass an error to its callback...

This sentence is not entirely correct. In fact it does wait for each function to return, because it's impossible not to in JavaScript. Both function calls and function returns are synchronous and blocking. So when it calls any function, it has to wait for that function to return. What it doesn't have to wait for is the calling of the callback that was passed to that function.

Allegory

Some time ago I wrote a short story to demonstrate that very concept:

To quote a part of it:

“So I said: ‘Wait a minute, you tell me that one cake takes three and a half hours and four cakes take only half an hour more than one? It doesn’t make any sense!’ I thought that she must be kidding so I started laughing.”
“But she wasn’t kidding?”
“No, she looked at me and said: ‘It makes perfect sense. This time is mostly waiting. And I can wait for many things at once just fine.’ I stopped laughing and started thinking. It finally started to get to me. Doing four pillows at the same time didn’t buy you any time, maybe it was arguably easier to organize but then again, maybe not. But this time it was something different. But I didn’t really know how to use that knowledge yet.”

Theory

I think it's important to emphasize that in single-threaded event loops you can never do more than one thing at once. But you can wait for many things at once just fine. And this is what happens here.

The parallel function from the Async module calls each of the functions one by one, but each function has to return before the next one can be called; there is no way around it. The magic here is that the function doesn't really do its job before it returns - it just schedules some task, registers an event listener, passes some callback somewhere else, adds a resolution handler to some promise etc.

Then, when the scheduled task finishes, some handler that was previously registered by that function is executed; this in turn executes the callback that was originally passed by the Async module, and the Async module knows that this one function has finished - this time not only in the sense that it returned, but also that the callback that was passed to it was finally called.

Examples

So, for example, let's say that you have 3 functions that download 3 different URLs: getA(), getB() and getC().

We will write a mock of the Request module to simulate the requests and some delays:

function mockRequest(url, cb) {
  const delays = { A: 4000, B: 2000, C: 1000 };
  setTimeout(() => {
    cb(null, {}, 'Response ' + url);
  }, delays[url]);
}

Now the 3 functions that are mostly the same, with verbose logging:

function getA(cb) {
  console.log('getA called');
  const url = 'A';
  console.log('getA runs request');
  mockRequest(url, (err, res, body) => {
    console.log('getA calling callback');
    cb(err, body);
  });
  console.log('getA request returned');
  console.log('getA returns');
}

function getB(cb) {
  console.log('getB called');
  const url = 'B';
  console.log('getB runs request');
  mockRequest(url, (err, res, body) => {
    console.log('getB calling callback');
    cb(err, body);
  });
  console.log('getB request returned');
  console.log('getB returns');
}

function getC(cb) {
  console.log('getC called');
  const url = 'C';
  console.log('getC runs request');
  mockRequest(url, (err, res, body) => {
    console.log('getC calling callback');
    cb(err, body);
  });
  console.log('getC request returned');
  console.log('getC returns');
}

And finally we're calling them all with the async.parallel function:

async.parallel([getA, getB, getC], (err, results) => {
  console.log('async.parallel callback called');
  if (err) {
    console.log('async.parallel error:', err);
  } else {
    console.log('async.parallel results:', JSON.stringify(results));
  }
});

What gets displayed immediately is this:

getA called
getA runs request
getA request returned
getA returns
getB called
getB runs request
getB request returned
getB returns
getC called
getC runs request
getC request returned
getC returns

As you can see this is all sequential - functions get called one by one and the next one is not called before the previous one returns. Then, after some delays, we see this:

getC calling callback
getB calling callback
getA calling callback
async.parallel callback called
async.parallel results: ["Response A","Response B","Response C"]

So getC finished first, then getB, and finally getA - and as soon as the last one finished, async.parallel called our callback with all of the responses combined and in the correct order - the order in which we passed the functions, not the order in which those requests finished.

Also we can see that the program finishes after 4.071 seconds, which is roughly the time that the longest request took, so the requests were all in progress at the same time.

Now, let's run it with async.parallelLimit with a limit of 2 parallel tasks at most:

async.parallelLimit([getA, getB, getC], 2, (err, results) => {
  console.log('async.parallel callback called');
  if (err) {
    console.log('async.parallel error:', err);
  } else {
    console.log('async.parallel results:', JSON.stringify(results));
  }
});

Now it's a little bit different. What we see immediately is:

getA called
getA runs request
getA request returned
getA returns
getB called
getB runs request
getB request returned
getB returns

So getA and getB were called and returned, but getC has not been called yet. Then after some delay we see:

getB calling callback
getC called
getC runs request
getC request returned
getC returns

which shows that as soon as getB called its callback, the Async module no longer had 2 tasks in progress but just 1, so it could start another one - getC - and it did so immediately.

Then, after further delays, we see:

getC calling callback
getA calling callback
async.parallel callback called
async.parallel results: ["Response A","Response B","Response C"]

which finishes the whole process just like in the async.parallel example. This time it also took roughly 4 seconds, because delaying the call to getC didn't make any difference - it still managed to finish before getA, the first one called, finished.

But if we change the delays to those ones:

const delays = { A: 4000, B: 2000, C: 3000 };

then the situation is different. Now async.parallel takes 4 seconds but async.parallelLimit with the limit of 2 takes 5 seconds, and the order is slightly different.

With no limit:

$ time node example.js
getA called
getA runs request
getA request returned
getA returns
getB called
getB runs request
getB request returned
getB returns
getC called
getC runs request
getC request returned
getC returns
getB calling callback
getC calling callback
getA calling callback
async.parallel callback called
async.parallel results: ["Response A","Response B","Response C"]

real    0m4.075s
user    0m0.070s
sys     0m0.009s

With a limit of 2:

$ time node example.js
getA called
getA runs request
getA request returned
getA returns
getB called
getB runs request
getB request returned
getB returns
getB calling callback
getC called
getC runs request
getC request returned
getC returns
getA calling callback
getC calling callback
async.parallel callback called
async.parallel results: ["Response A","Response B","Response C"]

real    0m5.075s
user    0m0.057s
sys     0m0.018s

Summary

I think the most important thing to remember - no matter if you use callbacks like in this case, or promises, or async/await - is that in single-threaded event loops you can only do one thing at once, but you can wait for many things at the same time.
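
For comparison, a rough promise-based sketch of the same idea, reusing the mockRequest mock from the examples above (Promise.all, like async.parallel, waits for many things at once):

// hypothetical wrapper turning the callback-style mock above into a promise
const get = url => new Promise((resolve, reject) =>
  mockRequest(url, (err, res, body) => err ? reject(err) : resolve(body))
);

Promise.all([get('A'), get('B'), get('C')])
  .then(results => console.log(results)) // [ 'Response A', 'Response B', 'Response C' ]
  .catch(err => console.error(err));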

rsp