Parallel implementation slower than serial in Julia

Question

Why does the parallel implementation in the following Julia code run slower than the serial one?

using Distributed

@everywhere function ext(i::Int64)
   callmop = `awk '{ sum += $1 } END { print sum }' infile_$(i)`
   run(callmop)
end

function fpar()
   @sync @distributed for i = 1:10
      ext(i)
   end
end

function fnopar()
   for i = 1:10
      ext(i)
   end
end

val, t_par, bytes, gctime, memallocs = @timed fpar()
val, t_nopar, bytes, gctime, memallocs = @timed fnopar()

println("Parallel: $(t_par) s. Serial: $(t_nopar) s")  
# Parallel: 0.448290379 s. Serial: 0.028704802 s

The files infile_$(i) contain a single column of real numbers. After some research I came across this post and this other post, which deal with similar problems. They seem a bit dated though, considering the speed at which Julia is being developed. Is there any way to improve this parallel section? Thank you very much in advance.

IMHO this is not the best example to test concurrency. Spawning an external process and having concurrent access to the same file (files/ file IO has locks, too) opens too many cans of worms at the same time and obscures the matter. Also the amount of work (10 numbers) is too small and the overhead for multi threading is not likely to be amortized. Better would have been to read the/more data into an array and then use that to compare single vs multi threaded execution. — BitTickler, Jun 20 '20 at 00:32
@BitTickler thanks for the comment. The above code is a simplification of the real problem: I absolutely need to call the external process with an input file (in fact, I call the quantum chemistry package [MOPAC](http://openmopac.net/)) and since all inputs are independent, it seems to me it is an embarrassingly parallel problem. I used the `awk` command because it does share a loose similarity with the situation I have. — panadestein, Jun 20 '20 at 00:39

Przemyslaw Szufel · Accepted Answer · 2020-06-20T00:42:19.420

Your code is correct, but you are measuring the performance incorrectly.

Note that for this use case (calling external processes) you should be fine with green threads (Julia tasks), so there is no need to distribute the load at all!

When a Julia function is executed for the first time, it is compiled. When you execute it on several parallel processes, each of them needs to compile the same piece of code.

On top of that, the first run of the @distributed macro also takes a long time to compile. Hence, before using @timed you should call both the fpar and fnopar functions once.

Last but not least, there is no addprocs in your code, but I assume that you used the -p Julia option to add the worker processes to your Julia master process. By the way, you did not mention how many worker processes you have.
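If you would rather set up the workers inside the script than with -p, a minimal sketch could look like the following. The worker count of 2 is an arbitrary value chosen for illustration, and the body of ext is copied from the question.

```julia
using Distributed

# Start workers from inside the script instead of passing `-p` on the command line.
# The count of 2 is an arbitrary illustration value, not a recommendation.
if nprocs() == 1
    addprocs(2)
end

# @everywhere only reaches workers that already exist,
# so the definition has to come after addprocs.
@everywhere function ext(i::Int64)
    callmop = `awk '{ sum += $1 } END { print sum }' infile_$(i)`
    run(callmop)
end

println("Number of workers: ", nworkers())
```

The order matters here: a function defined with @everywhere before the workers are added will not be known on those workers.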

I usually test code like this:

@time fpar()
@time fpar()
@time fnopar()
@time fnopar()

The first measurement of each pair includes the compile time, while the second shows the actual running time.
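The same first-call effect can be seen on any function. The sketch below uses a hypothetical stand-in function named work (it is not part of the question's code) and @elapsed, which returns the elapsed seconds as a number, so the two measurements can be stored and compared.

```julia
# `work` is a hypothetical stand-in workload; any function shows the same effect.
work(n) = sum(sqrt(k) for k in 1:n)

t_first  = @elapsed work(1_000)   # first call: includes compiling `work`
t_second = @elapsed work(1_000)   # second call: compiled code is reused

println("first call: $(t_first) s, second call: $(t_second) s")
```

On a fresh session the first number is typically much larger than the second, which is exactly the difference that distorted the timings in the question.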

It is also worth having a look at the BenchmarkTools package and the @btime macro.

Regarding performance tests, @distributed has a significant communication overhead. In some scenarios this can be mitigated by using SharedArrays, in others by using Threads.@threads. However, in your case the fastest code would be the one using green threads:

function ffast()
   @sync for i = 1:10
      @async ext(i)
   end
end
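To see why tasks are enough here, the following sketch replaces the awk call with `sleep 1` as a stand-in for an external program that takes a while. The names ext_standin, serial_standin and green_standin are hypothetical, and the sketch assumes a Unix-like system where the sleep command is available.

```julia
# Stand-in for the external program: `sleep 1` simply waits for one second.
# Assumes a Unix-like system that provides the `sleep` command.
ext_standin(i::Int) = run(`sleep 1`)

function serial_standin(n)
    for i in 1:n
        ext_standin(i)            # each external process finishes before the next starts
    end
end

function green_standin(n)
    # Each @async starts a task; @sync waits until all of them have finished.
    @sync for i in 1:n
        @async ext_standin(i)     # the task yields while its process is running
    end
end

# Warm up both functions so that compilation is not part of the measurement.
serial_standin(1)
green_standin(1)

t_serial = @elapsed serial_standin(3)
t_green  = @elapsed green_standin(3)
println("Serial: $(t_serial) s. Green: $(t_green) s")
```

While a task waits for its external process, Julia switches to the next task, so all processes run at the same time and the operating system spreads them over the available cores. No data has to be sent between Julia processes, which is where @distributed loses time.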

Thanks a lot for the quick answer, very interesting remarks. Indeed, I use the `-p` flag to set the number of processes. So what would you suggest to better test performance here? — panadestein, Jun 20 '20 at 00:10
You were absolutely right, if one excludes compilation time the parallel implementation is faster. However, using 8 processes the performance improvement is rather discrete I would say. Parallel 0.014887 seconds versus Serial 0.031449 seconds. I assume this is as good as it can get if one considers the overheads? — panadestein, Jun 20 '20 at 00:29
There are many issues to distributed performance e.g. communication times are expensive. Try my code proposal with green threads and tell me what you get :-) — Przemyslaw Szufel, Jun 20 '20 at 00:44
Result: `Parallel: 0.010793981 s. Serial: 0.031697373 s. Green: 0.007791053`. Green threads are the best solution. Thanks a lot :) — panadestein, Jun 20 '20 at 00:49
