
I wonder whether the built-in map function splits a list into chunks to apply the given function in parallel (threads)?

The documentation doesn't say anything about it, but I would wonder why it is not implemented like this.

    def map_func(x):
        """
        Square a number.

        :param x: a number
        :return: x * x
        >>> map_func(4)
        16
        """
        return x * x


    new_list = list(map(map_func, range(1, 2 ** 25)))
    print(new_list)

From the task manager I cannot clearly tell whether it is done by one thread or more.

Can someone please explain whether it is sequential, and if so, why?

Anna Klein

1 Answer


It's sequential because map, as a higher-order function, in general has to apply a function to data and return the results in the same order as the original data:

map(f, [1,2,3,4]) => [f(1), f(2), f(3), f(4)]

Making it parallel would introduce the need for synchronisation, which would defeat part of the purpose of parallelism.

multiprocessing.Pool.map is a parallel version of the built-in map: it splits the workload into chunks across worker processes and reassembles the results in the original order.
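A minimal sketch of that (the function name `square` and the pool size are my own choices, not from the question):

    from multiprocessing import Pool

    def square(x):
        """Top-level function so worker processes can pickle and import it."""
        return x * x

    if __name__ == "__main__":
        with Pool(4) as pool:
            # chunksize controls how the workload is split among the workers;
            # the results still come back in the original input order
            result = pool.map(square, range(10), chunksize=2)
        print(result)

Note the function must be defined at module level and the pool created under the `if __name__ == "__main__":` guard, otherwise worker processes can't import the function (or, on Windows, they re-execute the whole script).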

ForceBru
    Your last sentence is slightly misleading. `map` can be run in parallel without making it useless. See [this](https://stackoverflow.com/questions/1704401/is-there-a-simple-process-based-parallel-map-for-python) question and answers. – ninesalt Feb 02 '19 at 20:27
  • @ninesalt, well, yeah, but it still has to stitch the individual results together somehow, and that's additional work. It also can't operate on infinite iterables because it [converts the iterable to `list`](https://github.com/python/cpython/blob/04b2a5eedac7ac0fecdafce1bda1028ee55e2aac/Lib/multiprocessing/pool.py#L375), which is also additional work – ForceBru Feb 02 '19 at 20:33
  • I'm pretty sure this is not the reason that `map()` does not run in parallel in cpython. A parallel version of `map()` could easily reserve space for the results up front and insert the results as they became ready. I think the reason is that cpython does not support running interpreted code in parallel. Multithreading in cpython still serializes code execution (with the GIL). – Roger Dahl Dec 29 '20 at 23:52
    @RogerDahl, yeah, two years later, I'm not a fan of this answer either. That the results have to be in the same order doesn't mean that the mapping can't be parallel: for example, the main operations in Apache Spark are `map` and `reduce`, so parallel `map` is most definitely a thing - it's just not the built-in one – ForceBru Dec 30 '20 at 08:07
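As the comments note, `Pool.map` materialises its input as a list first. For large or lazy inputs, `Pool.imap` is the usual alternative: it consumes the iterable incrementally and yields results lazily, still in order. A sketch (names are illustrative, not from the thread):

    from itertools import islice
    from multiprocessing import Pool

    def square(x):
        """Top-level so it can be pickled for the worker processes."""
        return x * x

    if __name__ == "__main__":
        with Pool(4) as pool:
            # imap does not convert the input range to a list up front;
            # results are produced lazily but still in input order
            lazy = pool.imap(square, range(1, 2 ** 25), chunksize=1024)
            print(list(islice(lazy, 5)))  # [1, 4, 9, 16, 25]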