
I wonder whether the built-in map function splits a list into chunks to apply the given function in parallel (threads)?

The documentation doesn't say anything about it, but I would wonder why it is not implemented like this.

    def map_func(x):
        """
        Square a number.

        :param x: a number
        :return: x * x
        >>> map_func(4)
        16
        """
        return x * x


    new_list = list(map(map_func, range(1, 2 ** 25)))
    print(new_list)

From the task manager I cannot clearly tell whether it is done by one thread or more.

Can someone please explain whether it is sequential, and if so, why?

Anna Klein

1 Answer


It's sequential because map, as a higher-order function, in general has to apply a function to data and return the results in the same order as the original data:

map(f, [1,2,3,4]) => [f(1), f(2), f(3), f(4)]

Making it parallel would introduce the need for synchronisation, which would defeat part of the purpose of parallelism.

multiprocessing.Pool.map is a parallel version of the built-in map: it splits the workload into chunks across worker processes and reassembles the results in the original order.
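A minimal sketch of that (the function name `square` and the pool size are my own choices, not from the question):

    from multiprocessing import Pool

    def square(x):
        """Top-level function so worker processes can pickle and import it."""
        return x * x

    if __name__ == "__main__":
        with Pool(4) as pool:
            # chunksize controls how the workload is split among the workers;
            # the results still come back in the original input order
            result = pool.map(square, range(10), chunksize=2)
        print(result)

Note the function must be defined at module level and the pool created under the `if __name__ == "__main__":` guard, otherwise worker processes can't import the function (or, on Windows, they re-execute the whole script).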

ForceBru
    Your last sentence is slightly misleading. `map` can be run in parallel without making it useless. See [this](https://stackoverflow.com/questions/1704401/is-there-a-simple-process-based-parallel-map-for-python) question and answers. – ninesalt Feb 02 '19 at 20:27
  • @ninesalt, well, yeah, but it still has to stitch the individual results together somehow, and that's additional work. It also can't operate on infinite iterables because it [converts the iterable to `list`](https://github.com/python/cpython/blob/04b2a5eedac7ac0fecdafce1bda1028ee55e2aac/Lib/multiprocessing/pool.py#L375), which is also additional work – ForceBru Feb 02 '19 at 20:33
  • I'm pretty sure this is not the reason that `map()` does not run in parallel in cpython. A parallel version of `map()` could easily reserve space for the results up front and insert the results as they became ready. I think the reason is that cpython does not support running interpreted code in parallel. Multithreading in cpython still serializes code execution (with the GIL). – Roger Dahl Dec 29 '20 at 23:52
    @RogerDahl, yeah, two years later, I'm not a fan of this answer either. That the results have to be in the same order doesn't mean that the mapping can't be parallel: for example, the main operations in Apache Spark are `map` and `reduce`, so parallel `map` is most definitely a thing - it's just not the built-in one – ForceBru Dec 30 '20 at 08:07
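As the comments note, `Pool.map` materialises its input as a list first. For large or lazy inputs, `Pool.imap` is the usual alternative: it consumes the iterable incrementally and yields results lazily, still in order. A sketch (names are illustrative, not from the thread):

    from itertools import islice
    from multiprocessing import Pool

    def square(x):
        """Top-level so it can be pickled for the worker processes."""
        return x * x

    if __name__ == "__main__":
        with Pool(4) as pool:
            # imap does not convert the input range to a list up front;
            # results are produced lazily but still in input order
            lazy = pool.imap(square, range(1, 2 ** 25), chunksize=1024)
            print(list(islice(lazy, 5)))  # [1, 4, 9, 16, 25]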