1371

I am trying to understand threading in Python. I've looked at the documentation and examples, but quite frankly, many examples are overly sophisticated and I'm having trouble understanding them.

How do you clearly show tasks being divided for multi-threading?

Peter Mortensen
  • 28,342
  • 21
  • 95
  • 123
albruno
  • 13,787
  • 3
  • 15
  • 6
  • 34
    A good general discussion around this topic can be found in [Python's Hardest Problem](http://www.jeffknupp.com/blog/2012/03/31/pythons-hardest-problem/) by Jeff Knupp. In summary, it seems threading is not for beginners. – Matthew Walker Sep 04 '13 at 07:12
  • 120
    haha, I tend to think that threading is for everyone, but beginners are not for threading :))))) – Bohdan Sep 17 '13 at 03:37
  • 49
    Just to flag that people should read all the answers as later ones are arguably better as new language features are taken advantage of... – Gwyn Evans Mar 01 '15 at 08:14
  • 5
    Remember to write your core logic in C and call it via ctypes to really take advantage of Python threading. – aaa90210 Jul 09 '15 at 00:52
  • 4
    I just wanted to add that [PyPubSub](http://pubsub.sourceforge.net/usage/usage_basic.html) is a great way to send and receive messages to control Thread flow – ytpillai Aug 07 '15 at 09:12
  • This type of question should also have the "historical value" yet "not a proper SO question" type of disclaimer on it – Hack-R Feb 24 '18 at 20:13
  • If you really need threading for performance reasons (e.g. for numerical calculations) please just write the bottleneck code in C++ and make it a pymodule with pybind11. I don't see why people say Python will die because of that. – eusoubrasileiro Jan 14 '21 at 16:46

19 Answers

1518

Since this question was asked in 2010, simple multithreading in Python has become much simpler thanks to map and pool.

The code below comes from an article/blog post that you should definitely check out (no affiliation) - Parallelism in one line: A Better Model for Day to Day Threading Tasks. I'll summarize below - it ends up being just a few lines of code:

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
results = pool.map(my_function, my_array)

Which is the multithreaded version of:

results = []
for item in my_array:
    results.append(my_function(item))

Description

Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.

Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.
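
To make the equivalence concrete, here is a minimal sketch (Python 3) that runs the same work three ways: an explicit loop, the built-in map, and a thread pool's map. The function `square` and the list `numbers` are hypothetical stand-ins for my_function and my_array.

```python
from multiprocessing.dummy import Pool as ThreadPool

def square(x):
    # Hypothetical stand-in for my_function
    return x * x

numbers = [1, 2, 3, 4, 5]  # hypothetical stand-in for my_array

# 1. Explicit loop
loop_results = []
for item in numbers:
    loop_results.append(square(item))

# 2. Built-in map (lazy in Python 3, so wrap it in list())
map_results = list(map(square, numbers))

# 3. Thread-pool map: same call shape, work spread over 4 threads
with ThreadPool(4) as pool:
    pool_results = pool.map(square, numbers)

print(loop_results == map_results == pool_results)  # True
```

All three produce the same list in the same order; only the third spreads the calls across threads.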

Implementation

Parallel versions of the map function are provided by two libraries: multiprocessing, and its little-known but equally fantastic stepchild, multiprocessing.dummy.

multiprocessing.dummy offers the same API as the multiprocessing module, but uses threads instead (an important distinction: use multiple processes for CPU-intensive tasks, and threads for I/O-bound ones):

multiprocessing.dummy replicates the API of multiprocessing, but is no more than a wrapper around the threading module.
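
One way to check this for yourself is to have each task report the process ID and thread name it ran on. This is only a sketch; `where_am_i` is a made-up helper, not part of any library.

```python
import os
import threading
from multiprocessing.dummy import Pool as ThreadPool

def where_am_i(_):
    # Hypothetical helper: report the process ID and the thread this call ran on
    return os.getpid(), threading.current_thread().name

with ThreadPool(4) as pool:
    reports = pool.map(where_am_i, range(8))

pids = {pid for pid, _ in reports}
thread_names = {name for _, name in reports}

print(pids == {os.getpid()})   # True: every task ran inside this one process
print(sorted(thread_names))    # names of the pool's worker threads
```

Swapping the import for `from multiprocessing import Pool` would give worker processes instead, each with its own PID.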

import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
  'http://www.python.org',
  'http://www.python.org/about/',
  'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
  'http://www.python.org/doc/',
  'http://www.python.org/download/',
  'http://www.python.org/getit/',
  'http://www.python.org/community/',
  'https://wiki.python.org/moin/',
]

# Make the Pool of workers
pool = ThreadPool(4)

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()

And the timing results:

Single thread:   14.4 seconds
       4 Pool:   3.1 seconds
       8 Pool:   1.4 seconds
      13 Pool:   1.3 seconds
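
Those numbers depend on the network, so they will not reproduce exactly. A rough offline sketch of the same comparison, assuming each "request" is simulated with a 0.1 second sleep (`fake_fetch` and the example.com URLs are made up for illustration):

```python
import time
from multiprocessing.dummy import Pool as ThreadPool

def fake_fetch(url):
    # Hypothetical stand-in for urlopen: pretend each request waits 0.1 s
    time.sleep(0.1)
    return len(url)

urls = ["http://example.com/page%d" % i for i in range(8)]

def timed(label, func):
    # Run func once and report how long it took
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    print("%-14s %.2f seconds" % (label, elapsed))
    return result, elapsed

sequential, t_seq = timed("Single thread:", lambda: [fake_fetch(u) for u in urls])

with ThreadPool(4) as pool:
    pooled, t_pool = timed("4 Pool:", lambda: pool.map(fake_fetch, urls))
```

Because the threads spend their time waiting rather than computing, the pooled run should finish in roughly a quarter of the sequential time here.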

Passing multiple arguments (works like this only in Python 3.3 and later):

To pass multiple arrays:

results = pool.starmap(function, zip(list_a, list_b))

Or to pass a constant and an array:

results = pool.starmap(function, zip(itertools.repeat(constant), list_a))

If you are using an earlier version of Python, you can pass multiple arguments via this workaround.
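
A small runnable sketch of the two starmap calls (Python 3.3+), using a hypothetical two-argument function `add`. The last line shows functools.partial, which is one common alternative for fixing a constant argument; it is not necessarily the workaround linked above.

```python
import functools
import itertools
from multiprocessing.dummy import Pool as ThreadPool

def add(a, b):
    # Hypothetical two-argument worker
    return a + b

list_a = [1, 2, 3]
list_b = [10, 20, 30]
constant = 100

with ThreadPool(2) as pool:
    # Multiple arrays: each (a, b) pair is unpacked into add(a, b)
    pairwise = pool.starmap(add, zip(list_a, list_b))
    # A constant and an array
    with_constant = pool.starmap(add, zip(itertools.repeat(constant), list_a))
    # Alternative: bind the constant first, then use plain map
    via_partial = pool.map(functools.partial(add, constant), list_a)

print(pairwise)       # [11, 22, 33]
print(with_constant)  # [101, 102, 103]
print(via_partial)    # [101, 102, 103]
```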

(Thanks to user136036 for the helpful comment.)

Community
  • 1
  • 1
philshem
  • 22,161
  • 5
  • 54
  • 110
  • 92
    This is only lacking votes because it is so freshly posted. This answer works beautifully and demonstrates the 'map' functionality which gives a much easier to understand syntax than the other answers here. – idle Feb 21 '15 at 07:51
  • 1
    @jeffcrowe This simple parallelization has really improved while keeping the structure simple and clean. I was surprised such a useful method hadn't been posted in this thread. – philshem Mar 03 '15 at 08:21
  • 4
    In case you want to pass multiple arguments read this: http://stackoverflow.com/questions/5442910/python-multiprocessing-pool-map-for-multiple-arguments/28975239#28975239 – user136036 Mar 10 '15 at 23:05
  • Or the earlier reply directly above yours (@user136036) that says the same thing: http://stackoverflow.com/a/5443941/2379433 – Mike McKerns Apr 22 '15 at 19:22
  • 5
    Is this really using multiple cores at the same time? These https://docs.python.org/dev/library/multiprocessing.html and http://chriskiehl.com/article/parallelism-in-one-line/ seem to suggest multiprocessing.dummy doesn't actually use multiple cores at once, but interleaves using different cores. At least that's what it seems like to me at this point. – user3731622 Jul 27 '15 at 18:45
  • 25
    Is this even threads and not processes? It seems like it attempts to multiprocess != multithread – AturSams Jul 29 '15 at 11:02
  • 77
    By the way, guys, you can write `with Pool(8) as p: p.map( *whatever* )` and get rid of bookkeeping lines too. –  Sep 03 '15 at 07:06
  • 11
    @BarafuAlbino: Useful as that is, it's probably worth noting that this [only works in Python 3.3+](https://stackoverflow.com/questions/25968518/python-multiprocessing-lib-error-attributeerror-exit). – fuglede Oct 19 '15 at 10:38
  • 1
    Actually the downside of this method is that is not working inside a class based function – Stéphane Nov 09 '15 at 16:57
  • 1
    Good answer, but shouldn't it be `results.append(my_function(item))` instead of `results += my_function(item)`? I would also put `results = []` at the beginning to make it clear that results is a list; as it stands, it looks like results is a number. – Brian McCutchon Nov 16 '16 at 03:09
  • @BrianMcCutchon I added the `results = []` but wouldn't the `.append()` actually append lists, instead of extending? http://stackoverflow.com/a/252711/2327328 – philshem Nov 17 '16 at 09:59
  • @philshem No, `.append()` is for adding an individual element to a list; `+=` is for concatenating a list onto the end of another. Try it in the interpreter. `spam = []; spam += 5` results in an error, but `spam.append(5)` works fine. – Brian McCutchon Nov 19 '16 at 05:53
  • @BrianMcCutchon I had imagined that in this case the response of the function is an array. – philshem Nov 19 '16 at 11:58
  • @philshem But `map` does not have that supposition. Therefore, the two pieces of code are **not** equivalent. – Brian McCutchon Nov 22 '16 at 19:18
  • Don't forget to close then terminate pool, otherwise threads may be kept alive `pool.map(any_function, iterable)` `pool.close()` `pool.terminate()`. – Thibaud David May 19 '17 at 11:56
  • 3
    I dont get this point, `from multiprocessing.dummy import Pool as ThreadPool `, here `Pool` is imported as alias `ThreadPool`, but in `with`, how you are able to do `with Pool(4) as pool:`, this should be an error!!! – NoobEditor May 31 '17 at 10:50
  • @NoobEditor this was a recent edit by another user. Please see the edit history and edit back if you think it doesn't work. I think what they meant was an implicit way of closing the pool (similar to opening files) – philshem May 31 '17 at 13:58
  • @NoobEditor thanks for pointing this out. I made the edit myself. – philshem May 31 '17 at 14:03
  • Will the order of urls be preserved in results, or will results be ordered by the threads that returned first? – aberger Jun 22 '17 at 18:37
  • 10
    How can you leave this answer and not mention that this is only useful for I/O operations? This only runs on a single thread which is useless for most cases, and is actually slower than just doing it the normal way – Frobot Aug 14 '17 at 07:03
  • @Frobot - not only for I/O. `urls = [ numpy.random.random((5000000, 1)), numpy.random.random((5000000, 1)), numpy.random.random((5000000, 1)), numpy.random.random((5000000, 1)) ]; result = pool.map(sum, urls)` (needs to be wrapped in def_main), and remove .dummy – philshem Sep 29 '17 at 09:30
  • how do I call a function from inside the pool.map? – janjackson Nov 12 '17 at 15:46
  • Doesnt work unless i run it outside of `__main__`. Maybe it can work inside a function, did not test – Nyxynyx Nov 30 '17 at 02:03
  • what if `function` raises an exception, how manage a non-blocking behavior? – enneppi Feb 19 '18 at 11:33
  • 1
    Saved my day!! First thread experience in python and it feels great!! Thanks – rodrigorf Mar 29 '18 at 03:25
  • 1
    Wonderful answer and reference links ! Thank you so much ! – Jia May 15 '18 at 15:27
  • Great post! The link to [Parallelism in One Line](https://chriskiehl.com/article/parallelism-in-one-line) didn't work for me, and couldn't edit the post so adding the correct link here. – AndOs Jan 02 '19 at 20:20
  • @philshem Can you clarify what if the input is lines from a file? It is similar to your post, but instead of array, I read lines from a file. then, I want to write the results in a file. – user9371654 Mar 01 '19 at 09:48
  • @user9371654 - please post as a new question – philshem Mar 01 '19 at 10:04
  • @philshem can you please check my question here: [link](https://stackoverflow.com/questions/54942503/cant-read-write-to-files-using-multithreading-in-python/54943940?noredirect=1#comment96651673_54943940). The answer provided is too complex and I am unable to add on it or customise it. I need a clean and simple way like this but only read/write from files. – user9371654 Mar 01 '19 at 13:17
  • Any way of getting the id number into the function with this approach? – Usama Ilyas Jul 15 '20 at 16:41
731

Here's a simple example: you need to try a few alternative URLs and return the contents of the first one to respond.

import Queue
import threading
import urllib2

# Called by each thread
def get_url(q, url):
    q.put(urllib2.urlopen(url).read())

theurls = ["http://google.com", "http://yahoo.com"]

q = Queue.Queue()

for u in theurls:
    t = threading.Thread(target=get_url, args = (q,u))
    t.daemon = True
    t.start()

s = q.get()
print s

This is a case where threading is used as a simple optimization: each subthread is waiting for a URL to resolve and respond, to put its contents on the queue; each thread is a daemon (won't keep the process up if the main thread ends -- that's more common than not); the main thread starts all subthreads, does a get on the queue to wait until one of them has done a put, then emits the results and terminates (which takes down any subthreads that might still be running, since they're daemon threads).
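
For anyone who wants to try the same pattern on Python 3 without touching the network, here is a sketch in which the download is replaced by a sleep. The mirror names and delays are invented for illustration.

```python
import queue
import threading
import time

def fetch(q, name, delay):
    # Hypothetical stand-in for urlopen(url).read(): wait, then "respond"
    time.sleep(delay)
    q.put(name)

# (name, simulated response time in seconds): made-up values
mirrors = [("slow-mirror", 0.5), ("fast-mirror", 0.05)]

q = queue.Queue()
for name, delay in mirrors:
    t = threading.Thread(target=fetch, args=(q, name, delay))
    t.daemon = True   # do not keep the process alive for the slow one
    t.start()

first = q.get()       # blocks until the first thread has done a put
print(first)          # fast-mirror
```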

Proper use of threads in Python is invariably connected to I/O operations (since CPython doesn't use multiple cores to run CPU-bound tasks anyway, the only reason for threading is not blocking the process while there's a wait for some I/O). Queues are almost invariably the best way to farm out work to threads and/or collect the work's results, by the way, and they're intrinsically threadsafe, so they save you from worrying about locks, conditions, events, semaphores, and other inter-thread coordination/communication concepts.
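
As a sketch of that idea (Python 3), one queue can hand out work and a second can collect results, with no explicit locks anywhere. The `None` sentinel used to stop the workers is a common convention chosen for this example, not something the queue module requires.

```python
import queue
import threading

def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:           # sentinel: no more work for this worker
            break
        results.put((item, item * item))

tasks = queue.Queue()
results = queue.Queue()

threads = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(3)]
for t in threads:
    t.start()

for n in range(10):
    tasks.put(n)
for _ in threads:
    tasks.put(None)                # one sentinel per worker

for t in threads:
    t.join()

# All producers have finished, so draining until empty is safe here
squares = {}
while not results.empty():
    n, sq = results.get()
    squares[n] = sq

print(squares == {n: n * n for n in range(10)})  # True
```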

Alex Martelli
  • 762,786
  • 156
  • 1,160
  • 1,345
  • 10
    Thanks again, MartelliBot. I've updated the example to wait for all to urls to respond: import Queue, threading, urllib2 q = Queue.Queue() urls = '''http://www.a.com http://www.b.com http://www.c.com'''.split() urls_received = 0 def get_url(q, url): req = urllib2.Request(url) resp = urllib2.urlopen(req) q.put(resp.read()) global urls_received urls_received +=1 print urls_received for u in urls: t = threading.Thread(target=get_url, args = (q,u)) t.daemon = True t.start() while q.empty() and urls_received < len(urls): s = q.get() print s – htmldrum Jan 07 '13 at 02:23
  • 3
    @JRM: if you look at the next answer below, I think that a better way to wait until the threads are finished would be to use the `join()` method, since that would make the main thread wait until they're done without consuming processor by constantly checking the value. @Alex: thanks, this is exactly what I needed to understand how to use threads. – krs013 May 31 '13 at 05:33
  • 6
    For python3, replace 'import urllib2' with 'import urllib.request as urllib2'. and put parentheses in the print statement. – Harvey Sep 21 '13 at 01:50
  • Good link [here](http://www.ibm.com/developerworks/aix/library/au-threadingpython/) – Joel Vroom Oct 18 '13 at 15:41
  • 5
    For python 3 replace `Queue` module name with `queue`. Method name is the same. – JSmyth Jan 05 '14 at 21:13
  • 2
    I note that solution will only print out one of the pages. To print both pages from the queue simply run the command again: `s = q.get()` `print s` @krs013 You don't need the `join` because Queue.get() is blocking. – Tom Anderson Jan 07 '14 at 05:50
  • @TomAnderson `Queue.get()` is not blocking by default, you must provide an argument `Queue.get([block[, timeout]])`. So it should be `Queue.get(True)` in order to be blocking. [Source](http://docs.python.org/2/library/queue.html#Queue.Queue.get) – Tulio F. Mar 15 '14 at 18:40
  • 1
    @tofs: Wrong, because `block` is `True` by default, so Queue.get() ***is*** blocking. Source: the very one you cited ;) – MestreLion Jun 08 '14 at 17:37
  • @MestreLion Props to you ;) – Tulio F. Jun 18 '14 at 02:25
  • @philshem, to get the **first** to respond, the modern alternative is actually `concurrent.futures.as_completed` (stdlib in Py3.2 and better, pypi backport for earlier Python versions). – Alex Martelli Mar 18 '15 at 16:35
  • Jython (yes, a flavour of Python) gives you true multithreading with no GIL! No idea about times, benchmarks, etc. – mike rodent Aug 30 '15 at 14:03
  • @htmldrum , thanks for your code but the " urls_received +=1" is not thread-safe – Steven Du Dec 15 '15 at 01:50
  • using this example, if say theurls has a size of 100 million URLs, will it still work without any problem ? – Noor Apr 19 '17 at 09:34
  • @Alex Martelli can you please clarify to me how many threads will be running in your code example? – user9371654 Aug 02 '18 at 15:00
267

NOTE: For actual parallelization in Python, you should use the multiprocessing module to fork multiple processes that execute in parallel (due to the global interpreter lock, Python threads provide interleaving, but they are in fact executed serially, not in parallel, and are only useful when interleaving I/O operations).

However, if you are merely looking for interleaving (or are doing I/O operations that can be parallelized despite the global interpreter lock), then the threading module is the place to start. As a really simple example, let's consider the problem of summing a large range by summing subranges in parallel:

import threading

class SummingThread(threading.Thread):
     def __init__(self,low,high):
         super(SummingThread, self).__init__()
         self.low=low
         self.high=high
         self.total=0

     def run(self):
         for i in range(self.low,self.high):
             self.total+=i


thread1 = SummingThread(0,500000)
thread2 = SummingThread(500000,1000000)
thread1.start() # This actually causes the thread to run
thread2.start()
thread1.join()  # This waits until the thread has completed
thread2.join()
# At this point, both threads have completed
result = thread1.total + thread2.total
print result

Note that the above is a very stupid example, as it does absolutely no I/O and will be executed serially albeit interleaved (with the added overhead of context switching) in CPython due to the global interpreter lock.
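
The same start/join pattern can also be written without subclassing Thread, by passing a plain function as the target. This is a Python 3 sketch; `sum_range` and the `totals` list are names invented here, and the GIL caveat applies to it just the same.

```python
import threading

def sum_range(low, high, totals, index):
    # Each thread writes only to its own slot, so no lock is needed
    totals[index] = sum(range(low, high))

bounds = [(0, 500000), (500000, 1000000)]
totals = [0] * len(bounds)

threads = [
    threading.Thread(target=sum_range, args=(low, high, totals, i))
    for i, (low, high) in enumerate(bounds)
]
for t in threads:
    t.start()   # This actually causes the thread to run
for t in threads:
    t.join()    # This waits until the thread has completed

result = sum(totals)
print(result)   # 499999500000
```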

Michael Aaron Safyan
  • 87,518
  • 14
  • 130
  • 194
  • 16
    @Alex, I didn't say it was practical, but it does demonstrate how to define and spawn threads, which I think is what the OP wants. – Michael Aaron Safyan May 17 '10 at 04:39
  • 7
    While this does show how to define and spawn threads, it actually does not sum the subranges in parallel. `thread1` runs until it's completed while the main thread blocks, then the same thing happens with `thread2`, then the main thread resumes and prints out the values they accumulated. – martineau Feb 17 '14 at 19:32
  • Shouldn't that be `super(SummingThread, self).__init__()`? As in http://stackoverflow.com/a/2197625/806988 – James Andres Mar 06 '14 at 09:37
  • @JamesAndres, assuming that no one inherits from "SummingThread", then either one works fine; in such a case super(SummingThread, self) is just a fancy way to look up the next class in the method resolution order (MRO), which is threading.Thread (and then subsequently calling __init__ on that in both cases). You are right, though, in that using super() is better style for current Python. Super was relatively recent at the time that I provided this answer, hence calling directly to the super class rather than using super(). I'll update this to use super, though. – Michael Aaron Safyan Mar 06 '14 at 11:16
  • 15
    WARNING: Don't use multithreading in tasks like this! As was shown by Dave Beazley: http://www.dabeaz.com/python/NewGIL.pdf, 2 python threads on 2 CPUs carry out a CPU-heavy task 2 times SLOWER than 1 thread on 1 CPU and 1.5 times SLOWER than 2 threads on 1 CPU. This bizarre behavior is due to mis-coordination of efforts between OS and Python. A real-life use case for threads is an I/O heavy task. E.g. when you perform read/writes over network, it makes sense to put a thread, waiting for data to be read/written, to background and switch CPU to another thread, which needs to process data. – Boris Burkov May 15 '14 at 23:13
  • Thanks, everyone, for pointing out the inherent issues with the global interpreter lock. I've updated this post to reflect that. – Michael Aaron Safyan May 29 '14 at 08:04
  • How about super(self.__class__, self).__init__() instead of super(SummingThread, self).__init__()? It allows the code to be reused in another thread class without the risk of referring to the wrong class. – JohnMudd Sep 21 '14 at 02:10
  • @Bob furthermore, this task is RAM memory bound, throwing more non-NUMA CPUs at it does nothing: RAM IO is the bottleneck. – Ciro Santilli新疆棉花TRUMP BAN BAD Dec 14 '15 at 19:35
  • As of 2019, I think it's much more readable to have `thread1 = threading.Thread(target=function_name, args=(arg1, arg2, ...))` and the work function defined beforehand. It makes threads one-liner definitions instead of having to create an entire class. – Guimoute Nov 15 '19 at 09:29
105

As others have mentioned, the GIL means that threads in CPython are mainly useful for overlapping I/O waits.

If you want to benefit from multiple cores for CPU-bound tasks, use multiprocessing:

from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
Kai
  • 1,318
  • 1
  • 12
  • 13
  • 33
    could you explain a little what this does? – pandita Sep 14 '13 at 16:01
  • 5
    @pandita: the code creates a process, then starts it. So now there's two things happening at once: the main line of the program, and the process that's starting with the target, `f` function. In parallel, the main program now just waits for the process to exit, `join`ing up with it. If the main part just exited, the subprocess might or might not run to completion, so doing a `join` is always recommended. – johntellsall Jul 02 '14 at 05:56
  • 1
    An expanded answer that includes the `map` function is here: http://stackoverflow.com/a/28463266/2327328 – philshem Mar 09 '15 at 08:15
  • 2
    @philshem Be careful b/c the link you posted is using a pool of threads (not processes) as mentioned here http://stackoverflow.com/questions/26432411/multiprocessing-dummy-in-python. However, this answer is using a process. I'm new to this stuff, but seems like (due to GIL) you will only get performance gains in specific situations when using multithreading in Python. However, using a pool of processes can take advantage of a multicore processor by have more than 1 core work on a process. – user3731622 Jul 27 '15 at 23:01
  • 3
    This is the best answer for actually doing something useful and taking advantage of multiple CPU cores – Frobot Aug 14 '17 at 07:19
  • It isn't necessarily so. There are many libraries written in the C language, such as `numpy`, that release the GIL and could do intensive numerical operations in a thread without blocking other threads from running. – Booboo Oct 29 '20 at 12:20
93

Just a note: A queue is not required for threading.

This is the simplest example I could imagine that shows 10 threads running concurrently.

import threading
from random import randint
from time import sleep


def print_number(number):

    # Sleeps a random 1 to 10 seconds
    rand_int_var = randint(1, 10)
    sleep(rand_int_var)
    print "Thread " + str(number) + " slept for " + str(rand_int_var) + " seconds"

thread_list = []

for i in range(1, 11):  # Threads 1 through 10

    # Instantiates the thread
    # (i) does not make a sequence, so (i,)
    t = threading.Thread(target=print_number, args=(i,))
    # Sticks the thread in a list so that it remains accessible
    thread_list.append(t)

# Starts threads
for thread in thread_list:
    thread.start()

# This blocks the calling thread until the thread whose join() method is called is terminated.
# From http://docs.python.org/2/library/threading.html#thread-objects
for thread in thread_list:
    thread.join()

# Demonstrates that the main process waited for threads to complete
print "Done"
Douglas Adams
  • 1,460
  • 11
  • 7
  • 3
    Add the last quote to "Done to make it print "Done" – iChux Feb 11 '14 at 09:53
  • 1
    I like this example better than Martelli's, it's easier to play with. However, I would recommend that printNumber do the following, to make it a little bit clearer what's going on: it should save the randint to a variable before sleeping on it, and then the print should be changed to say "Thread" + str(number) + " slept for " + theRandintVariable + " seconds" – Nickolai Dec 17 '14 at 15:38
  • 1
    Is there a way to know when each thread has finished, as it finishes? – Matt Jan 29 '16 at 23:11
  • 1
    @Matt There are a few ways to do something like that, but it would depend on your needs. One way would be to update a singleton or some other publicly accessible variable that's being watched in a while loop and updated at the end of the thread. – Douglas Adams Feb 03 '16 at 00:06
  • 2
    No need for second `for` loop, you can call `thread.start()` in first loop. – Mark Mishyn Mar 23 '19 at 11:33
49

The answer from Alex Martelli helped me. However, here is a modified version that I thought was more useful (at least to me).

Updated: works in both Python 2 and Python 3

try:
    # For Python 3
    import queue
    from urllib.request import urlopen
except ImportError:
    # For Python 2
    import Queue as queue
    from urllib2 import urlopen

import threading

worker_data = ['http://google.com', 'http://yahoo.com', 'http://bing.com']

# Load up a queue with your data. This will handle locking
q = queue.Queue()
for url in worker_data:
    q.put(url)

# Define a worker function
def worker(url_queue):
    queue_full = True
    while queue_full:
        try:
            # Get your data off the queue, and do some work
            url = url_queue.get(False)
            data = urlopen(url).read()
            print(len(data))

        except queue.Empty:
            queue_full = False

# Create as many threads as you want
thread_count = 5
for i in range(thread_count):
    t = threading.Thread(target=worker, args = (q,))
    t.start()
JimJty
  • 1,079
  • 11
  • 13
  • 6
    Why not just break on the exception? – Stavros Korokithakis Feb 09 '14 at 19:16
  • 1
    you could, just personal preference – JimJty Feb 10 '14 at 21:10
  • 1
    I haven't run the code, but don't you need to daemonize the threads? I think that after that last for-loop, your program might exit - at least it should because that's how threads should work. I think a better approach is not put the worker data in the queue, but put the output into a queue because then you could have a mainloop that not only **handles** information coming into the queue from the workers, but now it is also not threading, and you *know* it won't exit prematurely. – dylnmc Oct 12 '16 at 16:11
  • 1
    @dylnmc, that's outside my use case (my input queue is predefined). If you want to go your route, I would suggest looking at [celery](http://www.celeryproject.org/) – JimJty Oct 18 '16 at 17:45
  • @JimJty do you know why I'm getting this error: `import Queue ModuleNotFoundError: No module named 'Queue'` I am running python 3.6.5 some posts mention that in python 3.6.5 it is `queue` but even after I change it, still does not work – user9371654 Mar 01 '19 at 13:19
  • @user9371654 updated code to work in both python2 and python3 – JimJty Mar 03 '19 at 23:06
25

I found this very useful: create as many threads as cores and let them execute a (large) number of tasks (in this case, calling a shell program):

import Queue
import threading
import multiprocessing
import subprocess

q = Queue.Queue()
for i in range(30): # Put 30 tasks in the queue
    q.put(i)

def worker():
    while True:
        item = q.get()
        # Execute a task: call a shell program and wait until it completes
        subprocess.call("echo " + str(item), shell=True)
        q.task_done()

cpus = multiprocessing.cpu_count() # Detect number of cores
print("Creating %d threads" % cpus)
for i in range(cpus):
     t = threading.Thread(target=worker)
     t.daemon = True
     t.start()

q.join() # Block until all tasks are done
dolphin
  • 978
  • 9
  • 24
  • @shavenwarthog sure one can adjust the "cpus" variable depending on one's needs. Anyway, the subprocess call will spawn subprocesses and these will be allocated cpus by the OS (python's "parent process" does not mean "same CPU" for the subprocesses). – dolphin Jul 02 '14 at 02:28
  • 2
    you're correct, my comment about "threads are started on the same CPU as the parent process" is wrong. Thanks for the reply! – johntellsall Jul 02 '14 at 05:51
  • 1
    maybe worth noting that unlike multithreading which uses the same memory space, multiprocessing can not share variables / data as easily. +1 though. – fantabolous Jul 23 '14 at 09:07
25

Given a function, f, thread it like this:

import threading
threading.Thread(target=f).start()

To pass arguments to f:

threading.Thread(target=f, args=(a,b,c)).start()
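
If you need to wait for the thread or check on it later, keep a reference to the Thread object before starting it, because start() itself returns None. A minimal sketch, with a made-up `f` that sleeps briefly so there is something to observe:

```python
import threading
import time

def f(a, b, c):
    # Hypothetical work: pause briefly, then print the sum
    time.sleep(0.1)
    print(a + b + c)

# Keep the Thread object; chaining .start() onto the constructor would give None
thread1 = threading.Thread(target=f, args=(1, 2, 3))
thread1.start()

alive_while_running = thread1.is_alive()  # True: f is still sleeping
thread1.join()                            # wait for f to return
alive_after_join = thread1.is_alive()     # False: the thread has finished

print(alive_while_running, alive_after_join)
```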
starfry
  • 7,737
  • 5
  • 57
  • 81
  • This is very straightforward. How do you ensure that the threads close when you are done with them? – cameronroytaylor May 05 '17 at 20:57
  • As far as I understand it, when the function exits the `Thread` object cleans up. See [the docs](https://docs.python.org/2/library/threading.html#thread-objects). There is an `is_alive()` method you can use to check a thread if you need to. – starfry May 06 '17 at 18:50
  • I saw the `is_alive` method, but I couldn't figure out how to apply it to the thread. I tried assigning `thread1=threading.Thread(target=f).start()` and then checking it with `thread1.is_alive()`, but `thread1` is populated with `None`, so no luck there. Do you know if there is any other way to access the thread? – cameronroytaylor May 06 '17 at 19:39
  • 5
    You need to assign the thread object to a variable and then start it using that variable: `thread1=threading.Thread(target=f)` followed by `thread1.start()`. Then you can do `thread1.is_alive()`.
  • 2
    That worked. And yes, testing with `thread1.is_alive()` returns `False` as soon as the function exits. – cameronroytaylor May 08 '17 at 13:45
22

Python 3 includes the concurrent.futures module for launching parallel tasks, which makes this work easier.

It provides both thread pools and process pools.

The following gives an insight:

ThreadPoolExecutor Example (source)

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
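
The example above needs network access. For an offline look at the difference between executor.map (results in input order) and as_completed (results in finishing order), here is a sketch in which each "page load" is just a sleep. The job names and delays are invented.

```python
import concurrent.futures
import time

def load(job):
    # Hypothetical stand-in for load_url: wait, then return the job's name
    name, delay = job
    time.sleep(delay)
    return name

# (name, simulated load time in seconds): made-up values
jobs = [("a", 0.4), ("b", 0.05), ("c", 0.2)]

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # map yields results in the order the inputs were given
    in_input_order = list(executor.map(load, jobs))

    # as_completed yields futures as each one finishes
    futures = [executor.submit(load, job) for job in jobs]
    in_completion_order = [
        future.result() for future in concurrent.futures.as_completed(futures)
    ]

print(in_input_order)       # ['a', 'b', 'c']
print(in_completion_order)  # ['b', 'c', 'a']
```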

ProcessPoolExecutor (source)

import concurrent.futures
import math

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

if __name__ == '__main__':
    main()
Jeril
  • 5,663
  • 3
  • 39
  • 62
20

I saw a lot of examples here where no real work was being performed, and they were mostly CPU-bound. Here is an example of a CPU-bound task that computes all prime numbers between 10 million and 10.05 million. I have used all four methods here:

import math
import timeit
import threading
import multiprocessing
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def time_stuff(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{} seconds".format(t1 - t0))
    return wrapper

def find_primes_in(nmin, nmax):
    """
    Compute a list of prime numbers between the given minimum and maximum arguments
    """
    primes = []

    # Loop from minimum to maximum
    for current in range(nmin, nmax + 1):

        # Take the square root of the current number
        sqrt_n = int(math.sqrt(current))
        found = False

        # Check whether any number from 2 up to and including the square root divides the current number under consideration
        for number in range(2, sqrt_n + 1):

            # If divisible, we have found a factor, hence this is not a prime number; let's move to the next one
            if current % number == 0:
                found = True
                break

        # If not divisible, add this number to the list of primes that we have found so far
        if not found:
            primes.append(current)

    # I am merely printing the length of the array containing all the primes, but feel free to do what you want
    print(len(primes))

@time_stuff
def sequential_prime_finder(nmin, nmax):
    """
    Use the main process and main thread to compute everything in this case
    """
    find_primes_in(nmin, nmax)

@time_stuff
def threading_prime_finder(nmin, nmax):
    """
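    If the minimum is 1000 and the maximum is 2000 and we have four workers,
    1000 - 1250 to worker 1
    1250 - 1500 to worker 2
    1500 - 1750 to worker 3
    1750 - 2000 to worker 4
    so let’s split the minimum and maximum values according to the number of workers
    (the code below uses eight workers). Note that find_primes_in includes both
    endpoints, so each boundary value is checked by two neighbouring workers.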
    """
    nrange = nmax - nmin
    threads = []
    for i in range(8):
        start = int(nmin + i * nrange/8)
        end = int(nmin + (i + 1) * nrange/8)

        # Start the thread with the minimum and maximum split up to compute
        # Parallel computation will not work here due to the GIL since this is a CPU-bound task
        t = threading.Thread(target = find_primes_in, args = (start, end))
        threads.append(t)
        t.start()

    # Don’t forget to wait for the threads to finish
    for t in threads:
        t.join()

@time_stuff
def processing_prime_finder(nmin, nmax):
    """
    Split the minimum, maximum interval similar to the threading method above, but use processes this time
    """
    nrange = nmax - nmin
    processes = []
    for i in range(8):
        start = int(nmin + i * nrange/8)
        end = int(nmin + (i + 1) * nrange/8)
        p = multiprocessing.Process(target = find_primes_in, args = (start, end))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

@time_stuff
def thread_executor_prime_finder(nmin, nmax):
    """
    Split the min max interval similar to the threading method, but use a thread pool executor this time.
    A pool reuses its worker threads instead of creating one per task, but in the timings below it was not faster than plain threading.
    This method is still slow because the GIL keeps CPU-bound threads from running in parallel.
    """
    nrange = nmax - nmin
    with ThreadPoolExecutor(max_workers = 8) as e:
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
            e.submit(find_primes_in, start, end)

@time_stuff
def process_executor_prime_finder(nmin, nmax):
    """
    Split the min max interval similar to the threading method, but use the process pool executor.
    This was the fastest method in the timings below: the pool manages its processes efficiently, and separate processes are not limited by the GIL.
    RECOMMENDED METHOD FOR CPU-BOUND TASKS
    """
    nrange = nmax - nmin
    with ProcessPoolExecutor(max_workers = 8) as e:
        for i in range(8):
            start = int(nmin + i * nrange/8)
            end = int(nmin + (i + 1) * nrange/8)
            e.submit(find_primes_in, start, end)

def main():
    nmin = int(1e7)
    nmax = int(1.05e7)
    print("Sequential Prime Finder Starting")
    sequential_prime_finder(nmin, nmax)
    print("Threading Prime Finder Starting")
    threading_prime_finder(nmin, nmax)
    print("Processing Prime Finder Starting")
    processing_prime_finder(nmin, nmax)
    print("Thread Executor Prime Finder Starting")
    thread_executor_prime_finder(nmin, nmax)
    print("Process Executor Finder Starting")
    process_executor_prime_finder(nmin, nmax)

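if __name__ == '__main__':
    # The guard is needed where child processes are started with "spawn" (e.g. on Windows)
    main()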

Here are the results on my four-core Mac OS X machine:

Sequential Prime Finder Starting
9.708213827005238 seconds
Threading Prime Finder Starting
9.81836523200036 seconds
Processing Prime Finder Starting
3.2467174359990167 seconds
Thread Executor Prime Finder Starting
10.228896902000997 seconds
Process Executor Finder Starting
2.656402041000547 seconds
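
The workers above only print their own counts. As a minimal sketch (not part of the benchmark above), the same splitting idea can return values from the workers and add them up. The names count_primes, split_range, and count_primes_parallel are hypothetical helpers introduced here, and the chunks are half-open so that no boundary value is counted twice:

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(nmin, nmax):
    """Count the primes in the half-open range [nmin, nmax)."""
    count = 0
    for current in range(max(nmin, 2), nmax):
        # all() is True when no divisor up to the square root divides current
        if all(current % d for d in range(2, math.isqrt(current) + 1)):
            count += 1
    return count


def split_range(nmin, nmax, parts):
    """Split [nmin, nmax) into contiguous, non-overlapping chunks."""
    edges = [nmin + i * (nmax - nmin) // parts for i in range(parts)] + [nmax]
    return list(zip(edges[:-1], edges[1:]))


def count_primes_parallel(nmin, nmax, workers=4, executor_cls=ProcessPoolExecutor):
    """Count primes chunk by chunk in a pool and sum the partial counts."""
    starts, ends = zip(*split_range(nmin, nmax, workers))
    with executor_cls(max_workers=workers) as executor:
        # map() returns the results in the same order as the inputs
        return sum(executor.map(count_primes, starts, ends))


if __name__ == '__main__':
    print(count_primes_parallel(1_000_000, 1_010_000))
```

Passing executor_cls=ThreadPoolExecutor runs the same code on threads, which makes it easy to compare the two pools on your own machine.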
Peter Mortensen
  • 28,342
  • 21
  • 95
  • 123
PirateApp
  • 4,149
  • 2
  • 34
  • 51
  • 1
    @TheUnfunCat No, the process executor is far better than threading for CPU-bound tasks – PirateApp May 02 '18 at 03:52
  • 1
    Great answer dude. I can confirm that in Python 3.6 on Windows (at least) ThreadPoolExecutor does nothing good for CPU-heavy tasks. It's not utilizing cores for computation. Whereas ProcessPoolExecutor copies data into EVERY process it spawns, it's deadly for large matrices. – Anatoly Alekseev Jul 22 '18 at 07:33
  • 1
    Very useful example, but I don't understand how it ever worked. We need a `if __name__ == '__main__':` before the main call, otherwise the measurement spawns itself and prints [An attempt has been made to start a new process before...](https://stackoverflow.com/questions/55057957/an-attempt-has-been-made-to-start-a-new-process-before-the-current-process-has-f). – Stein Jul 15 '19 at 18:37
  • 1
    @Stein I believe that is only an issue on Windows, though. – AMC Jan 25 '20 at 18:47
19

Using the blazing new concurrent.futures module

def sqr(val):
    import time
    time.sleep(0.1)
    return val * val

def process_result(result):
    print(result)

def process_these_asap(tasks):
    import concurrent.futures

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = []
        for task in tasks:
            futures.append(executor.submit(sqr, task))

        for future in concurrent.futures.as_completed(futures):
            process_result(future.result())
        # Or instead of all this just do:
        # results = executor.map(sqr, tasks)
        # list(map(process_result, results))

def main():
    tasks = list(range(10))
    print('Processing {} tasks'.format(len(tasks)))
    process_these_asap(tasks)
    print('Done')
    return 0

if __name__ == '__main__':
    import sys
    sys.exit(main())

The executor approach might seem familiar to all those who have gotten their hands dirty with Java before.

Also, on a side note: to keep the universe sane, don't forget to shut down your pools/executors if you don't use a with block (which is so awesome that it does it for you).
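
For the case without a with block, a minimal sketch could look like the following. The function name squares_without_with is made up for this illustration, and a thread pool is used only to keep the example small:

```python
from concurrent.futures import ThreadPoolExecutor


def squares_without_with(values):
    """Square each value in a pool that is shut down explicitly."""
    executor = ThreadPoolExecutor(max_workers=4)
    try:
        return list(executor.map(lambda v: v * v, values))
    finally:
        # Roughly what leaving a with block does: wait for pending work, free the workers
        executor.shutdown(wait=True)
```

The try/finally makes sure the pool is shut down even if one of the tasks raises.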

Shubham Chaudhary
  • 36,933
  • 9
  • 67
  • 78
18

Most documentation and tutorials use Python's threading and queue modules, which can seem overwhelming for beginners.

Perhaps consider the concurrent.futures.ThreadPoolExecutor class of Python 3.

Combined with a with statement and a comprehension, it can be a real charm.

from concurrent.futures import ThreadPoolExecutor, as_completed

def get_url(url):
    # Your actual program here. Using threading.Lock() if necessary
    return ""

# List of URLs to fetch
urls = ["url1", "url2"]

with ThreadPoolExecutor(max_workers = 5) as executor:

    # Submit one task per URL; each submit() call returns a Future
    futures = {executor.submit(get_url, url) for url in urls}

    # as_completed() yields each future as soon as it has finished
    for f in as_completed(futures):
        # Get the results
        rs = f.result()
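
Because as_completed() yields futures in completion order, it is often handy to remember which URL each future belongs to. Here is a small sketch of that pattern; fake_fetch is a hypothetical stand-in for real network code, and fetch_all is a name made up for this example:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_fetch(url):
    # Hypothetical placeholder: a real version would download the URL
    return len(url)


def fetch_all(urls, max_workers=5):
    """Return a dict that maps each URL to its result (or to the exception raised)."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map every future back to the URL it was created for
        future_to_url = {executor.submit(fake_fetch, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # result() re-raises whatever the task raised
                results[url] = exc
    return results
```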
Peter Mortensen
  • 28,342
  • 21
  • 95
  • 123
Yibo
  • 233
  • 4
  • 7
17

For me, the perfect example for threading is monitoring asynchronous events. Look at this code.

# thread_test.py
import threading
import time

class Monitor(threading.Thread):
    def __init__(self, mon):
        threading.Thread.__init__(self)
        self.mon = mon

    def run(self):
        while True:
            if self.mon[0] == 2:
                print("Mon = 2")
                self.mon[0] = 3

You can play with this code by opening an IPython session and doing something like:

>>> from thread_test import Monitor
>>> a = [0]
>>> mon = Monitor(a)
>>> mon.start()
>>> a[0] = 2
Mon = 2
>>> a[0] = 2
Mon = 2

Wait a few minutes

>>> a[0] = 2
Mon = 2
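
The loop above polls the list as fast as it can. As a hedged alternative sketch, a threading.Event lets the monitor sleep until it is signalled. The class name EventMonitor and its attributes are invented for this illustration:

```python
import threading


class EventMonitor(threading.Thread):
    """Wait for an event to be set instead of polling a shared list."""

    def __init__(self, event, stop_requested):
        super().__init__(daemon=True)
        self.event = event
        self.stop_requested = stop_requested
        self.handled = threading.Event()  # set after each handled signal
        self.seen = 0

    def run(self):
        while not self.stop_requested.is_set():
            # wait() blocks without spinning; the timeout lets us re-check the stop flag
            if self.event.wait(timeout=0.1):
                self.event.clear()
                self.seen += 1
                print("Event received")
                self.handled.set()
```

To try it, create two threading.Event objects, start the monitor, and call set() on the first event where the session above assigns a[0] = 2; call set() on the second one to stop the thread.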
Peter Mortensen
  • 28,342
  • 21
  • 95
  • 123
dvreed77
  • 1,890
  • 1
  • 22
  • 36
  • 1
    AttributeError: 'Monitor' object has no attribute 'stop' ? – pandita Sep 14 '13 at 15:29
  • 5
    Aren't you blasting away CPU cycles while waiting for your event to happen? Not always a very practical thing to do. – mogul Sep 16 '13 at 16:58
  • 3
    Like mogul says, this will be constantly executing. At a minimum you could add in a short sleep, say sleep(0.1), which would probably significantly reduce cpu usage on a simple example like this. – fantabolous Jul 23 '14 at 09:10
  • 3
    This is a horrible example, wasting one core. Add a sleep at the very least but the proper solution is to use some signaling-mechanism. – PureW Dec 11 '15 at 10:51
  • I've read about the GIL lately, and I wonder how it is possible to input a[0] = 2 while the started thread, which is a CPU-bound Python task, is running. Doesn't the GIL prevent you from running any other Python code once it is acquired by the Monitor thread? Or does Python constantly switch between threads, so that the GIL only prevents threads from executing at the same time, while they can still execute concurrently (but not in parallel)? – iRestMyCaseYourHonor Dec 25 '20 at 16:44
  • I have found some answers to my former comment at the following links, which I suggest to people curious about the switches the interpreter makes and how the GIL works: https://pymotw.com/2/sys/threads.html and https://web.archive.org/web/20130320214138/http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm – iRestMyCaseYourHonor Dec 25 '20 at 19:51
13

Here is a very simple example of CSV import using threading. (The imports you need may differ depending on your project.)

Helper Functions:

from threading import Thread
from project import app
import csv


def import_handler(csv_file_name):
    thr = Thread(target=dump_async_csv_data, args=[csv_file_name])
    thr.start()

def dump_async_csv_data(csv_file_name):
    with app.app_context():
        with open(csv_file_name) as csv_file:
            reader = csv.DictReader(csv_file)
            for row in reader:
                # DB operation/query goes here
                pass

Driver Function:

import_handler(csv_file_name)
Peter Mortensen
  • 28,342
  • 21
  • 95
  • 123
Chirag Vora
  • 234
  • 3
  • 10
12

I would like to contribute with a simple example and the explanations I've found useful when I had to tackle this problem myself.

In this answer you will find some information about Python's GIL (global interpreter lock) and a simple day-to-day example written using multiprocessing.dummy plus some simple benchmarks.

Global Interpreter Lock (GIL)

Python doesn't allow multi-threading in the truest sense of the word. It has a multi-threading package, but if you want to multi-thread to speed your code up, then it's usually not a good idea to use it.

Python has a construct called the global interpreter lock (GIL). The GIL makes sure that only one of your 'threads' can execute at any one time. A thread acquires the GIL, does a little work, then passes the GIL onto the next thread.

This happens very quickly, so to the human eye it may seem like your threads are executing in parallel, but they are really just taking turns executing Python code.

All this GIL passing adds overhead to execution. This means that if you want to make your code run faster then using the threading package often isn't a good idea.

There are reasons to use Python's threading package. If you want to run some things simultaneously, and efficiency is not a concern, then it's totally fine and convenient. Or if you are running code that needs to wait for something (like some I/O) then it could make a lot of sense. But the threading library won't let you use extra CPU cores.

Parallelism can be outsourced to the operating system (by doing multi-processing), to some external application that calls your Python code (for example, Spark or Hadoop), or to some code that your Python code calls (for example, you could have your Python code call a C function that does the expensive multi-threaded stuff).
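
To make the "waiting for something" case concrete, here is a small sketch that uses time.sleep as a stand-in for blocking I/O (sleeping releases the GIL, much like waiting on a socket or a file). The function names are made up for this example:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def io_like_task(seconds):
    time.sleep(seconds)  # stand-in for a blocking I/O call
    return seconds


def run_sequential(delays):
    """Run the tasks one after another and return the elapsed time."""
    start = time.perf_counter()
    for delay in delays:
        io_like_task(delay)
    return time.perf_counter() - start


def run_threaded(delays):
    """Run the tasks on one thread each and return the elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        list(executor.map(io_like_task, delays))
    return time.perf_counter() - start
```

With four tasks of 0.2 seconds each, the sequential run takes about the sum of the delays, while the threaded run should take roughly as long as a single delay, because the threads spend their time waiting rather than competing for the GIL.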

Why This Matters

Because lots of people spend a lot of time trying to find bottlenecks in their fancy Python multi-threaded code before they learn what the GIL is.

Once this information is clear, here's my code:

#!/bin/python
from multiprocessing.dummy import Pool
from subprocess import PIPE,Popen
import time
import os

# In the variable pool_size we define the "parallelness".
# Note that multiprocessing.dummy.Pool is a pool of threads, not processes.
# For CPU-bound tasks, it doesn't make sense to create more workers
# than you have cores to run them on.
#
# On the other hand, if you are running I/O-bound tasks, it may make sense
# to create quite a few more workers than cores, since they
# will probably spend most of their time blocked (waiting for I/O to complete).
pool_size = 8

def do_ping(ip):
    if os.name == 'nt':
        print ("Using Windows Ping to " + ip)
        proc = Popen(['ping', ip], stdout=PIPE)
        return proc.communicate()[0]
    else:
        print ("Using Linux / Unix Ping to " + ip)
        proc = Popen(['ping', ip, '-c', '4'], stdout=PIPE)
        return proc.communicate()[0]


os.system('cls' if os.name=='nt' else 'clear')
print ("Running using threads\n")
start_time = time.time()
pool = Pool(pool_size)
website_names = ["www.google.com","www.facebook.com","www.pinterest.com","www.microsoft.com"]
result = {}
for website_name in website_names:
    result[website_name] = pool.apply_async(do_ping, args=(website_name,))
pool.close()
pool.join()
print ("\n--- Execution took {} seconds ---".format((time.time() - start_time)))

# Now we do the same without threading, just to compare time
print ("\nRunning NOT using threads\n")
start_time = time.time()
for website_name in website_names:
    do_ping(website_name)
print ("\n--- Execution took {} seconds ---".format((time.time() - start_time)))

# Here's one way to print the final output from the threads
output = {}
for key, value in result.items():
    output[key] = value.get()
print ("\nOutput aggregated in a Dictionary:")
print (output)
print ("\n")

print ("\nPretty printed output: ")
for key, value in output.items():
    print (key + "\n")
    print (value)
Peter Mortensen
  • 28,342
  • 21
  • 95
  • 123
Pitto
  • 6,355
  • 1
  • 29
  • 40
10

Borrowing from this post, we know how to choose between multithreading, multiprocessing, and async/asyncio, and how each is used.

Python 3 has a built-in library for concurrency and parallelism: concurrent.futures

So I'll demonstrate with an experiment that runs four tasks (calls to sleep()) on a thread pool:

from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep, time

def concurrent(max_worker):
    futures = []
    tic = time()
    with ThreadPoolExecutor(max_workers=max_worker) as executor:
        futures.append(executor.submit(sleep, 2))  # Two seconds sleep
        futures.append(executor.submit(sleep, 1))
        futures.append(executor.submit(sleep, 7))
        futures.append(executor.submit(sleep, 3))
        for future in as_completed(futures):
            if future.result() is not None:
                print(future.result())
    print(f'Total elapsed time by {max_worker} workers:', time()-tic)

concurrent(5)
concurrent(4)
concurrent(3)
concurrent(2)
concurrent(1)

Output:

Total elapsed time by 5 workers: 7.007831811904907
Total elapsed time by 4 workers: 7.007944107055664
Total elapsed time by 3 workers: 7.003149509429932
Total elapsed time by 2 workers: 8.004627466201782
Total elapsed time by 1 workers: 13.013478994369507

[NOTE]:

  • As you can see in the above results, three workers were already enough for those four tasks: the total is bounded by the longest task (7 seconds), so adding more workers did not help.
  • If you have a CPU-bound task instead of an I/O-bound or blocking one (multiprocessing instead of threading), you can change the ThreadPoolExecutor to ProcessPoolExecutor.
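
The timings can be reproduced on paper. Here is a rough model, assuming each task is handed, in submission order, to whichever worker becomes free first (a simplification of what the pool actually does); expected_elapsed is a hypothetical helper written for this note:

```python
import heapq


def expected_elapsed(durations, workers):
    """Estimate the total time when tasks go to the first free worker."""
    free_at = [0] * workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    finish = 0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


for workers in (5, 4, 3, 2, 1):
    print(workers, expected_elapsed([2, 1, 7, 3], workers))
# prints 7, 7, 7, 8 and 13 seconds for 5, 4, 3, 2 and 1 workers
```

With two workers, the 7-second task can only start after the 1-second task has finished, which gives 8 seconds; with one worker the durations simply add up to 13.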
Benyamin Jafari
  • 15,536
  • 14
  • 81
  • 116
7

Here is a simple multithreading example that should be helpful. You can run it to see how multithreading works in Python. I used a semaphore to make some threads wait until earlier threads have finished their work. With this line of code,

tLock = threading.BoundedSemaphore(value=4)

you can allow a fixed number of threads to run at a time and hold back the rest, which run later, once earlier threads have finished.

import threading
import time

#tLock = threading.Lock()
tLock = threading.BoundedSemaphore(value=4)
def timer(name, delay, repeat):
    print("\r\nTimer: ", name, " Started")
    tLock.acquire()
    print("\r\n", name, " has acquired the lock")
    while repeat > 0:
        time.sleep(delay)
        print("\r\n", name, ": ", str(time.ctime(time.time())))
        repeat -= 1

    print("\r\n", name, " is releasing the lock")
    tLock.release()
    print("\r\nTimer: ", name, " Completed")

def Main():
    t1 = threading.Thread(target=timer, args=("Timer1", 2, 5))
    t2 = threading.Thread(target=timer, args=("Timer2", 3, 5))
    t3 = threading.Thread(target=timer, args=("Timer3", 4, 5))
    t4 = threading.Thread(target=timer, args=("Timer4", 5, 5))
    t5 = threading.Thread(target=timer, args=("Timer5", 0.1, 5))

    t1.start()
    t2.start()
    t3.start()
    t4.start()
    t5.start()

    print("\r\nMain Complete")

if __name__ == "__main__":
    Main()
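
As a compact variation on the same idea, a semaphore can also be used as a context manager, so it is released even if the work raises. This is only a sketch: run_limited and its parameters are invented here, and it records the peak number of threads inside the guarded section to show that the limit holds:

```python
import threading
import time


def run_limited(num_threads=6, limit=2, work_seconds=0.05):
    """Start num_threads workers, let at most `limit` work at once, return the peak."""
    semaphore = threading.BoundedSemaphore(value=limit)
    counter_lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def worker():
        with semaphore:  # acquire on entry, release on exit
            with counter_lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(work_seconds)  # pretend to do some work
            with counter_lock:
                state["active"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for all workers before reading the result
    return state["peak"]
```

The returned peak never exceeds limit, no matter how many threads are started.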
Peter Mortensen
  • 28,342
  • 21
  • 95
  • 123
cSharma
  • 596
  • 6
  • 17
5

None of the previous solutions actually used multiple cores on my GNU/Linux server (where I don't have administrator rights). They just ran on a single core.

I used the lower level os.fork interface to spawn multiple processes. This is the code that worked for me:

from os import fork

values = ['different', 'values', 'for', 'threads']

for i in range(len(values)):
    p = fork()
    if p == 0:
        my_function(values[i])
        break
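
In the snippet above the parent never waits for its children. A slightly fuller sketch, assuming a Unix-like system (os.fork is not available on Windows), could collect the child exit codes like this; run_in_children is a hypothetical helper, and func stands in for my_function:

```python
import os


def run_in_children(func, values):
    """Fork one child per value, run func(value) there, and return the exit codes."""
    pids = []
    for value in values:
        pid = os.fork()
        if pid == 0:
            # Child process: do the work, then exit without returning to the caller
            try:
                func(value)
                os._exit(0)
            except BaseException:
                os._exit(1)
        pids.append(pid)  # parent process: remember the child

    exit_codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)  # block until this child has finished
        exit_codes.append(os.WEXITSTATUS(status))
    return exit_codes
```

Each child gets its own copy of the parent's memory, so results have to come back through exit codes, files, pipes, or similar, not through shared variables.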
Peter Mortensen
  • 28,342
  • 21
  • 95
  • 123
David Schumann
  • 9,116
  • 6
  • 56
  • 78
1
import threading
import requests

def send():

  r = requests.get('https://www.stackoverlow.com')

thread = []
t = threading.Thread(target=send)  # pass the function itself; send() would run it in the main thread
thread.append(t)
t.start()
Skiller Dz
  • 739
  • 7
  • 16
  • 1
    @sP_ I'm guessing because then you have thread objects so you can wait for them to finish. – Aleksandar Makragić Oct 16 '18 at 12:28
  • 2
    t = threading.Thread(target=send()) should be t = threading.Thread(target=send) – TRiNE Jan 23 '19 at 00:30
  • 1
    I'm downvoting this answer because it doesn't provide an explanation of how it improves upon existing answers, in addition to containing a serious inaccuracy. – Jules Jan 27 '19 at 04:22