
I am not sure how to do multithreading, and after reading a few Stack Overflow answers I came up with the code below. Note: this is Python 2.7.

from multiprocessing.pool import ThreadPool as Pool

pool_size = 10
pool = Pool(pool_size)

for region, directory_ids in direct_dict.iteritems():
    for dir in directory_ids:
        try:
            async_result = pool.apply_async(describe_with_directory_workspaces,
                                            (region, dir, username))
            result = async_result.get()
            code = result[0]
            content = result[1]
        except Exception as e:
            print "Some error happening"
            print e

        if code == 0:
            if content:
                new_content.append(content)
            else:
                pass
        else:
            return error_html(environ, start_response, content)

What I am trying to do here is call describe_with_directory_workspaces with different combinations of region and directory, and run the calls in parallel so that the data is collected into new_content quickly. Currently the calls run in series, which is what gives the end user slow performance.

Am I doing this right, or is there a better way to do it? How can I confirm that the multithreading is actually running as I expect it to?

  • There are multiple possible issues in your attempted solution (the code is serial, and it seems you are creating a new thread pool per web request). Start with the actual problem instead: is this part of a WSGI app (judging by `error_html(..)`)? What does `describe_with_..()` do (is it IO-bound)? Have you considered offloading the tasks into background threads/processes without waiting for the results to finish the request? – jfs Sep 21 '15 at 23:51
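
A minimal sketch of the fire-and-forget approach that comment suggests, assuming a long-lived module-level pool; fetch_workspaces is a hypothetical stand-in for describe_with_directory_workspaces:

from multiprocessing.pool import ThreadPool

# Created once at import time, not per request, so the pool outlives
# individual web requests.
pool = ThreadPool(10)

def fetch_workspaces(region, directory, username):
    # Hypothetical stand-in for the real IO-bound call.
    return 0, "content for %s/%s" % (region, directory)

def queue_requests(direct_dict, username):
    # Queue the work and return immediately; the web request is not
    # blocked waiting on results.
    for region, directory_ids in direct_dict.iteritems():
        for directory in directory_ids:
            pool.apply_async(fetch_workspaces, (region, directory, username))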

3 Answers


You don't want to call `async_result.get()` until you've queued all of your jobs; otherwise you will only allow one job to run at a time.

Try queueing all of your jobs first, then processing each result after they've all been queued. Something like:

results = []
for region, directory_ids in direct_dict.iteritems():
    for dir in directory_ids:
        result = pool.apply_async(describe_with_directory_workspaces,
                                  (region, dir, username))
        results.append(result)

for result in results:
    code, content = result.get()
    if code == 0:
        # ...
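
One detail worth noting about the pattern above: if a worker raises, `result.get()` re-raises that exception in the calling thread, so error handling can wrap the `get()` calls rather than the `apply_async()` calls. A minimal sketch of the same loop with error handling:

for result in results:
    try:
        code, content = result.get()
    except Exception as e:
        # The worker's exception propagates out of get().
        print "Task failed: %s" % e
        continue
    # ... handle (code, content) as before ...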

If you want to handle the results in an asynchronous callback, you can instead supply a `callback` argument to `pool.apply_async`:

for region, directory_ids in direct_dict.iteritems():
    for dir in directory_ids:
        def done_callback(result):
            pass  # process result...

        pool.apply_async(describe_with_directory_workspaces,
                         (region, dir, username),
                         callback=done_callback)
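
Here is a self-contained sketch of the callback approach, with a hypothetical slow_call standing in for describe_with_directory_workspaces. Note the close()/join() pair, which makes the main thread wait for all tasks (and their callbacks) to complete:

from multiprocessing.pool import ThreadPool

def slow_call(region, directory, username):
    # Hypothetical stand-in for the real IO-bound call.
    return 0, "data for %s/%s" % (region, directory)

new_content = []

def done_callback(result):
    # Called by the pool's result-handler thread when a task finishes.
    code, content = result
    if code == 0 and content:
        new_content.append(content)

pool = ThreadPool(10)
for region, directory in [("us-east-1", "d-123"), ("eu-west-1", "d-456")]:
    pool.apply_async(slow_call, (region, directory, "someuser"),
                     callback=done_callback)
pool.close()  # no more tasks will be submitted
pool.join()   # block until all tasks and callbacks have run
print new_content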
Myk Willis
  • Is there any way to get the results non-blocking, rather than doing more processing of errors? I mean, this may do the job for me, but I am looking for more options so that I can weigh them. Thanks :) – user3089927 Sep 21 '15 at 23:16
  • Yes, if you prefer, you can supply a callback routine to be notified when each task is finished. – Myk Willis Sep 21 '15 at 23:37
  • Any tradeoffs of using one over the other? From what I can tell, they both do roughly the same thing. – user3089927 Sep 22 '15 at 00:01
  • I think you would use the former (collect all of the results and process them at once) if you needed to do some processing that cared about how they all did in aggregate. e.g., if any one of them fails, does some higher-level operation fail? You would use the latter when each one is a completely independent thing. Personally, I avoid callbacks when there are clean alternatives (as here) because I find it easier to read. – Myk Willis Sep 22 '15 at 00:06
  • I tried the callback method as well, the way you mentioned, but it does not seem to work; done_callback is not getting called. Am I missing some part of this? http://stackoverflow.com/questions/32814738/callback-not-getting-called/32815497#32815497 – user3089927 Sep 28 '15 at 15:05
  • The done_callback will be called at some later time, after the work item has been completed. This might not happen for quite some time, so if your main thread is just exiting after that loop, it's probably exiting before the jobs complete. – Myk Willis Sep 28 '15 at 22:32

You should look into Python's multiprocessing module.

From Python: Essential Reference by Beazley:

"Python threads are actually rather restricted. Although minimally thread-safe, the Python interpreter uses an internal Global Interpreter Lock that only allows a single Python thread to execute at any given moment. This restricts python programs to run on a single processor regardless of how many CPU cores might be available on the system."

So: If you have a lot of CPU processing going on, use multiprocessing.

Link to the documentation: https://docs.python.org/2/library/multiprocessing.html

multiprocessing Pools might be useful in your case.

EDIT: I missed that the code was using multiprocessing already. See the comments for what might be a more helpful answer. Also, for an example of how to use apply_async with callbacks, see: Python multiprocessing.Pool: when to use apply, apply_async or map? Note that Pool also has a map_async function.

See section 16.6.2.9 on the above link.

EDIT2: Example code that calls get() outside of the loop:

from multiprocessing import Pool

def sqr(x):
    return x * x

if __name__ == '__main__':
    po = Pool(processes=10)
    resultslist = []
    for i in range(1, 10):
        result = po.apply_async(sqr, (i,))
        resultslist.append(result)

    for res in resultslist:
        print(res.get())
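
For comparison, here is a minimal sketch of the same job using map_async, which handles the queueing itself and returns the results in input order:

from multiprocessing import Pool

def sqr(x):
    return x * x

if __name__ == '__main__':
    po = Pool(processes=10)
    async_result = po.map_async(sqr, range(1, 10))
    po.close()
    po.join()
    print(async_result.get())  # [1, 4, 9, ..., 81], in input order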
sgrg
  • I am using a multiprocessing pool, but I'm not sure it is working as I intend. – user3089927 Sep 21 '15 at 21:54
  • Whoops, long day. One thing that I see might be going wrong is that you're calling async_result.get() within the loop. However, get() will block if the result is not available immediately. And so I have a feeling this would end up running sequentially anyway. – sgrg Sep 21 '15 at 22:13
  • Also note that if you care about the order in which the results are returned (i.e. you want them returned in the order you make the calls) then you should probably use a map. – sgrg Sep 21 '15 at 22:16
  • That's the point, I don't want it sequential. Also, I have some if-else logic on "result", so I'm looking for any good workaround you can suggest. – user3089927 Sep 21 '15 at 22:33
  • If you don't want it sequential, don't do the get in the loop. Do it outside the loop. Or use callbacks. – sgrg Sep 21 '15 at 22:40
  • Can you provide an example of how to achieve this using get outside the loop? I'm a little confused about how get can work outside the loop and how I would then retrieve and use the result. – user3089927 Sep 21 '15 at 22:43

This code demonstrates multithreading in Python. I created a simple messenger which sends and receives messages.

import threading

class Messenger(threading.Thread):
    def run(self):
        for _ in range(10):
            print(threading.current_thread().getName())

x = Messenger(name='Send out messages')
y = Messenger(name='Receive Messages')
x.start()
y.start()
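
To tie this back to the original question ("how can I confirm the multithreading is working?"): one way is to log the thread name and a timestamp from inside each task. If the pool really runs tasks in parallel, the start times overlap, several thread names appear, and the total runtime is close to one task's duration rather than the sum. A minimal sketch with a hypothetical slow task:

import time
import threading
from multiprocessing.pool import ThreadPool

def slow_task(n):
    print "%s started task %d at %.2f" % (
        threading.current_thread().name, n, time.time())
    time.sleep(1)  # simulate a slow IO-bound call
    return n

pool = ThreadPool(5)
results = [pool.apply_async(slow_task, (i,)) for i in range(5)]
print [r.get() for r in results]  # finishes in ~1s, not ~5s, if parallel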
Atul Chavan