
I am trying to implement this simple workflow:

  • a Python script runs a daemon thread. This thread consumes a thread-safe queue and performs some heavy computations.
  • when some data is ready, the thread should be able to notify another app connected over the Internet (a JS app in the browser, in my specific case).

Right now, this sounds simple. But I am totally stuck on the second part: I can't manage to have my thread push the data to the client when it is ready.

I tried:

  • ZeroRPC streams. I managed to stream data, but I can't pass the computed data into the stream.

  • Websockets. Setting up websockets with Python seems to require high-level skills... I managed to set up the websocket server. It runs in a thread and listens for client messages. But I can't manage to pass this server to my computation thread so that it can send data through it.

Here is some sample code:

    def init_socket_server(self, host='localhost', port='8765'):

        # the socket handler
        async def hello(websocket, path):
            # that websocket object is what I need, but it lives in
            # its own thread; I can't figure out how to get it into other threads
            name = await websocket.recv()
            print("< {}".format(name))
            greeting = "Hello {}!".format(name)
            await websocket.send(greeting)
            print("> {}".format(greeting))

        # this object does not seem to be my actual server, just some init function?
        start_server = websockets.serve(hello, host, port)
        self.socket_server = start_server

        # creating the thread
        loop = asyncio.get_event_loop()
        def start():
            loop.run_until_complete(start_server)
            loop.run_forever()
        self.socket_thread = Thread(target=start, name="socket_server")
        self.socket_thread.start()

It's tricky because I need the server to run in its own thread, otherwise it would block my whole program (I must run multiple threads for other purposes, so they all have to be in the background).

However, it seems that my self.socket_server = start_server is wrong: start_server does not seem to be the server itself, but some Serve object.

The queue consumer that must send the data, running in another thread:

    # this lives in another thread
    def _consume_view_queue(self):
        while True:
            if not VIEW_REQUEST_QUEUE.empty():
                request = VIEW_REQUEST_QUEUE.get()
                if request:
                    data = self.doSmth(request)
                    # does not work as I expect;
                    # I do not seem to have the right object
                    self.socket_server.send('DATA READY ' + view_id)
                    VIEW_REQUEST_QUEUE.task_done()

It fails because my self.socket_server is the wrong object. Here is the error:

AttributeError: 'Serve' object has no attribute 'send'

Sorry if it is a dupe or a noob question, but I found no helpful resource (and the websockets/asyncio docs are not really helpful here...).

Edit: I managed to share the WebSocketServer object using a global. It does not work though; I still get AttributeError: 'WebSocketServer' object has no attribute 'send'. I don't understand why my websocket server has no send method.

Edit 2: to give you more context, I am building an Electron app. An Electron app is a Chrome client, similar to any web app, plus a local Node server. This app also launches a Python process for computations.

So far the Python and JS code communicated with RPC calls, in a "synchronous" way: the client requests data, the server receives the request and treats it immediately, then answers the request. This is bad when computations get heavy: you can't sort them by priority. So I implemented a waiting queue: computation requests are stacked in this queue. This is more robust, scales well and allows sorting computations by priority.

But now the computations live in a separate thread, which runs an infinite loop that consumes the queue. To the best of my knowledge this is the usual pattern for consuming a queue. But I never managed to pass the data back to the client. I'd like this computation thread to be able to send events to my JS client; I tried ZeroRPC streams and websockets without success. I could store the results in another queue so that I can access them from the thread where my websocket server lives. But then, it seems that you can't listen for a queue change to trigger an event, so you need an infinite loop to consume it, and therefore another thread... I can't figure it out :)
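For what it's worth, the consumer loop described above can be written without busy-polling empty(): Queue.get() blocks until an item arrives. A minimal, self-contained sketch of that pattern (a hypothetical doubling stands in for the heavy computation, and None is used as a stop sentinel):

```python
import queue
import threading

VIEW_REQUEST_QUEUE = queue.Queue()
results = []

def consume_view_queue():
    # Block on get() instead of busy-polling with empty():
    # the thread sleeps until an item arrives.
    while True:
        request = VIEW_REQUEST_QUEUE.get()
        if request is None:               # sentinel: stop the worker
            VIEW_REQUEST_QUEUE.task_done()
            break
        results.append(request * 2)       # stand-in for the heavy computation
        VIEW_REQUEST_QUEUE.task_done()

worker = threading.Thread(target=consume_view_queue, daemon=True, name="consumer")
worker.start()

for i in range(3):
    VIEW_REQUEST_QUEUE.put(i)
VIEW_REQUEST_QUEUE.put(None)
VIEW_REQUEST_QUEUE.join()                 # returns once every item is processed
```

This only restates the consuming side; the open question remains how such a thread can push its results out.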

Eric Burel
  • I may share a global across files/threads, but that does not sound either safe or clean https://stackoverflow.com/questions/13034496/using-global-variables-between-files – Eric Burel Apr 06 '18 at 16:25
  • The other alternative is polling from the client till I get all the data I wanted, but I don't like it much. I am in a big-data context; overloading servers with useless requests is not my favourite choice. – Eric Burel Apr 06 '18 at 16:27
  • I would try to keep parts as separate as possible. Your sources -> input queue -> your heavy-computing thread -> _output queue_ -> the websocket server. I'd concentrate on making the websocket server (or a long-poll server) work with your client sending dumb test data. Once it works, I'd connect the queue with real data. – 9000 Apr 06 '18 at 16:37
  • Hi, thanks for your answer. I got it working with dumb data. The issue is that I can't manage to implement the last step: connecting the websocket server to the computation results, whether through a queue or not. If I store them in a result queue, I'll still have to consume this queue in an infinite loop in a separate thread, but then I can't manage to send a message to the client from that thread. – Eric Burel Apr 08 '18 at 14:02

2 Answers


Generally, the path of least resistance for this sort of "wake JS in the browser on an event" is to have the JS post an XMLHttpRequest to your server, and have your server hold on to that pending request (usually a GET, but any HTTP operation could be used, depending on your REST design). When you have data, you complete that request with the data. The client can process/display it, then post a new request.

Have a look at: http://en.wikipedia.org/wiki/Long_polling

You will usually want a "generation" counter in the request, typically a URL argument like "http://myserver.com/nextdata.json?gen=1234", so your server knows whether to send data immediately (for instance, on the first request from a client with generation 0) or hold on until there's new data.

As implied here, you typically encode the data as JSON, as it's a very easy way to move data in the REST/WWW world, and Python has an excellent JSON parsing module.
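A minimal sketch of this generation-counter long poll, using only the standard library (the names publish, nextdata.json and the state dict are illustrative, not part of any framework):

```python
import json
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical shared state: a generation counter plus the latest datum.
state = {"gen": 1, "data": {"value": 42}}
state_changed = threading.Condition()

class LongPollHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        client_gen = int(query.get("gen", ["0"])[0])
        with state_changed:
            # Long-polling step: hold the request open until the server
            # has newer data than the client has already seen.
            while state["gen"] <= client_gen:
                state_changed.wait()
            body = json.dumps({"gen": state["gen"], "data": state["data"]})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet

def publish(new_data):
    # Called by the computation thread when a result is ready.
    with state_changed:
        state["gen"] += 1
        state["data"] = new_data
        state_changed.notify_all()

# Serve on an ephemeral port in a background thread.
server = ThreadingHTTPServer(("127.0.0.1", 0), LongPollHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:{}/nextdata.json".format(server.server_address[1])

def fetch(gen):
    with urllib.request.urlopen("{}?gen={}".format(url, gen)) as resp:
        return json.loads(resp.read())

first = fetch(0)          # generation 0: answered immediately
held = []
waiter = threading.Thread(target=lambda: held.append(fetch(first["gen"])))
waiter.start()            # this request is held open by the server...
time.sleep(0.2)
publish({"value": 43})    # ...until the computation publishes new data
waiter.join()
server.shutdown()
```

Each worker thread in ThreadingHTTPServer can block on the condition independently, which is what lets the server "hold" a pending GET.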

  • Hi, thanks for the answer. What lib would you use to implement this? According to the Wikipedia article it sounds like a stopgap for sockets; I am not sure this will help me build a bridge between my computation thread and my JS client. I'll edit my question with more context. – Eric Burel Apr 08 '18 at 14:06
  • I'm running from your high level description, which makes it sound like you're OK with the Python process to be on its own (remote) server. So I would have that process offer a RESTful interface using just BaseHTTPServer or any wrapping middleware which you like. Then you can have the client POST work to this server (being Python, multiple threads is easy) and then GET with long polling to hear back when a given computation has results. Being a single RESTful server you're free to POST and GET from the same JS context, different threads, even different machines. – Andy Valencia Apr 09 '18 at 14:42
  • Is this pattern safe when you have multiple responses for one request? E.g. I ask for a histogram, but then the server streams the answer with one response per bar in the histogram? – Eric Burel Apr 10 '18 at 08:38
  • In REST you restructure that to have one response for one request, i.e., GET /histo.json to get an initial set, then once you've rendered that in the browser GET /histo.json?since=, with a timestamp of the last datum in your first data set. This GET completes immediately if there's data, otherwise gets held until new data is available (i.e., long polling). This gets you a continuously updating histogram as data becomes available from the server. – Andy Valencia Apr 11 '18 at 14:20
  • Thanks for the insights. I went for a simpler workflow but I get what you mean, that could be indeed a good solution to gain performance when the app will evolve. – Eric Burel Apr 11 '18 at 18:46

I eventually found a working workflow. I was not far from a working answer; my mistakes were trying to move the socket server out of its thread, and failing to write the correct coroutine.

Instead, I used a second queue to store the computation results, and wrote the correct coroutine:

    async def send_data_to_client(websocket, path):
        while True:
            data = OUTPUT_QUEUE.get()
            await websocket.send(data)
            OUTPUT_QUEUE.task_done()

    start_server = websockets.serve(send_data_to_client, host, port)
    loop = asyncio.get_event_loop()

    def start_socket_server():
        loop.run_until_complete(start_server)
        loop.run_forever()

    self.socket_thread = Thread(target=start_socket_server, daemon=True, name="socket_server")
    self.socket_thread.start()

It follows the workflow summarized by 9000 in his comment. With asyncio, we can even combine this producer loop with a message-consumption loop; see the websockets docs.
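For the record, an alternative to calling the blocking OUTPUT_QUEUE.get() inside the coroutine is to keep the blocking consumer in its own thread and hand each result over to the server's event loop with asyncio.run_coroutine_threadsafe. A minimal sketch, where a plain sent list stands in for websocket.send:

```python
import asyncio
import queue
import threading

OUTPUT_QUEUE = queue.Queue()
sent = []  # stand-in for messages pushed over the websocket

async def push(data):
    # In the real app this would be: await websocket.send(data)
    sent.append(data)

def consume_output_queue(loop):
    # Runs in the computation thread: instead of calling send()
    # directly, hand each result over to the event-loop thread.
    while True:
        data = OUTPUT_QUEUE.get()
        if data is None:  # sentinel: stop the worker
            break
        future = asyncio.run_coroutine_threadsafe(push(data), loop)
        future.result()   # block this thread until the loop has handled it
        OUTPUT_QUEUE.task_done()

async def main():
    loop = asyncio.get_running_loop()
    worker = threading.Thread(
        target=consume_output_queue, args=(loop,), daemon=True)
    worker.start()
    for i in range(3):
        OUTPUT_QUEUE.put("DATA READY {}".format(i))
    OUTPUT_QUEUE.put(None)
    while len(sent) < 3:        # keep the loop alive while results arrive
        await asyncio.sleep(0.01)

asyncio.run(main())
```

The benefit is that the event loop never blocks on the queue, so the same loop could also serve recv() handlers concurrently.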

Eric Burel