How to write a multiprocessing web server in python

Question

I have a simple web server in Python that responds to requests according to a configuration file. The configuration defines the percentage of OK, NOK, Timeout and Null responses:

import socket
import sys
import os
import datetime
import random
import time


# define globals
global log_file
global configs

dash = '-'
sep = '\n' + 100 * dash + '\n'
ok_message = 'HTTP/1.0 200 OK\n\n'
nok_message = 'HTTP/1.0 404 NotFound\n\n'


def initialize():
    if not os.path.isdir('./logs'):
        os.mkdir(os.path.abspath('./logs'))
    path = os.path.abspath(os.path.join(os.path.abspath('./logs'),
            datetime.datetime.now().strftime('%d-%m-%Y %H-%M-%S')))
    os.mkdir(path)
    log_file = open(os.path.join(path, 'received_packets.log'), 'a')


def finalize():
    log_file.close()


def select_resp_type():
    percents = {}
    for key, val in configs.items():
        if key.endswith('Percent'):
            percents.update({key: int(val)})
    items = [x.replace('Percent', '') for x, v in percents.items()
             if (float(counts[x.replace('Percent', '')]) / counts['all_packets']) * 100 < v]
    print items
    print [(float(counts[x.replace('Percent', '')]) / counts['all_packets']) * 100 for x, v in percents.items()]
    if len(items):
        selected = random.choice(items)
        counts[selected] += 1
        return selected
    sys.stdout.write('Everything is done!\n')
    sys.exit(0)


def get_response():
    resp_type = select_resp_type()
    if resp_type == 'ok':
        return ok_message
    elif resp_type == 'nok':
        return nok_message
    elif resp_type == 'nok':
        time.sleep(int(configs['timeoutAmount']))
        return ok_message
    elif resp_type == 'nok':
        time.sleep(int(configs['timeoutAmount']))
        return None


def load_configs(config):
    if not os.path.isfile(config):
        log_file.write('No such file ' + os.path.abspath(config))
        sys.exit(1)
    config_lines = open(config, 'r').readlines()
    configs = {}
    for line in config_lines:
        if line.strip() == '' or line.strip().startswith('#'):
            continue
        configs.update({line.split('=')[0].strip(): line.split('=')[1].strip()})


if __name__ == '__main__':
    initialize()
    config = sys.argv[3]
    load_configs(config)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((str(configs['host']), int(configs['port'])))
    s.listen(1)
    try:
        while True:
            s_sock, s_addr = s.accept()
            sfile = s_sock.makefile('rw', 0)
            content = sfile.readline().strip()
            while content != '':
                log_file.write(content + sep)
                resp = get_response()
                if resp:
                    sfile.write(resp)
                sfile = s_sock.makefile('rw', 0)
                content = sfile.readline().strip()
            sfile.close()
            s_sock.close()
    except:
        print 'an exception occurred!'
        sys.exit(1)
    finally:
        finalize()

This is my configuration file:

# server configurations
host = 127.0.0.1
port = 8000
okPercent = 80
nokPercent = 20
nullPercent = 0
timeoutPercent = 0
timeoutAmount = 120
maxClients = 10

I want to change this script to be a multiprocessing (by which I mean non-blocking, so that multiple requests can be processed) web server, but I don't know where to start and how to do that. Any help?

EDIT 1:

According to @Jan-Philip Gehrcke's answer, I changed my script to use the gevent library:

def answer(s):
    try:
        gevent.sleep(1)
        s_sock, s_addr = s.accept()
        print conn_sep + 'Receive a connection from ' + str(s_addr)
        while True:
            content = s_sock.recv(1024)
            counts['all_packets'] += 1
            log_file.write(packet_sep + content)
            resp = get_response()
            if resp:
                s_sock.send(resp)
    except:
        print 'An error occurred in connection with ', s_addr, '; quitting...'



if __name__ == '__main__':
    log_dir = sys.argv[2]
    log_file = initialize(sys.argv[2])
    config = sys.argv[1]
    configs = load_configs(config)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((str(configs['host']), int(configs['port'])))
    s.listen(int(configs['maxClients']))
    threads = [gevent.spawn(answer, s) for i in xrange(int(configs['maxClients']))]
    gevent.joinall(threads)

Nothing changed. If I run multiple clients against the server, each one still has to wait until the previous ones have disconnected. Maybe I missed something. Any ideas?

EDIT 2:

I also tried accepting requests in the main block as @Paul Rooney said:

def answer(server_sock):
    try:
        gevent.sleep(1)
        while True:
            content = server_sock.recv(1024)
            counts['all_packets'] += 1
            log_file.write(packet_sep + content)
            resp = get_response()
            if resp:
                server_sock.send(resp)
    except:
        print 'An error occurred in connection with ', s_addr, '; quitting...'



if __name__ == '__main__':
    log_dir = sys.argv[2]
    log_file = initialize(sys.argv[2])
    config = sys.argv[1]
    configs = load_configs(config)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((str(configs['host']), int(configs['port'])))
    s.listen(int(configs['maxClients']))
    s_sock, s_addr = s.accept()
    print conn_sep + 'Receive a connection from ' + str(s_addr)
    threads = [gevent.spawn(answer, s_sock) for i in xrange(int(configs['maxClients']))]
    gevent.joinall(threads)

First, I get the same result for concurrent connections: no request is answered until the previous clients are gone. Second, when the first client disconnects, I get the following error in the server and it terminates:

Traceback (most recent call last):
  File "/opt/python2.7/lib/python2.7/site-packages/gevent-1.0.1-py2.7-linux-x86_64.egg/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "main.py", line 149, in answer
    server_sock.send(resp)
error: [Errno 32] Broken pipe
<Greenlet at 0x1e202d0: answer(<socket._socketobject object at 0x1dedad0>)> failed with error

It seems that when the first client disconnects, its socket is closed and is no longer usable, so the other connected, waiting clients cannot be answered anymore.

What do you mean by multiprocessing? Do you mean non-blocking, so that multiple requests can be processed? — user1767754, Jan 22 '15 at 12:19
Why are there 3 `elif resp_type == 'nok':` lines in `get_response()`? Also, `finalize()` won't get called as nothing is breaking that `while` loop. — PM 2Ring, Jan 22 '15 at 12:28
Have a look at [this question](http://stackoverflow.com/q/16952625/4014959) — PM 2Ring, Jan 22 '15 at 12:36
After accept returns, spawn a process and pass it the client socket `s_sock`, send your response, and exit the child process. Sorry, I don't have the time to write a proper answer. — Paul Rooney, Jan 23 '15 at 03:59
@PM2Ring, I edited the code, so finalize will surely be called. About those 3 lines you mentioned: I have some configuration for what percentage of requests should be answered with `ok`, etc., and I should randomly select between those response types. I have added the configuration file, so you can see what I meant. — Zeinab Abbasimazar, Jan 26 '15 at 06:24
@Jan-PhilipGehrcke, I looked at `gevent`. It seems interesting, but I didn't really see how to apply it in my implementation. Could you please help me with some scripts? — Zeinab Abbasimazar, Jan 26 '15 at 06:27
I think the term is asynchronous I/O instead of multiprocessing. You might also want to have a look at the twisted library: https://twistedmatrix.com/trac/ — moooeeeep, Jan 26 '15 at 07:58
@ZeinabAbbasi I posted an answer. There are several issues with your edited version; I don't think you have fully understood what is required. Hopefully the example in my answer will set you on the right track. — Paul Rooney, Jan 26 '15 at 11:47
@moooeeeep, thanks for introducing a new library. I've tried `Twisted` and it was quite interesting; but since I wanted a simple minimal solution from standard libraries, @Paul's answer was the best choice. — Zeinab Abbasimazar, Jan 27 '15 at 13:18

Paul Rooney · Accepted Answer · 2015-01-26T21:31:36.843

At the very simplest level what you can do is spawn a new process every time your accept call returns and pass the process the client socket, which is returned by accept.

You are effectively offloading the processing of the request to the child process and leaving the main process free to process new requests and likewise offload them to new child processes.

This is the way I have found to do it. I am not saying it is the perfect answer, but it works for me (Debian, Python 2.7.3).

Here is a simple example that bears some resemblance to your original code and is intended only to demonstrate when to spawn the process.

import socket
import sys
import time
import errno
from multiprocessing import Process

ok_message = 'HTTP/1.0 200 OK\n\n'
nok_message = 'HTTP/1.0 404 NotFound\n\n'

def process_start(s_sock):

    content = s_sock.recv(32)
    s_sock.send(ok_message)
    s_sock.close()
    #time.sleep(10)
    sys.exit(0) # kill the child process

if __name__ == '__main__':
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((sys.argv[1], int(sys.argv[2])))
    print 'listen on address %s and port %d' % (sys.argv[1], int(sys.argv[2]))
    s.listen(1)
    try:
        while True:
            try:
                s_sock, s_addr = s.accept()
                p = Process(target=process_start, args=(s_sock,))
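                p.start()
                # with fork the child has its own copy of the socket; close
                # the parent's copy so the connection ends when the child is done
                s_sock.close()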

            except socket.error:
                # stop the client disconnect from killing us
                print 'got a socket error'

    except Exception as e:
        print 'an exception occurred!',
        print e
        sys.exit(1)
    finally:
        s.close()

The things to take note of are

s_sock, s_addr = s.accept()
p = Process(target=process_start, args=(s_sock,))
p.start()

Here is where you spawn a process in response to accept returning.

def process_start(s_sock):

    content = s_sock.recv(32)
    s_sock.send(ok_message)
    s_sock.close()
    #time.sleep(10)
    sys.exit(0) # kill the child process

Here is the function that runs in the new process: it takes the socket passed to it, sends the response (you would do a bit more here), and then exits the child process. I'm not 100% sure that this is the correct way to end the child process or that ending it explicitly is even required. Maybe someone can correct me or edit the answer if required.
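On whether `sys.exit(0)` is needed: as far as I know, a `multiprocessing` child ends on its own with exit code 0 once its target function returns. Below is a minimal sketch of the same idea for Python 3 on a Unix-like system. The `fork` start method is an assumption, and so are the names `handle_client` and `serve` and the `max_connections` parameter, which exists only so that the demo loop can stop. The sketch also closes the parent's copy of the client socket after handing it to the child.

```python
import multiprocessing
import socket

OK_MESSAGE = b"HTTP/1.0 200 OK\r\n\r\n"

# Assumption: Unix-like OS. With "fork" the child inherits the accepted
# socket directly, so nothing has to be pickled.
_ctx = multiprocessing.get_context("fork")


def handle_client(conn):
    """Runs in the child process: answer one request, then return."""
    try:
        conn.recv(1024)            # read (and ignore) the request
        conn.sendall(OK_MESSAGE)
    finally:
        conn.close()
    # No sys.exit() here: the child ends when this function returns.


def serve(listener, max_connections):
    """Hand each accepted connection to a new child process.

    max_connections is a hypothetical limit used only for this demo.
    Returns the exit codes of the children.
    """
    children = []
    for _ in range(max_connections):
        conn, _addr = listener.accept()
        child = _ctx.Process(target=handle_client, args=(conn,))
        child.start()
        conn.close()               # parent drops its copy; the child keeps its own
        children.append(child)
    for child in children:
        child.join()               # reap the children so no zombies are left
    return [child.exitcode for child in children]
```

A real server would loop forever instead of counting connections, and would reap finished children from time to time rather than only at the end.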

Even if I uncomment the time.sleep call, I still get responses on multiple client sockets pretty much instantly.

The greenlet approach is no doubt a better way to do it in terms of system resources and performance.

Thank you. That worked. It was really good. I tried it on lenny-64 with python 2.7 as you did and it was just fine! — Zeinab Abbasimazar, Jan 27 '15 at 10:38
Is there any way to merge this _multiprocessing_ library approach with an available webserver library such as _twisted_, _tornado_, _dash_ etc? Otherwise how do we use their useful `http` handling capabilities? — StephenBoesch, Mar 17 '21 at 14:57
The above is a web server implemented from scratch. If you are using a web server library, you needn't worry about this. tornado and twisted are single-threaded anyway, so you wouldn't spawn new processes or threads for new requests. — Paul Rooney, Mar 18 '21 at 00:25

Dr. Jan-Philip Gehrcke · Answer 2 · 2015-01-22T12:46:11.973

"I want to change this script to be a multiprocessing (by which I mean non-blocking, so that multiple requests can be processed)"

Indeed, you mean "non-blocking", that is the right term. Before doing anything, you need to appreciate that this is a complex topic and that you need to learn a bit about concurrency architectures.

"Concurrency" is the concept of making multiple things happen at the same time (although often what we actually need is efficient use of a single CPU core rather than real simultaneity).

Believe me, this is not a trivial topic. One approach many would take here is to monkey-patch the socket module via gevent (search for that). This would allow for many network connections to be processed concurrently, without changing your code. Actually, your problem is a prime example for gevent. Have a look into it.

How does this work? Gevent installs a greenlet-based machinery behind the scenes and monitors your open sockets for I/O events via libev. Each network connection is handled within its own execution context (a so-called coroutine, as implemented by greenlet). Behind the scenes, the execution flow then jumps between coroutines, depending on the order of I/O events on your sockets. That's actually a complicated topic and you cannot understand it within 5 minutes.

The core concept with gevent/greenlet/coroutines/event-driven architectures is:

Instantaneously detect when your program would wait for I/O
Do some other work instead

To achieve this, one does not need multiple CPU cores, which is why "multiprocessing" is not a good term in your title.
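The two-step idea above can be sketched with the standard library's `asyncio` instead of gevent. This is a deliberate substitution, since gevent is a third-party package, and it assumes Python 3.7 or later. The names `handle_client`, `fetch` and `demo` and the 0.2 second delay are made up for illustration. Every `await` marks a point where the program would otherwise sit waiting for I/O; the event loop uses that time to serve other clients, all within one process and one thread.

```python
import asyncio

OK_MESSAGE = b"HTTP/1.0 200 OK\r\n\r\n"
DELAY = 0.2  # hypothetical stand-in for a slow response


async def handle_client(reader, writer):
    """Serve one connection; each await lets other connections run."""
    await reader.readline()        # wait for the request line
    await asyncio.sleep(DELAY)     # simulated slow work, does not block others
    writer.write(OK_MESSAGE)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def fetch(port):
    """Tiny client: send one request and read until the server closes."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"GET / HTTP/1.0\r\n\r\n")
    await writer.drain()
    data = await reader.read()
    writer.close()
    await writer.wait_closed()
    return data


async def demo(n_clients=5):
    """Start the server on a free port and hit it with concurrent clients."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        loop = asyncio.get_running_loop()
        started = loop.time()
        replies = await asyncio.gather(*(fetch(port) for _ in range(n_clients)))
        elapsed = loop.time() - started
    return replies, elapsed
```

Calling `asyncio.run(demo(5))` should take roughly one delay rather than five, because the waits overlap instead of queuing up behind each other.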

Thank you for introducing a new library. But @Paul's solution was really straightforward, so I selected that as an answer. — Zeinab Abbasimazar, Jan 27 '15 at 10:39
That's okay :-) If it's for exercise, his answer is really fine and also a much simpler way to grasp the concept of having multiple independent entities that need to be synchronized in some way, using mechanisms provided by the operating system. — Dr. Jan-Philip Gehrcke, Jan 27 '15 at 12:27
