
As I am not really aware of the underlying strategies or protocols used by Ladon, web services and Apache (I am using Ladon and Python with mod_wsgi.so on a Windows Apache server, since switched to an Ubuntu system),

I wonder if it is possible to load some resources for Python once, so that the exposed methods use these resources from the Python code without having to load them again when serving new queries to the web services.

Do you have any clue on how to achieve this if possible, or any workaround if not?

Typically I am loading some huge dictionaries from files, which takes too much time (I/O). Since they are loaded again on each new Ladon query, the WS is too slow. I would have liked to tell Ladon: "load this when Apache starts, and make it available to all my Python web services/code as a dictionary for as long as Apache is running". I will not modify these data, so I just need to be able to read/access them.

Best regards

First EDIT: if this helps, it looks like on my Ubuntu box (I have switched to Ubuntu from my Windows config to be more "standard"; I hope that was the right move) Apache2 is compiled with the prefork MPM rather than the worker MPM (as suggested by Jakob Simon-Gaarde), as read from:

@: sudo /usr/sbin/apache2 -l
Compiled in modules:
  core.c
  mod_log_config.c
  mod_logio.c
  prefork.c
  http_core.c
  mod_so.c
@: sudo /usr/sbin/apache2 -l | grep MPM
@:

I'm going to check how this can be done; maybe I will also put some simplified code here, because for now I am stuck even with your helpful answers (I can't make anything work here :/).

When installing the worker MPM, I found how to do it here: $ sudo apt-get install apache2-mpm-worker

Last EDIT:

Here is the skeleton of my WS code:

MODEL_DIR = "/home/mydata.file"

import sys
import codecs
import datetime
import glob
import os
import re

import numpy

import mywrapperfordata  # local module wrapping the data loading; was missing

from ladon.ladonizer import ladonize
from ladon.types.ladontype import LadonType
from ladon.compat import PORTABLE_STRING

class Singleton(type):
    _instances = {}
    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
        return cls._instances[cls]

class LDtest(object):
    __metaclass__ = Singleton  # Python 2 metaclass declaration
    modeldir = MODEL_DIR

    def __init__(self):
        self.load()

    def load(self):
        self.data = mywrapperfordata.mywrapperfordata(LDtest.modeldir)
        b = datetime.datetime.now()
        self.features = self.data.load()  # loading is wrapped here
        c = datetime.datetime.now()
        print("loading: %s done." % (c - b))

    def letsdoit(self, myinput):
        return []  # actually the main logic, i.e. complex stuff accessing self.features

    @ladonize(PORTABLE_STRING, [PORTABLE_STRING], rtype=[PORTABLE_STRING])
    def ws(self, myinput):
        result = self.letsdoit(myinput)
        return result

a = datetime.datetime.now()
myLDtest = LDtest()
b = datetime.datetime.now()
print("LDtest: %s" % (b - a))

About loading time, from my apache2 log. Note that module 1 is required and imported by module 2, and is also exposed as a standalone webservice. It looks like the singleton is not built, or not built quickly enough:

[Tue Jul 09 11:09:11 2013] [notice] caught SIGTERM, shutting down
[Tue Jul 09 11:09:12 2013] [notice] Apache/2.2.16 (Debian) mod_wsgi/3.3 Python/2.6.6 configured -- resuming normal operations
[Tue Jul 09 11:09:50 2013] [error] Module 4: 0:00:02.885693.
[Tue Jul 09 11:09:51 2013] [error] Module 0: 0:00:03.061020
[Tue Jul 09 11:09:51 2013] [error] Module 1: 0:00:00.026059.
[Tue Jul 09 11:09:51 2013] [error] Module 1: 0:00:00.012517.
[Tue Jul 09 11:09:51 2013] [error] Module 2: 0:00:00.012678.
[Tue Jul 09 11:09:51 2013] [error] Module (dbload): 0:00:00.402387 (22030)
[Tue Jul 09 11:09:54 2013] [error] Module 3: 0:00:00.000036.
[Tue Jul 09 11:13:00 2013] [error] Module 0: 0:00:03.055841
[Tue Jul 09 11:13:01 2013] [error] Module 1: 0:00:00.026215.
[Tue Jul 09 11:13:01 2013] [error] Module 1: 0:00:00.012600.
[Tue Jul 09 11:13:01 2013] [error] Module 2: 0:00:00.012643.
[Tue Jul 09 11:13:01 2013] [error] Module (dbload): 0:00:00.322444 (22030)
[Tue Jul 09 11:13:03 2013] [error] Module 3: 0:00:00.000035.
user1340802

2 Answers


mod_wsgi launches one or more Python processes upon startup and leaves them running to handle requests. If you load a module or set a global variable, they'll still be there when you handle the next request - however, each Python process has its own separate block of memory, so if you configure mod_wsgi to launch 8 processes and load a 1G dataset, eventually you'll be using 8G of memory. Maybe you should consider using a database?

edit: Thanks Graham :-) So with only one process and multiple threads, you can share one copy of your huge dictionary between all worker threads.
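For reference, a daemon-mode mod_wsgi configuration along these lines might look as follows (the process group name `myservice` and the paths here are hypothetical, and the thread count is only an example):

```apache
# One daemon process with many threads: the dictionary is loaded once
# per process and then shared by all worker threads in that process.
WSGIDaemonProcess myservice processes=1 threads=16
WSGIProcessGroup myservice
WSGIScriptAlias /ws /var/www/myservice/handler.wsgi
```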

Simon
  • With the exception that on Windows there is only one process. Windows doesn't support multiprocess configurations. See http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading – Graham Dumpleton Sep 06 '12 at 04:57
  • so declaring my dictionaries as global variables is the only thing I have to do (if my understanding is right)? – user1340802 Sep 06 '12 at 08:25
  • I'd declare them in a separate module for cleanliness, but yes, just make them module-level variables. – Simon Sep 06 '12 at 08:31
  • OK, so I must be missing something. I have a class that loads my data in its constructor. The instance is defined as a global, at global scope (no indent). After reading your answer, I decided to make this class a singleton using the metaclass defined here: http://stackoverflow.com/questions/6760685/creating-a-singleton-in-python BTW, I can see in my apache log that my data are loaded many times, even more than once per request attempt (it looks like asking for the description also fires up the loading). Do you have a clue about what I may have missed? – user1340802 Sep 07 '12 at 07:43
  • Can you post the code of your request handler and of where you store your data? – Simon Sep 07 '12 at 08:46
  • Sorry for the delay, I didn't see the edit - You could still have a cache stampede if the threads are created faster than your code can load the data (i.e., several threads try to load the data because the first thread has not yet managed to complete creating the singleton). Do you see the multiple loading events in your log in a short time? How long does loading roughly take? – Simon Oct 08 '12 at 08:28
  • Please can you clarify what you mean by cache stampede? About the timing, I have edited to show it. And accept my apologies for taking so long to answer. I do not see it. I now have trouble with some script hanging the whole process. Maybe it is related to the mutex stuff from your original answer; can you provide some clues about how I should proceed? Best regards – user1340802 Jul 09 '13 at 09:20
  • (also i am on linux box, not window) – user1340802 Jul 12 '13 at 09:17
  • A cache stampede happens when several of your threads first check whether the data has been loaded into the global variable yet, find it has not, and start to load it. Since loading the data takes some time, and the global variable is only set after the loading has been completed, multiple threads will start to load the data concurrently, each one overwriting the global variable with the same data when finished. There's also a wikipedia article on cache stampede. – Simon Jul 12 '13 at 11:13
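To illustrate, here is a sketch of how the Singleton metaclass from the question could be made stampede-proof with double-checked locking around construction (the lock is my addition, not part of the original code):

```python
import threading

class Singleton(type):
    """Metaclass that creates at most one instance per class, even when
    several threads race to construct it at the same time."""
    _instances = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:          # fast path, no locking once built
            with cls._lock:                    # slow path: serialise creation
                if cls not in cls._instances:  # re-check inside the lock
                    cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
        return cls._instances[cls]
```

With this variant the first thread to take the lock performs the expensive load while the others wait on it, instead of each thread loading the data and overwriting the shared instance in turn.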

We use Ladon extensively at my work for all our web projects, and I have the privilege of being able to develop my private project (I am the Ladon developer) and get paid for it ;-) Some of our services have very heavy resource consumption; for instance we have a text-to-speech service that loads around 1 GB of data into memory per supported language, and a word-prediction service that loads around 100 MB per supported language.

mod_wsgi is fine - we use it as well. What you need to do is make sure that your Apache server is compiled with the worker MPM (http://httpd.apache.org/docs/2.2/mod/worker.html). In this configuration your service runs in a multi-threaded environment instead of a multi-process environment. The effect is that you only fire up one interpreter per server process, which then runs your service in several underlying threads that share resources. The caveat is that you have to make sure that your service does not step on its own toes, meaning you will have to protect global variables and class-static variables shared between service class instances with mutex.acquire()/mutex.release().
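As a minimal illustration of the acquire()/release() advice (the names here are made up for the example; `shared_cache` stands in for whatever global or class-static state the service mutates):

```python
import threading

mutex = threading.Lock()
shared_cache = {}  # global state shared by all worker threads

def update_cache(key, value):
    # Bracket every write to shared state with acquire()/release();
    # try/finally guarantees the release even if the write raises.
    mutex.acquire()
    try:
        shared_cache[key] = value
    finally:
        mutex.release()
```

Read-only data, like the dictionaries in the question, only needs this protection while it is being loaded; once loaded, concurrent reads of an unchanging dictionary are safe.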

Other than that, Ladon as a framework is built for multi-threaded environments.

Best regards, Jakob Simon-Gaarde

  • I don't know a lot about how Apache runs on Windows, but I can see now that it isn't called mpm-worker on that platform. As far as I can see, Windows always runs a multi-threaded environment. So you should be fine :-) – Jakob Simon-Gaarde Sep 06 '12 at 21:07
  • So I switched to Ubuntu and installed the worker MPM; by the way, how may I check that it is OK (not the install, but for my Python code)? I have added the skeleton of the code I use with a Singleton pattern so that it should load only once. Looking in the apache2 error log I can see many "loading: x done." messages, while I am still not sure whether it actually loads each time it is called. – user1340802 Sep 07 '12 at 13:25
  • Forgot to say how *great* this software is! Congratulations! – user1340802 Sep 07 '12 at 14:16
  • Please, I am facing another problem with Ladon: is there any way, trick, or maybe a plan to use Ladon with JSONP, i.e. calling the JSON provided by Ladon from a cross-script/domain perspective? Best regards. – user1340802 Oct 08 '12 at 08:30