
Imagine there is a framework which provides a method called logutils.set_up() which sets up the logging according to some config.

Setting up the logging should happen as early as possible, since warnings emitted while importing libraries should not be lost.

Since the old way (if __name__=='__main__':) looks ugly, we use console_scripts entry points to register the main() method.

# foo/daily_report.py
from framework import logutils
logutils.set_up()
def main():
    ...

My problem is that logutils.set_up() might be called twice:

Imagine there is a second console script which calls logutils.set_up() and imports daily_report.py.

I can change the framework code and set_up() to do nothing in the second call to logutils.set_up(), but this feels clumsy. I would like to avoid it.

How can I be sure that logutils.set_up() gets only executed once?

guettli
  • Wow, no satisfying solution although a bounty of 250 was given. Two answers explain how to go the clumsy way. I want the setup of the logging called once for several reasons: less surprise ("There should be one-- and preferably only one --obvious way to do it"), it feels cleaner, and fewer wasted CPU cycles. – guettli Nov 26 '15 at 07:03
  • I created a related question: http://stackoverflow.com/questions/33932553/set-up-logging-early-catch-warnings-emmited-during-importing – guettli Nov 26 '15 at 07:13

7 Answers


There are a few ways to achieve the goal, each with its advantages and disadvantages.

(Some of these overlap with the other answers. I don't mean to plagiarize, only to provide a comprehensive answer.)


Approach 1: The function should do it

One way to guarantee a function only gets executed once is to make the function itself stateful, making it "remember" that it has already been called. This is more or less what @eestrada and @qarma describe.

As to implementing this, I agree with @qarma that using memoization is the simplest and most idiomatic way. There are a few simple memoization decorators for Python on the internet. The one included in the standard library is functools.lru_cache. You can simply use it like:

import functools

@functools.lru_cache(maxsize=None)  # note: the parentheses are required before Python 3.8
def set_up():  # this is your original set_up() function, now decorated
    <...same as before...>

The disadvantage here is that it is arguably not set_up's responsibility to maintain this state; it is merely a function. One can argue it should execute twice if called twice, and that it is the caller's responsibility to call it only when needed (what if you really do want to run it twice?). The general argument is that a function (in order to be useful and reusable) should not make assumptions about the context in which it is called.

Is this argument valid in your case? It is up to you to decide.

Another disadvantage is that this can be considered an abuse of the memoization tool. Memoization is a tool closely related to functional programming, and should be applied to pure functions. Memoizing a function implies "no need to run it again, because we already know the result", not "no need to run it again, because there's some side effect we want to avoid".
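If that objection matters to you, a tiny single-purpose decorator states the intent ("run at most once") more honestly than memoization. A minimal sketch, using a hypothetical name `run_once`:

```python
import functools

def run_once(func):
    """Run func on the first call only; later calls return the cached result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not wrapper.has_run:
            wrapper.has_run = True
            wrapper.result = func(*args, **kwargs)
        return wrapper.result
    wrapper.has_run = False
    wrapper.result = None
    return wrapper
```

Decorating set_up with this behaves like the lru_cache version, but without pretending the function is pure.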

Approach 2: the one you think is ugly (if __name__=='__main__')

The most common pythonic way, which you already mention in your question, is using the infamous if __name__=='__main__' construct.

This guarantees the function is only called once, because it is only called from the module named __main__, and the interpreter guarantees there is only one such module in your process.

This works. There are no complications nor caveats. This is the way running main-code (including setup code) is done in python. It is considered pythonic simply because it is so darn common in python (since there are no better ways).

The only disadvantage is that it is arguably ugly (aesthetics-wise, not code-quality-wise). I admit I also winced the first few times I saw it or wrote it, but it grows on you.
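Applied to the script from the question, it would look something like this (a sketch: `set_up` here is a local stand-in for the framework's `logutils.set_up`, whose internals I'm assuming):

```python
# foo/daily_report.py (sketch)
import logging

def set_up():
    # stand-in for logutils.set_up(); assumed to configure logging somehow
    logging.basicConfig(level=logging.INFO)

def main():
    logging.getLogger(__name__).info("daily report generated")

if __name__ == '__main__':
    set_up()  # runs only when executed as a script, never on plain import
    main()
```

Importing daily_report from another console script no longer triggers the setup, because the importing module's __name__ is not '__main__'.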

Approach 3: leverage python's module-importing mechanism

Python already has a caching mechanism preventing modules from being doubly imported. You can leverage this mechanism by running the setup code in a new module, then importing it. This is similar to @rll's answer, and it is simple to do:

# logging_setup.py
from framework import logutils
logutils.set_up()

Now, each caller can run this by importing the new module:

# foo/daily_report.py
import logging_setup # side effect!
def main():
    ...

Since a module is only imported once, set_up is only called once.
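You can verify this caching guarantee directly. The sketch below writes a throwaway module whose body has a side effect, imports it twice, and observes that the body runs only once (the module and counter names are made up for the demo):

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway "setup" module whose body bumps a counter on sys:
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'logging_setup_demo.py'), 'w') as f:
    f.write("import sys\n"
            "sys._setup_demo_calls = getattr(sys, '_setup_demo_calls', 0) + 1\n")

sys.path.insert(0, tmpdir)
importlib.import_module('logging_setup_demo')
importlib.import_module('logging_setup_demo')  # served from the sys.modules cache
sys.path.remove(tmpdir)

print(sys._setup_demo_calls)  # 1
```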

The disadvantage here is that it violates the "explicit is better than implicit" principle. I.e. if you want to call a function, call it. It isn't good practice to run code with side-effects on module-import time.

Approach 4: monkey patching

This is by far the worst of the approaches in this answer. Don't use it. But it is still a way to get the job done.

The idea is that if you don't want the function to get called after the first call, monkey-patch it (read: vandalize it) after the first call.

from framework import logutils
logutils.set_up_only_once()

Where set_up_only_once can be implemented like:

# framework/logutils.py (living next to set_up itself)
def set_up_only_once():
    global set_up
    # run the actual setup (or nothing, if already vandalized):
    set_up()
    # vandalize it so it never walks again:
    set_up = lambda: None

Disadvantages: your colleagues will hate you.


tl;dr:

The simplest way is to memoize using functools.lru_cache, but it might not be the best solution code-quality-wise. It is up to you if this solution is good enough in your case.

The safest and most pythonic way, while not pleasing to the eye, is using if __name__=='__main__': ....

shx2

I have done something similar in my PhD project. I do the initialization in the module's __init__.py with a basic config (see here):

import logging

logging.getLogger('modulename').addHandler(logging.NullHandler())
FORMAT = '%(name)s:%(levelname)s:  %(message)s'
logging.basicConfig(format=FORMAT)

And then later, for instance if a config file is provided, you overwrite the config. As an example (you can find this in the constructor of EvoDevoWorkbench):

import logging.config  # logging.config must be imported explicitly

logging.config.fileConfig(config.get('default', 'logconf'),
                          disable_existing_loggers=False)

where config.get('default','logconf') is the logging configuration file path. Then in any sub-module you use the regular:

log = logging.getLogger(__name__)

In your specific case, if you set up the logging (or call set_up) inside the framework's __init__.py, it will never be called twice. If you cannot do this, the only way I see is to either use the if __name__=='__main__': guard or make foo or daily_report a package so that you can put the set_up call in its __init__.py file. Then you can use it as described above.

You can see the documentation for more details.

rll

You can use a singleton. A singleton gets created only once; subsequent calls return the same object instead of creating a new one. This answer explains different ways of creating a singleton class (all simple).

I personally prefer the base class approach. You first define Singleton class as below:

class Singleton(type):
    _instances = {}
    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
        return cls._instances[cls]

then use it as a meta class like this:

class MyClass(object):
    __metaclass__ = Singleton  # Python 2 syntax; Python 3: class MyClass(metaclass=Singleton)

    "the rest of your class as normal"

Now the first time you call MyClass() it will create the object for you. Subsequent calls will return the same object (sort of like a class-level global variable).
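A quick self-contained demonstration of that behavior, using Python 3 metaclass syntax (the `LoggingSetup` class is a made-up example):

```python
class Singleton(type):
    _instances = {}
    def __call__(cls, *args, **kwargs):
        # create the instance on first call; return the cached one afterwards
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class LoggingSetup(metaclass=Singleton):
    def __init__(self):
        self.configured = True  # the real setup work would go here; runs only once

a = LoggingSetup()
b = LoggingSetup()
print(a is b)  # True
```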

kakhkAtion

It isn't clumsy to change the set_up code; indeed, it is the only way to know for certain that the initialization is only done once. Here is some black magic for you, if you would rather not use an if statement to do the job:

# framework/logutils.py
def _set_up_internal():
    global set_up
    # NOTE: start setup
    return_val = None  # if there is a useful return value
    # NOTE: finish setup

    # clobber the global reference with a dummy implementation
    set_up = lambda: return_val  # or `lambda: None` if there is no useful return value

set_up = _set_up_internal

No explicit check is required, and the function is assured to only ever run once. This isn't thread safe, but I assume that isn't a requirement (since it wasn't mentioned in the question).

eestrada
  • This looks equal to the Singleton solution posted by kakhkAtion. Is the result of your solution different? – guettli Nov 26 '15 at 06:27
  • @guettli The result isn't any different. Only the implementation. One benefit of my solution is that it works in both Python 2 and Python 3. Applying metaclasses changed between the two versions in incompatible ways, so kakhkAtion's answer can only work in one version, unless you add a fair amount of compatibility code. – eestrada Nov 26 '15 at 07:04

For the sake of completeness, I will add a solution --there are plenty of valid options here, but maybe this one fills a gap.

It is a little verbose, but I feel it is relatively simple and clean:

# foo/daily_report.py
from framework import logutils

if not hasattr(logutils.set_up, "_initiated"):
    logutils.set_up()
    logutils.set_up._initiated = True

def main():
    pass

This way you are not actively changing the function itself, or at least not by much: you are only adding an attribute, which you then check. Instead of attaching that attribute to the function you could store it somewhere else, or wrap the initialization in a singleton class. But those solutions have already been proposed, if I'm not mistaken.

The downside: these three lines have to accompany every call to set_up.

The problem would be if some code calls set_up without setting that attribute (because of a bug, an external dependency, whatever). But if that is the case, you have no option but to change that code or to change the function itself.
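If repeating those three lines at every call site bothers you, they could be centralized in a small helper (a sketch; the name `call_once` is made up):

```python
def call_once(func):
    """Call func unless it already carries the _initiated flag; then flag it."""
    if not hasattr(func, "_initiated"):
        func()
        func._initiated = True
```

Call sites then shrink to `call_once(logutils.set_up)`, while the flag still lives on the function itself.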

Note: I assume that the framework's function set_up is Pure Python. I assume that this won't work for C extension functions or built-ins, but I have not checked those.

MariusSiuram

FWIW, I think that having the logutils package defend itself against multiple calls to its setup is not clumsy, it's dealing with the problem in the right place. After you've done that, your logutils package is now more robust.

Any other solution, "outside" of logutils, is susceptible to bugs due to being overlooked in some case.

GreenAsJade
    This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - [From Review](/review/low-quality-posts/10316847) – piet.t Nov 24 '15 at 10:43
  • I disagree. An answer that corrects a mistaken premise in the question is in fact an answer. The answer is "do it in the framework". He already said that he knows _how_ to do this, the only missing piece is the information that I supplied: he needs to do it this way for maintainability and robustness. – GreenAsJade Nov 24 '15 at 10:59

Just slap @memoize on your set_up and then it is only called once :)
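@memoize is not a built-in; a minimal dict-based sketch of such a decorator (functools.lru_cache covers the same ground on Python 3) could look like:

```python
import functools

def memoize(func):
    """Cache results per argument tuple; the body runs once per distinct call."""
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoize
def set_up():
    print("configuring logging")  # reached only on the first call
```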

Dima Tisnek
  • Where does `@memoize` come from? – guettli Nov 26 '15 at 14:08
  • There are a lot of examples out there. Your case is particularly easy because your `set_up()` takes no arguments. py3 has `functools.lru_cache`, a generic mechanism that you can use here; py2 e.g. https://wiki.python.org/moin/PythonDecoratorLibrary#Alternate_memoize_as_dict_subclass and a shorter implementation is possible too. – Dima Tisnek Nov 27 '15 at 10:03