
How can I use functools' lru_cache inside classes without leaking memory? In the following minimal example the foo instance is not released even though it goes out of scope and has no referrer (other than the lru_cache).

from functools import lru_cache
class BigClass:
    pass
class Foo:
    def __init__(self):
        self.big = BigClass()
    @lru_cache(maxsize=16)
    def cached_method(self, x):
        return x + 5

def fun():
    foo = Foo()
    print(foo.cached_method(10))
    print(foo.cached_method(10)) # use cache
    return 'something'

fun()

But foo and hence foo.big (a BigClass) are still alive

import gc; gc.collect()  # collect garbage
len([obj for obj in gc.get_objects() if isinstance(obj, Foo)]) # is 1

That means that Foo/BigClass instances are still residing in memory. Even deleting Foo (del Foo) will not release them.

Why is lru_cache holding on to the instance at all? Doesn't the cache use some hash and not the actual object?

What is the recommended way to use lru_cache inside classes?

I know of two workarounds: use per-instance caches, or make the cache ignore the object (which might lead to wrong results, though).
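The first workaround can be sketched like this (a sketch, not code from the question; `_cached_method` is a made-up name): wrap the bound method in `__init__`, so the cache lives on the instance and dies with it.

```python
from functools import lru_cache

class BigClass:
    pass

class Foo:
    def __init__(self):
        self.big = BigClass()
        # The cache is stored on the instance itself; the resulting
        # self -> cache -> bound method -> self reference cycle is cleaned
        # up by the cyclic garbage collector once foo goes out of scope.
        self.cached_method = lru_cache(maxsize=16)(self._cached_method)

    def _cached_method(self, x):
        return x + 5

def fun():
    foo = Foo()
    print(foo.cached_method(10))  # 15
    print(foo.cached_method(10))  # 15, served from the cache

fun()
```

Note that because the cache ends up in a reference cycle with the instance, it is freed by the cyclic garbage collector (e.g. after gc.collect()) rather than immediately by reference counting.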

televator

3 Answers


This is not the cleanest solution, but it's entirely transparent to the programmer:

import functools
import weakref

def memoized_method(*lru_args, **lru_kwargs):
    def decorator(func):
        @functools.wraps(func)
        def wrapped_func(self, *args, **kwargs):
            # We're storing the wrapped method inside the instance. If we had
            # a strong reference to self the instance would never die.
            self_weak = weakref.ref(self)
            @functools.wraps(func)
            @functools.lru_cache(*lru_args, **lru_kwargs)
            def cached_method(*args, **kwargs):
                return func(self_weak(), *args, **kwargs)
            setattr(self, func.__name__, cached_method)
            return cached_method(*args, **kwargs)
        return wrapped_func
    return decorator

It takes the exact same parameters as lru_cache and works exactly the same. However, it never passes self to lru_cache, and it uses a per-instance lru_cache instead.
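A quick sanity check (the decorator is repeated here so the snippet runs standalone, with BigClass/Foo as in the question): the instance is collected as soon as fun() returns, because the cache only holds a weak reference to it.

```python
import functools
import gc
import weakref

def memoized_method(*lru_args, **lru_kwargs):
    def decorator(func):
        @functools.wraps(func)
        def wrapped_func(self, *args, **kwargs):
            # Only a weak reference to self is captured by the cache.
            self_weak = weakref.ref(self)
            @functools.wraps(func)
            @functools.lru_cache(*lru_args, **lru_kwargs)
            def cached_method(*args, **kwargs):
                return func(self_weak(), *args, **kwargs)
            setattr(self, func.__name__, cached_method)
            return cached_method(*args, **kwargs)
        return wrapped_func
    return decorator

class BigClass:
    pass

class Foo:
    def __init__(self):
        self.big = BigClass()

    @memoized_method(maxsize=16)
    def cached_method(self, x):
        return x + 5

def fun():
    foo = Foo()
    print(foo.cached_method(10))  # 15
    print(foo.cached_method(10))  # 15, served from the per-instance cache

fun()
gc.collect()
# Unlike with a plain @lru_cache method, no Foo instances survive:
print(len([obj for obj in gc.get_objects() if isinstance(obj, Foo)]))  # 0
```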

orlp
    This has the slight strangeness to it that the function on the instance is only replaced by the caching wrapper on the first invocation. Also, the caching wrapper function is not anointed with `lru_cache`'s `cache_clear`/`cache_info` functions (implementing which was where I bumped into this in the first place). – AKX Nov 13 '18 at 15:34
  • This doesn't seem to work for `__getitem__`. Any ideas why ? It does work if you call `instance.__getitem__(key)` but not `instance[key]`. – JoseKilo Aug 07 '19 at 14:16
  • This will not work for any special method because those are looked up on the class slots and not in instance dictionaries. Same reason why setting `obj.__getitem__ = lambda item: item` will not cause `obj[key]` to work. – pankaj Nov 06 '20 at 16:57

I suggest methodtools for this use case.

Install it with pip install methodtools (https://pypi.org/project/methodtools/).

Then your code will work simply by replacing functools with methodtools:

from methodtools import lru_cache
class Foo:
    @lru_cache(maxsize=16)
    def cached_method(self, x):
        return x + 5

Of course, the gc test from the question now returns 0 as well.

youknowone
    You can use either one. `methodtools.lru_cache` behaves exact like `functools.lru_cache` by reusing `functools.lru_cache` inside while `ring.lru` suggests more features by reimplementing lru storage in python. – youknowone Jun 05 '19 at 07:47
    `methodtools.lru_cache` on a method uses a separate storage for each instance of the class, while the storage of `ring.lru` is shared by all the instances of the class. – Filip Bártek Aug 14 '19 at 14:46

Python 3.8 introduced the cached_property decorator in the functools module. When tested, it does not seem to retain the instances.

If you don't want to update to Python 3.8, you can copy its implementation from the source code. All you need is to import RLock and create the _NOT_FOUND sentinel object, meaning:

from threading import RLock

_NOT_FOUND = object()

class cached_property:
    # https://github.com/python/cpython/blob/v3.8.0/Lib/functools.py#L930
    ...
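On 3.8+, a quick check (note that cached_property only fits computations that take no arguments, since a single value is stored per instance, so the question's cached_method(x) is replaced here by a hypothetical argument-less cached_value):

```python
import functools
import gc

class BigClass:
    pass

class Foo:
    def __init__(self):
        self.big = BigClass()

    @functools.cached_property
    def cached_value(self):
        return 10 + 5

def fun():
    foo = Foo()
    print(foo.cached_value)  # 15, computed once
    print(foo.cached_value)  # 15, read back from foo.__dict__

fun()
gc.collect()
# The result is cached on the instance itself, so nothing outlives foo:
print(len([obj for obj in gc.get_objects() if isinstance(obj, Foo)]))  # 0
```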
moshevi