
Is there a way we can define the following code (a classic example for recursion) via generators in Python? I am using Python 3.

def fac(n):
    if n==0:
        return 1
    else:
        return n * fac(n-1)

I tried this, no success:

In [1]: def fac(n):
   ...:     if n == 0:
   ...:         yield 1
   ...:     else:
   ...:         n * yield (n-1)
  File "<ipython-input-1-bb0068f2d061>", line 5
    n * yield (n-1)
            ^
SyntaxError: invalid syntax

Classic recursion in Python leads to Stack Overflow

This classic example leads to a stack overflow (a RecursionError in CPython) on my machine for an input of n=3000. In the Lisp dialect Scheme I would use tail recursion and avoid the stack overflow; that's not possible in Python. That's why generators come in handy in Python. But I wonder:

Why no stack overflow with generators?

Why is there no stack overflow with generators in Python? How do they work internally? Doing some research always leads me to examples showing how generators are used in Python, but not much about their inner workings.

Update 1: yield from my_function(...)

As I tried to explain in the comments section, maybe my example above was a poor choice for making my point. My actual question is about the inner workings of generators used recursively in yield from statements in Python 3.

Below is an (incomplete) code example that I use to process JSON files generated by Firefox bookmark backups. At several points I use yield from process_json(...) to call the function again recursively via generators.

In exactly this example, how is a stack overflow avoided? Or is it?


# (snip)

FOLDERS_AND_BOOKMARKS = {}
FOLDERS_DATES = {}

def process_json(json_input, folder_path=""):
    global FOLDERS_AND_BOOKMARKS
    # Process the json with a generator
    # (to avoid recursion use generators)
    # https://stackoverflow.com/a/39016088/5115219

    # Is node a dict?
    if isinstance(json_input, dict):
        # we have a dict
        guid = json_input['guid']
        title = json_input['title']
        idx = json_input['index']
        date_added = to_datetime_applescript(json_input['dateAdded'])
        last_modified = to_datetime_applescript(json_input['lastModified'])

        # do we have a container or a bookmark?
        #
        # is there a "uri" in the dict?
        #    if not, we have a container
        if "uri" in json_input.keys():
            uri = json_input['uri']
            # return URL with folder or container (= prev_title)
            # bookmark = [guid, title, idx, uri, date_added, last_modified]
            bookmark = {'title': title,
                        'uri':   uri,
                        'date_added': date_added,
                        'last_modified': last_modified}
            FOLDERS_AND_BOOKMARKS[folder_path].append(bookmark)
            yield bookmark

        elif "children" in json_input.keys():
            # So we have a container (aka folder).
            #
            # Create a new folder
            if title != "": # we are not at the root
                folder_path = f"{folder_path}/{title}"
                if folder_path in FOLDERS_AND_BOOKMARKS:
                    pass
                else:
                    FOLDERS_AND_BOOKMARKS[folder_path] = []
                    FOLDERS_DATES[folder_path] = {'date_added': date_added, 'last_modified': last_modified}

            # run process_json on list of children
            # json_input['children'] : list of dicts
            yield from process_json(json_input['children'], folder_path)

    # Or is node a list of dicts?
    elif isinstance(json_input, list):
        # Process children of container.
        dict_list = json_input
        for d in dict_list:
            yield from process_json(d, folder_path)

Update 2: yield vs yield from

Ok, I get it. Thanks to all the comments.

  • So generators via yield create iterators. That has nothing to do with recursion, so no stack overflow here.
  • But generators via yield from my_function(...) are indeed recursive calls of my function, albeit delayed and only evaluated on demand.

This second example can indeed cause a stack overflow.
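To make that concrete, here is a small self-contained demonstration (my own illustrative code, not part of the bookmark script): a plain yield generator can be consumed indefinitely without growing any call chain, while the recursive yield from generator from Ry-'s comment eventually raises a RecursionError in CPython.

def counter():
    # Plain yield: all state (just n) lives on one suspended generator
    # frame, so pulling values never grows a call chain.
    n = 0
    while True:
        yield n
        n += 1

c = counter()
for _ in range(100_000):
    next(c)            # fine, no matter how many values we request

def gen():
    # Recursive yield from (the example from the comments): each level
    # adds another suspended generator to the delegation chain.
    yield 1
    yield from gen()

try:
    all(gen())         # keeps pulling values, so the chain keeps growing
except RecursionError as exc:
    print("RecursionError:", exc)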

Ugur
  • Who says there’s no stack overflow with generators? You need a working example to ask that question about. Also, you can use generators to bypass the stack and create your own, but that still doesn’t use constant memory. To do that, you would need to make it a tail call, which no longer requires a generator to implement – it can just return the parameter for the next call directly. – Ry- Jun 17 '20 at 10:09
  • This QA explains generators well, I think: https://stackoverflow.com/questions/1756096/understanding-generators-in-python They don't really lend themselves to recursive functions, though. Think of them as iterable objects: basically closures with some internal state, and each time you call next() you get the next value. But that is computed on demand, so it uses limited memory. – Benjamin Maurer Jun 17 '20 at 10:09
  • @Ry- But classic recursion can always lead to stack overflow in Python. For example if I process an XML file with deeply nested structure. This does not happen with generators if I use "yield from". - At least I haven't seen any example that leads to a stack overflow. And my question is: Why does this not throw a stack overflow? Obviously the stack is avoided, but where are the intermediate states kept then? – Ugur Jun 17 '20 at 10:15
  • @Ugur: Again, would need a specific example of generators fixing that problem. Here’s an example of a stack overflow with a tail `yield from`, let alone an arbitrary one: `all(gen())` with `def gen(): yield 1; yield from gen()` – Ry- Jun 17 '20 at 10:16
  • @Ugur - what would you like the generator to produce? Generators usually produce a sequence of values. fac(n) is a single number. – Roy2012 Jun 17 '20 at 10:16
  • @BenjaminMaurer Thanks for the tip. I know that example. It does not mention recursion at all. And my question is specifically with regards to recursion. – Ugur Jun 17 '20 at 10:17
  • @Roy2012 There are examples demonstrating the use of generators for generating fibonacci numbers. Although a sequence of numbers, it is a classic example for demonstrating how to use generators instead of recursion. So I was wondering if generators could be used instead of any recursive example. – Ugur Jun 17 '20 at 10:19
  • @Ugur fibonacci is indeed a sequence of numbers, and hence it's a natural thing for generators. If you're looking for a generator that would produce the sequence of factorials (1, 2, 6, 24, etc) that makes sense as well. Perhaps I didn't understand the question correctly - by fac(2) do you mean the sequence (1, 2, 6, ... n!) ? – Roy2012 Jun 17 '20 at 10:21
  • @Roy2012 Yes, that's what I mean. – Ugur Jun 17 '20 at 10:23
  • @Ugur - are you looking for any generator that implements fac, or specifically for a recursive one? – Roy2012 Jun 17 '20 at 10:25
  • @Roy2012 Actually maybe my example above was not the right one for asking my question. My question was targeted at "how does a generator avoid stack overflow, or: how does `yield from` (a recursive use of generators) avoid stack overflow in processing huge nested json files. – Ugur Jun 17 '20 at 10:36
  • @Ugur see my answer. Generators simply don't use recursion. It's all syntactic sugar around an iterator object. You store your state internally, "yield" returns a value and the computation stops in place until next() is called again. It's just special syntax for an iterator that stores the current state and computes the next state on demand. No recursion there. – Benjamin Maurer Jun 17 '20 at 10:41
  • @BenjaminMaurer Thanks for the detailed answer below. - But what about `yield from my_function(...)`? Isn't this recursion either? – Ugur Jun 17 '20 at 10:43
  • Your generator just doesn't recurse deeply enough to overflow the stack. That has nothing to do with the fact that it's a generator - an ordinary function wouldn't have overflowed the stack either. – user2357112 supports Monica Jun 17 '20 at 10:44
  • OK, so with your clarifications I've rewritten my answer. I hope that helps. I put way too much time into this ^^ I've got to go. – Benjamin Maurer Jun 17 '20 at 11:24

1 Answer


OK, after your comments I have completely rewritten my answer.

  1. How does recursion work and why do we get a stack overflow?

Recursion is often an elegant way to solve a problem. In most programming languages, every time you call a function, all the information and state needed for the call are put on the stack - a so-called "stack frame". The stack is a special per-thread memory region, and it is limited in size.

Now recursive functions implicitly use these stack frames to store state/intermediate results. E.g., the factorial of n is n * (n-1) * (n-2) * ... * 1, and each of those pending multiplications is kept in its own stack frame.

An iterative solution has to store these intermediate results explicitly in a variable (that often sits in a single stack frame).
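For reference, the failure mode described in the question can be reproduced in a few lines (illustrative snippet; the exact limit and message depend on your Python build):

import sys

def fac(n):
    if n == 0:
        return 1
    return n * fac(n - 1)

print(sys.getrecursionlimit())   # typically 1000 in CPython

try:
    fac(3000)                    # needs ~3000 nested stack frames
except RecursionError as exc:
    print("RecursionError:", exc)   # "maximum recursion depth exceeded"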

  2. How do generators avoid stack overflow?

Simply: They are not recursive. They are implemented like iterator objects. They store the current state of the computation and return a new result every time you request it (implicitly or with next()).

If it looks recursive, that's just syntactic sugar. "Yield" is not like return. It yields the current value and then "pauses" the computation. That's all wrapped up in one object and not in a gazillion stack frames.
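If you wrote such an iterator object by hand, it could look roughly like the sketch below (FacIterator is my own illustration, not how CPython actually implements generators); it produces the same series as the generator that follows.

class FacIterator:
    """Hand-written iterator producing 1!, 2!, ..., n!.
    All state lives in instance attributes, not on the call stack."""
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.value = 1

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        self.value *= self.i
        return self.value

print(list(FacIterator(5)))   # [1, 2, 6, 24, 120]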

This will give you the series from `1` to `n!`:

def fac(n):
    if n <= 0:
        yield 1
    else:
        v = 1
        for i in range(1, n + 1):
            v = v * i          # running product: i! so far
            yield v            # pause here until the next value is requested

There is no recursion; the intermediate result is kept in v, which lives in the generator's suspended frame - in CPython that frame is a single object on the heap.
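For instance (illustrative usage), each value is computed only when you ask for it, and even n=3000 from the question is no problem because there is no call chain to overflow:

g = fac(5)
print(next(g))        # 1  -- computed now, on demand
print(next(g))        # 2
print(list(g))        # [6, 24, 120] -- the rest of the series

last = None
for last in fac(3000):    # runs fine: one loop in one generator frame
    pass
print(len(str(last)))     # number of digits in 3000!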

  3. What about yield from?

OK, that's interesting, since that was only added in Python 3.3. yield from can be used to delegate to another generator.

You gave an example like:

def process_json(json_input, folder_path=""):
    # Some code
    yield from process_json(json_input['children'], folder_path)

This looks recursive, but instead it's a combination of two generator objects. You have your "inner" generator (which only uses the space of one object) and with yield from you say "I'd like to forward all the values from that generator to my caller".

So it doesn't consume one stack frame per generated result; instead, it creates one suspended generator object per yield from delegation.

In this example, you are creating one generator object per nested JSON object, so the chain of suspended generators is only as deep as the JSON nesting. While suspended, each generator keeps its state in a frame object on the heap, not on the call stack, so memory is bounded by your process memory rather than by the (much smaller) stack - on my laptop, using Ubuntu Linux, ulimit -s gives me 8 MB for the default stack size, while process memory is essentially unlimited. Note, however, that resuming the outermost generator still has to walk down the whole delegation chain, so a nesting deeper than the recursion limit (about 1000 levels by default) would still raise a RecursionError, as the all(gen()) example in the comments shows. Bookmark folders are only a few levels deep, which is why you never see that here.
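As a miniature of what process_json does, here is a sketch in the same shape (flatten and the sample list are my own illustrative stand-ins, not the original code). For plain iteration, yield from sub behaves roughly like for item in sub: yield item, ignoring the extra .send()/.throw() plumbing:

def flatten(node):
    # Same shape as process_json: recurse into containers via yield from.
    if isinstance(node, list):
        for child in node:
            yield from flatten(child)
    else:
        yield node

nested = [1, [2, [3, [4]]], 5]    # hypothetical stand-in for the JSON tree
print(list(flatten(nested)))      # [1, 2, 3, 4, 5]

# The chain of suspended generators is only as deep as the input nesting --
# a handful of levels for bookmark folders -- so the recursion limit is
# never reached in practice.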

Look at this documentation page on generators: https://wiki.python.org/moin/Generators

And this QA: Understanding generators in Python

Some nice examples, also for yield from: https://www.python-course.eu/python3_generators.php

TL;DR: Generators are objects; they don't grow the call stack while they are suspended. yield from just delegates to another generator object, although resuming a deeply nested delegation chain can still hit the recursion limit. Recursion is only practical when the number of calls is bounded and small, or your compiler supports tail call optimization.

Benjamin Maurer