Creating a generator for primes in the style of SICP

Question

In a future course, I'll be having a discipline that uses Python with an emphasis of using sequences and generators and that kind of stuff inn Python.

I've been following an exercise list to exercise these parts. I'm stuck on an exercise that asks for a prime generator. Up until now, I haven't used Python very much, but I've read and done most of the exercises in SICP. There, they present the following program that makes use of the sieve of Eratosthenes to generate a lazy list of primes.

(define (sieve stream)
  (cons-stream
   (stream-car stream)
   (sieve (stream-filter
           (lambda (x)
             (not (divisible? x (stream-car stream))))
           (stream-cdr stream)))))

(define primes (sieve (integers-starting-from 2)))

In Python, from what I have read, the closest thing is generators So I tried translating it to the following.

import itertools
def sieve(seq):
    n = next(seq)
    yield n
    sieve(filter(lambda x: x % n != 0, seq))

def primes():
    return sieve(itertools.count(2))

print(list(itertools.islice(primes(),10)))

But it prints only [2]. I figure that this is because the result of the recursive call to sieve is just discarded, instead of running the function again, as I first expected.

To try to remedy this, I tried using a loop instead:

def sieve(seq):
    def divisible(n):
        return lambda x: x % n != 0
    while True:
        n = next(seq)
        yield n
        seq = sieve(filter(divisible(n), seq))

This works insofar as that I can generate the first 9 primes, but if I ask for the tenth a RecursionError is raised.

So, my question is how can I improve this to be able to calculate larger primes?

PS: There is already a proposed implementation of a sieve generator in https://stackoverflow.com/a/568618/6571467, but it explicitly deals with the previous primes in the sieve. Whereas in the lazy list paradigm, the objective is to abstract from the order the operations are actually executed.

The lazy paradigm relies on tail-call optimization (TCO) to avoid growing the call stack every time you try to add another filter to the stream of candidate. Python, unfortunately, doesn't implement TCO. (Effectively, the call stack itself is what stores the previous primes, and Python doesn't allow it to grow without limit.) — chepner, Jul 04 '20 at 15:52
Be aware that despite common misconceptions, that's not actually a sieve of Eratosthenes. [It's a highly inefficient form of trial division.](https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) — user2357112 supports Monica, Jul 04 '20 at 21:59
In contrast, the sieve in [your link](https://stackoverflow.com/a/568618) is an actual lazy sieve. — user2357112 supports Monica, Jul 04 '20 at 22:04
user2357112 supports Monica Thank you for the paper. I hadn't realised that the filtering done in SICP was fundamentally different from the crossing of of multiples in the original sieve. Although I do not understand the code shown, since I never learnt Haskell and can't figure out what they do. — tigre200, Jul 05 '20 at 14:02

tobias_k · Accepted Answer · 2020-07-05T14:54:28.257

For you first version, you can just use yield from to yield from the recursive call:

def sieve(seq):
    n = next(seq)
    yield n
    yield from sieve(filter(lambda x: x % n != 0, seq))

(or for x in sieve(...): yield x for older versions of Python)

For your looped version, remove the recursive call, just stack the filters:

def sieve(seq):
    while True:
        n = next(seq)
        yield n
        seq = filter(lambda x, n=n: x % n != 0, seq)

Both versions will work for the first (almost) 1000 primes before also resulting in a maximum-recursion error (even with the loop, as you have a bunch of nested filter functions), which can be postponed by setting a higher recursion limit, but not really prevented -- except by not using recursion, or a language that supports Tail Call Optimization.

Alternatively, for a purely iterative solution, you can store the set of seen primes and check whether any of those is a divisor. (Both recursive variants basically also store this set of primes, except they hide it in the stack of nested filter calls.)

def sieve(seq):
    ps = []
    while True:
        n = next(seq)
        if not any(n % p == 0 for p in takewhile(lambda p: p*p <= n, ps)):
            yield n
            ps.append(n)

All three versions yield the same results, but the "recursion-less" are (much) faster:

>>> %timeit primes(sieve1, 900)
1 loop, best of 5: 297 ms per loop

>>> %timeit primes(sieve2, 900)
1 loop, best of 5: 185 ms per loop

>>> %timeit primes(sieve3, 900)
10 loops, best of 5: 35.4 ms per loop

(Using n.__rmod__ instead of lambda x: x % n != 0 gives a nice boost to the filter-based ones, but they are still much slower.)

Addendum, about your second approach resulting in a recursion error even for very small valus: I am still having trouble wrapping my head around this, but here is how I understand it: By doing seq = sieve(filter(nondivisible(n), seq)) instead of just seq = filter(nondivisible(n), seq), the "top-level" sieve gets the next value from the sieve one level below, and so forth, and each of those adds another layer of sieves in each iteration, causing the height of the sieve-stack to double in each iteration.

It's better to use a list than a set for the seen primes. Deduplication is unnecessary, list iteration is faster, and the list makes sure smaller primes are tested first. — user2357112 supports Monica, Jul 04 '20 at 22:08
@user2357112supportsMonica Agreed, and you can use `takewhile` for early-abort. — tobias_k, Jul 04 '20 at 22:09
It's weird that your second version runs so much farther than my last, since they are mostly the same (both are a function with a captured n in their environment). Coming from Scheme these are the kinds of things that baffle me in python. — tigre200, Jul 05 '20 at 14:08
@tigre200 I think the recursion in the loop actually causes the number of stacked sieves to _double_ with each prime, see my edit. — tobias_k, Jul 05 '20 at 14:55

cbare · Answer 2 · 2020-07-04T21:42:11.810

In the Python solution, sieve will be a function that takes a generator and is itself a generator, something like the following:

from itertools import count, islice

def sieve(gen):
    for n in gen:
        if n%2 == 0:
            yield n

def evens(start=2, limit=None):
    return islice(sieve(count(start)), limit)

for n in evens(limit=10):
    print(n)

2
4
6
8
10
12
14
16
18
20

...except that to generate primes we want to check not just divisibility by 2, but divisibility by everything less than n at every given point. We can do better by realizing that for every pair of divisors, a • b = n, one of those divisors is ≤ sqrt(n).

Python is different from Scheme in that Python discourages recursion with a smallish limit on recursion depth. See this question.

So, looping inside of sieve is a good thought.

from itertools import count, islice
from math import sqrt

def sieve(gen):
    for n in gen:
        sqrt_n = int(sqrt(n))
        for d in count(2):
            if d > sqrt_n:
                yield n
                break
            if n % d == 0:
                break

def primes(start=2, limit=None):
    return islice(sieve(count(start)), limit)

for p in primes(limit=20):
    print(p)

We might take advantage of the stateful nature of generators. We can keep a list of previously generated primes inside the generator, trading off space for faster running time.

def primes():
    primes = [2]
    yield 2
    for n in count(3):
        sqrt_n = int(sqrt(n))
        for p in primes:
            if p > sqrt_n:
                primes.append(n)
                yield n
                break
            if n % p == 0:
                break

This allows tolerable generation of up to moderately large primes:

In [56]: %time nth(primes(), 1000000)
CPU times: user 40.9 s, sys: 142 ms, total: 41.1 s
Wall time: 41.2 s
Out[56]: 15485863

For a really smart implementation see Simple Prime Generator in Python.

That doesn't really answer my question, since I already acknowledged the Simple Prime Generator in Python implementation. Keeping a list of primes works, but I want to explore the lazy paradigm of not keeping such bookkeeping information. I fear that chepner might be right that python stack limitations preclude this approach, but perhaps someone has a clever solution. — tigre200, Jul 04 '20 at 19:13
@tigre200: Your approach still keeps that bookkeeping information, though. It just hides it in the call stack. — user2357112 supports Monica, Jul 04 '20 at 22:01

Creating a generator for primes in the style of SICP

2 Answers2