63

This is not homework; I am just curious.

INFINITE is the key word here.

I wish to use it as `for p in primes()`. I believe this is a built-in function in Haskell.

So, the answer cannot be as naive as "Just do a Sieve".

First of all, you do not know how many consecutive primes will be consumed. Well, suppose you could concoct 100 of them at a time. Would you use the same sieve approach, together with the prime-frequency formula?

I prefer a non-concurrent approach.

Thank you for reading (and writing ;) )!

ted
Hamish Grubijan

13 Answers

75

“If I have seen further…”

The erat2 function from the cookbook can be further sped up (by about 20-25%):

erat2a

import itertools as it
def erat2a():
    D = {}
    yield 2
    for q in it.islice(it.count(3), 0, None, 2):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            # old code here:
            # x = p + q
            # while x in D or not (x&1):
            #     x += p
            # changed into:
            x = q + 2*p
            while x in D:
                x += 2*p
            D[x] = p

The not (x&1) check in the old code tests whether x is even, so the loop kept stepping until x was odd. However, since both q and p are odd, stepping by 2*p keeps x odd from the start, so half of the steps are avoided along with the test for oddity.
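As a quick usage sketch (my addition, not part of the original answer; erat2a is repeated so the snippet runs standalone), the generator composes naturally with itertools.islice:

```python
import itertools as it

def erat2a():
    # infinite prime generator, same algorithm as above
    D = {}
    yield 2
    for q in it.islice(it.count(3), 0, None, 2):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            x = q + 2*p
            while x in D:
                x += 2*p
            D[x] = p

# take the first ten primes without ever materializing a full list
print(list(it.islice(erat2a(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```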

erat3

If one doesn't mind a little extra fanciness, erat2 can be sped up by 35-40% with the following changes (NB: needs Python 2.7+ or Python 3+ because of the itertools.compress function):

import itertools as it
def erat3():
    D = {9: 3, 25: 5}
    yield 2
    yield 3
    yield 5
    MASK = (1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0)
    MODULOS = frozenset((1, 7, 11, 13, 17, 19, 23, 29))

    for q in it.compress(
            it.islice(it.count(7), 0, None, 2),
            it.cycle(MASK)):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            x = q + 2*p
            while x in D or (x%30) not in MODULOS:
                x += 2*p
            D[x] = p

The erat3 function takes advantage of the fact that all primes (except for 2, 3, 5) modulo 30 leave only eight residues: the ones included in the MODULOS frozenset. Thus, after yielding the initial three primes, we start from 7 and work only with those candidates.
The candidate filtering uses the itertools.compress function; the “magic” is in the MASK sequence. MASK has 15 elements (there are 15 odd numbers in every run of 30 numbers, as selected by the itertools.islice call), with a 1 for every possible candidate starting from 7; the pattern repeats as specified by the itertools.cycle function.
The introduction of candidate filtering needs another modification: the or (x%30) not in MODULOS check. The erat2 algorithm processed all odd numbers; now that the erat3 algorithm processes only the mod-30 candidates, we must make sure that all D.keys() can only be such (false) candidates, so that lookups keep working.
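For the curious, MODULOS and MASK need not be hard-coded; a small sketch (my addition, not in the original answer) derives both:

```python
# residues mod 30 coprime to 2, 3 and 5 -- every prime > 5 falls in one of them
MODULOS = frozenset(r for r in range(30) if r % 2 and r % 3 and r % 5)
print(sorted(MODULOS))   # [1, 7, 11, 13, 17, 19, 23, 29]

# one mask entry per odd number starting at 7: 1 iff it is a candidate
MASK = tuple(1 if q % 30 in MODULOS else 0 for q in range(7, 37, 2))
print(MASK)              # (1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0)
```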

Benchmarks

Results

On an Atom 330 Ubuntu 9.10 server, versions 2.6.4 and 3.1.1+:

$ testit
up to 8192
==== python2 erat2 ====
100 loops, best of 3: 18.6 msec per loop
==== python2 erat2a ====
100 loops, best of 3: 14.5 msec per loop
==== python2 erat3 ====
Traceback (most recent call last):
…
AttributeError: 'module' object has no attribute 'compress'
==== python3 erat2 ====
100 loops, best of 3: 19.2 msec per loop
==== python3 erat2a ====
100 loops, best of 3: 14.1 msec per loop
==== python3 erat3 ====
100 loops, best of 3: 11.7 msec per loop

On an AMD Geode LX Gentoo home server, Python 2.6.5 and 3.1.2:

$ testit
up to 8192
==== python2 erat2 ====
10 loops, best of 3: 104 msec per loop
==== python2 erat2a ====
10 loops, best of 3: 81 msec per loop
==== python2 erat3 ====
Traceback (most recent call last):
…
AttributeError: 'module' object has no attribute 'compress'
==== python3 erat2 ====
10 loops, best of 3: 116 msec per loop
==== python3 erat2a ====
10 loops, best of 3: 82 msec per loop
==== python3 erat3 ====
10 loops, best of 3: 66 msec per loop

Benchmark code

A primegen.py module contains the erat2, erat2a and erat3 functions. Here follows the testing script:

#!/bin/sh
max_num=${1:-8192}
echo up to $max_num
for python_version in python2 python3
do
    for function in erat2 erat2a erat3
    do
        echo "==== $python_version $function ===="
        $python_version -O -m timeit -c \
        -s  "import itertools as it, functools as ft, operator as op, primegen; cmp= ft.partial(op.ge, $max_num)" \
            "next(it.dropwhile(cmp, primegen.$function()))"
    done
done
tzot
  • 1
    This is an impressive, albeit belated answer. I would encourage others to up-vote as well. – Hamish Grubijan Sep 27 '10 at 14:28
  • 1
    Thanks; I usually catch up from the RSS feed, and I see questions about 3-4 weeks later :) – tzot Sep 27 '10 at 20:51
  • BTW the function `erat2a` here is almost exactly the version by Tim Hochberg from [ActiveState recipes](http://code.activestate.com/recipes/117119-sieve-of-eratosthenes/), dated Feb 2006, except it counts up by itself from 3, with a `while True` loop. – Will Ness May 22 '12 at 07:11
  • @WillNess: of course `erat2a` is almost exactly the same as `erat2` from the cookbook; Alex Martelli mentioned the cookbook method (“by himself and many others”, which was written around 2001-2002 IIRC) and I suggested speed improvements. Either your comment says what I basically say in the first line of my answer, or you meant something else and I missed your point. – tzot May 22 '12 at 18:44
  • 3
    @WillNess: oh, now I see your point (which I missed :). Yes, both answers have the same speed-up, but it's a coincidence. And thanks to you, I saw the new interface (probably licensed the app from stackexchange). Re-visited my old account there too; first contribution was 10 years ago, last one 5 years ago. Time flies like an arrow (but fruit flies like a banana :). – tzot May 22 '12 at 19:11
72

Since the OP asks for an efficient implementation, here's a significant improvement to the ActiveState 2002 code by David Eppstein/Alex Martelli (seen here in his answer): don't record a prime's info in the dictionary until its square is seen among the candidates. This brings space complexity down to below O(sqrt(n)) instead of O(n), for n primes produced: π(sqrt(n log n)) ~ 2 sqrt(n log n) / log(n log n) ~ 2 sqrt(n / log n). Consequently, time complexity is also improved, i.e. it runs faster.

Creates a "sliding sieve" as a dictionary of current multiples of each base prime (i.e. below the sqrt of the current production point), together with their step values:

from itertools import count
                                         # ideone.com/aVndFM
def postponed_sieve():                   # postponed sieve, by Will Ness      
    yield 2; yield 3; yield 5; yield 7;  # original code David Eppstein, 
    sieve = {}                           #   Alex Martelli, ActiveState Recipe 2002
    ps = postponed_sieve()               # a separate base Primes Supply:
    p = next(ps) and next(ps)            # (3) a Prime to add to dict
    q = p*p                              # (9) its sQuare 
    for c in count(9,2):                 # the Candidate
        if c in sieve:               # c's a multiple of some base prime
            s = sieve.pop(c)         #     i.e. a composite ; or
        elif c < q:  
             yield c                 # a prime
             continue              
        else:   # (c==q):            # or the next base prime's square:
            s=count(q+2*p,2*p)       #    (9+6, by 6 : 15,21,27,33,...)
            p=next(ps)               #    (5)
            q=p*p                    #    (25)
        for m in s:                  # the next multiple 
            if m not in sieve:       # no duplicates
                break
        sieve[m] = s                 # original test entry: ideone.com/WFv4f

(The older, original code here was edited to incorporate the changes seen in the answer by Tim Peters, below.) See also this for a related discussion.

Similar 2-3-5-7 wheel-based code runs ~ 2.15x faster (which is very close to the theoretical improvement of 3/2 * 5/4 * 7/6 = 2.1875).
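A standalone usage sketch (my addition; the sieve is repeated verbatim so the snippet runs on its own):

```python
from itertools import count, islice

def postponed_sieve():                   # postponed sieve, as above
    yield 2; yield 3; yield 5; yield 7
    sieve = {}
    ps = postponed_sieve()               # separate base-primes supply
    p = next(ps) and next(ps)            # (3) first base prime
    q = p*p                              # (9) its square
    for c in count(9, 2):                # odd candidates
        if c in sieve:                   # a composite
            s = sieve.pop(c)
        elif c < q:                      # a prime
            yield c
            continue
        else:                            # c == q, next base prime's square
            s = count(q + 2*p, 2*p)
            p = next(ps)
            q = p*p
        for m in s:                      # advance to next free multiple
            if m not in sieve:
                break
        sieve[m] = s

print(list(islice(postponed_sieve(), 15)))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```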

Will Ness
  • So, why stop hard-coding at prime = 9 and not at ... say 97? – Hamish Grubijan May 24 '12 at 18:27
  • 2
    9 of course is not a prime; but it *is* entirely arbitrary here where to start, as long as the initial state of the dict D is consistent with the starting candidate. The absolute minimum is yielding 3 and starting from c=5; I just wanted to delay the recursive call into `postponed_sieve()` in line # 5 for a little bit longer. – Will Ness May 24 '12 at 21:49
  • 3
    FYI This is not only very fast, but also very memory-efficient. For example, to find the first 1 million primes, the number of entries it puts to the 2 dicts it uses are 545 and 17. This is the best inplementation posted so far. – pts Aug 02 '12 at 14:18
  • yes, that was the whole intention of it, to bring down the memory complexity of O(n) to O(sqrt(n)). it is better algorithmically; you probably could have more tweaking done on it to get some further improvement. – Will Ness Aug 03 '12 at 07:48
  • Won't it run into infinite recursion in this line: `ps = (p for p in postponed_sieve())` ? – ovgolovin Aug 23 '12 at 18:07
  • @ovgolovin I think I tried it, and it worked. Generator expressions are lazy. That line just defines it; then only first two are demanded. But at the very start I yield four primes, unconditionally. So there's no chasing. There's also an [Ideone entry](http://ideone.com/WFv4f) with similar code. (using non-postponed sieve for the primes supply). – Will Ness Aug 23 '12 at 21:15
  • @ovgolovin BTW in Haskell the gains are much more profound. Python's dicts are extremely efficient (here). But with your [merging iterators](http://stackoverflow.com/questions/12095846/merged-iterators-produce-obscure-results), you hit SO at 1000 primes - because 999 mergings were performed. Instead of just 24 that are really needed. – Will Ness Aug 23 '12 at 21:32
  • Thanks! I'm trying to understand you code. In vain so far. I hope I'll crack how it works eventually. You code with recursive `postponed_sieve` works as well. – ovgolovin Aug 24 '12 at 11:33
  • with lazy evaluation (and generators are lazy) there is no recursive call, just another definition. Another *thing* gets created in memory, ready to serve *next* requests by *yielding* to them, value by value. So there are two; one feeds the other; the top one feeds the top-level "print" or whatever and pulls primes from the lower one, when it (the top one) needs new prime - and that is when its square has arrived in the "input", `c = c+2`. – Will Ness Aug 24 '12 at 11:56
  • 1
    do you get how the non-postponed [active state recipe](http://code.activestate.com/recipes/117119-sieve-of-eratosthenes/) code by David Eppstein works? Start with that one. It's [right here](http://stackoverflow.com/a/2212007/849891) on this very page, too. – Will Ness Aug 24 '12 at 12:10
  • 2
    Thanks! I think I eventually got how it works! Here's the code with debug output for those who will struggle to understand it: http://ideone.com/n5PGu. And I only understood it when drew the generated primes on paper in colored pens. :o) – ovgolovin Aug 31 '12 at 22:38
  • I have one more question if you don't mind. The subiterator do the same work as the first iterator has already done. Why don't you just save the output of the first iterator to the list and iterator over those values in the subiterator. I mean to create list `primes = []` and then apart from yielding append values to `primes` and make subiterator iterate over the saved values `ps = (p for p in iter(primes))`. See **primes5**: http://ideone.com/QV4EQ (timings are not correct because of extensive `print` statements) – ovgolovin Aug 31 '12 at 23:14
  • I made a decent timing testings. The results: http://ideone.com/xeke4 Verstion with memorization values is a bit slower that you version with recursive generator calls. I think it's because appending to list is quite consuming. – ovgolovin Aug 31 '12 at 23:24
  • More timings: http://ideone.com/z9dHG (Can't increase `n` in *ideone* because of limit on calculation time there; just copy the code to local Python 3 interpreter and execute with bigger constants in `main2`). – ovgolovin Sep 01 '12 at 01:06
  • Time complexity is `O(n)`. But **constants** are much lower than in not postponed sieve. (20,000,000 prime numbers are generated on my machine in 90 sec, using PyPy. Great!) – ovgolovin Sep 01 '12 at 01:47
  • @ovgolovin I just needed a separate primes supply - so used what I already had - the generator itself (its another copy that is - but no need to write any new code). Just wanted something - anything - to do the job. :) Word of advice - first convince yourself that a piece of code is working, fulfills the laws that you set for it - then forget about its inner workings, treat is as a black box from then on. E.g. adding to dictionary in this code. No need to revisit how it's done. We just know it holds multiples, 1 for each prime lower than current sqrt.:) – Will Ness Sep 01 '12 at 13:13
  • @ovgolovin re using lists - but *from where* will you take the primes to append to the list's end? *This* generation is far gone already: it's somewhere around 121 when we need to add 11. The whole thing is, we want to be able to forget the produced prime as soon as it is produced. --- re time complexity: the theoretical lower limit is ` ~ n log(n) log(log(n))`, in `n` primes produced. Check this out: http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth . – Will Ness Sep 01 '12 at 13:17
  • 2
    @WillNess Yeah. Version with `primes` list hold all the primes generated. I though it would get rid us from redundant work of primes subgenerator. But saving those values is even slower than running subgenerator, let alone all the saved values consume memory. – ovgolovin Sep 01 '12 at 13:19
  • 1
    @ovgolovin throwing away all the primes is the whole point of this whole operation. :) :) When the dict is blown up in size its performance deteriorates. The inner sub-generators do the minimal possible work to maintain the *tower* of sub-generators there. BTW in my ideone code there's no tower - I use non-postponed sieve for the inner sub-generator. But letting this postponed guy take care of itself automagically was fun too. Just produce those 2-3-5-7 first and all is OK. :) – Will Ness Sep 01 '12 at 13:22
  • I like your solution so much, I wish I could do a (lazy! :) +∞ – tzot May 02 '13 at 06:41
  • 2
    @WillNess: As an exercise, I tried to implement your solution in Swift, and presented it on Code Review: http://codereview.stackexchange.com/questions/136332/postponed-prime-sieve-in-swift. – Martin R Jul 29 '16 at 20:13
44

For posterity, here's a rewrite of Will Ness's beautiful algorithm for Python 3. Some changes are needed (iterators no longer have .next() methods, but there's a new next() built-in function). Other changes are for fun (the new yield from <iterable> replaces four yield statements in the original). More are for readability (I'm not a fan of overusing ;-) 1-letter variable names).

It's significantly faster than the original, but not for algorithmic reasons. The speedup is mostly due to removing the original's add() function, doing that inline instead.

def psieve():
    import itertools
    yield from (2, 3, 5, 7)
    D = {}
    ps = psieve()
    next(ps)
    p = next(ps)
    assert p == 3
    psq = p*p
    for i in itertools.count(9, 2):
        if i in D:      # composite
            step = D.pop(i)
        elif i < psq:   # prime
            yield i
            continue
        else:           # composite, = p*p
            assert i == psq
            step = 2*p
            p = next(ps)
            psq = p*p
        i += step
        while i in D:
            i += step
        D[i] = step
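A quick standalone check of the rewrite (my addition; the function is copied from above, with its import hoisted to the top so the snippet runs on its own). The count of primes below 100,000 should be the well-known π(10**5) = 9592:

```python
import itertools

def psieve():
    yield from (2, 3, 5, 7)
    D = {}
    ps = psieve()
    next(ps)
    p = next(ps)
    assert p == 3
    psq = p*p
    for i in itertools.count(9, 2):
        if i in D:      # composite
            step = D.pop(i)
        elif i < psq:   # prime
            yield i
            continue
        else:           # composite, = p*p
            assert i == psq
            step = 2*p
            p = next(ps)
            psq = p*p
        i += step
        while i in D:
            i += step
        D[i] = step

n = sum(1 for _ in itertools.takewhile(lambda p: p < 100000, psieve()))
print(n)  # 9592
```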
Will Ness
Tim Peters
  • 1
    did I say thank you? I should've, when I upvoted (back in April, as SO says to me). :) This is very nice, esp. the asserts. Of course the core beauty is by the initial author(s). – Will Ness Oct 20 '14 at 13:17
  • 4
    Au contraire, thank you, Will! I'm one of the co-authors on the ActiveState recipe - we all had fun working out the original algorithm on comp.lang.python. It gave a nice algorithm. But none of us had the insight you added, to delay adding multiples to the dict before they're really needed. That's very pretty, and of real practical benefit - thanks! – Tim Peters Oct 21 '14 at 03:50
  • How quickly does this work compared to the original+2,3,5,7 sieve? – Oscar Smith Nov 01 '16 at 23:53
  • I've edited to add the link to the mentioned answer, to make it easier to find. – Will Ness Dec 20 '20 at 15:07
8

This isn't originally my code, but it's worth posting. The original can be found here: http://code.activestate.com/recipes/117119/

def gen_primes():
  D = {}
  q = 2  # first integer to test for primality.

  while True:
    if q not in D:
      # not marked composite, must be prime  
      yield q 

      #first multiple of q not already marked
      D[q * q] = [q] 
    else:
      for p in D[q]:
        D.setdefault(p + q, []).append(p)
      # no longer need D[q], free memory
      del D[q]

    q += 1

It's a generator, so use it like any other.

primes = gen_primes()
for p in primes:
  print p

It takes 1.62 s on my desktop to generate 1 million primes and put them into a set.

Dominic Bou-Samra
  • 11
    How does it scale? Please paste the first trillion primes here please. – Beska Feb 06 '10 at 04:43
  • 1
    @Beska: I'm more interested in the primes between two trillion and three trillion myself. Who would like to check em for me? – D.Shawley Feb 06 '10 at 04:49
  • 1
    10 million primes takes 19.26s. 100 million primes took 293.61s. Can someone else check - I might be doing it wrong :S – Dominic Bou-Samra Feb 06 '10 at 04:56
  • 1
    Does anyone else here get the feeling that something fishy is going on here? "Post the primes man...it's cool...I don't want any trouble...just post the primes man..." – Beska Feb 06 '10 at 05:05
  • 1
    @Hamish: why don't you just run it yourself, take the primes and look at them at your leisure? (Rather than clogging up this question/answer with an enormous amount of senseless data.) – Beska Feb 06 '10 at 05:06
  • @Beska, do you mind giving my question right before this one a shot? Thank you! – Hamish Grubijan Feb 06 '10 at 05:09
  • you have one hell of a fast desktop, [apparently more than 20x faster](https://ideone.com/V1rbwh) than Ideone (usually it is around 7x, I thought... maybe for Python it's different). – Will Ness Dec 03 '14 at 15:58
5

Do a segmented sieve, where the size of a segment is determined by available memory or the maximal size of a bitset.

For each segment, represent the numbers in some interval [n, n + segment_size) as a bit set and sieve with all prime numbers below the square root of the upper bound.

Using a bit set uses less memory than a hash table or tree data structure, because you are working with dense sets of numbers.
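The answer gives no code, so here is an illustrative sketch of the idea (my own, not the answer author's; it uses a Python bytearray in place of a true bit set, with an arbitrary default segment size):

```python
def segmented_primes(segment_size=32768):
    """Yield primes indefinitely, sieving one fixed-size segment at a time."""
    base = []                                  # primes found so far, reused on later segments
    low = 0
    while True:
        high = low + segment_size
        seg = bytearray([1]) * segment_size    # seg[i] == 1  <=>  low+i still possibly prime
        if low == 0:
            seg[0:2] = b'\x00\x00'             # 0 and 1 are not prime
        # cross off multiples of every known prime p with p*p < high
        for p in base:
            if p * p >= high:
                break
            start = max(p * p, -(-low // p) * p)   # first multiple of p in [low, high)
            seg[start - low::p] = bytes(len(seg[start - low::p]))
        # scan the segment; a newly found prime also sieves the rest of this segment
        for i in range(segment_size):
            if seg[i]:
                n = low + i
                base.append(n)
                yield n
                if n * n < high:
                    seg[n * n - low::n] = bytes(len(seg[n * n - low::n]))
        low = high

from itertools import islice
print(list(islice(segmented_primes(4096), 10)))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The bytearray here stands in for the bit set; a real implementation would pack one candidate per bit to cut memory by a further factor of 8.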

starblue
  • My implementation does something like a segmented sieve, but it uses two heaps instead of bitsets. http://stackoverflow.com/a/11759904/97248 – pts Aug 01 '12 at 13:07
2

Here's a pretty fast infinite generator, written in Python 2 but easily adjusted to Python 3. To use it to sum the primes up to 10**9, use the following:

from itertools import takewhile
from functools import partial
from operator import gt
print (sum(takewhile(partial(gt, 10**9), prime_gen_inf())))

It's a segmented sieve, faster but obviously less elegant than Will Ness's algorithm.

from operator import mul
from functools import reduce
def prod(x): return reduce(mul, x, 1)


def build_sieve(wheel):
    w = prod(wheel)
    w_phi = prod([p-1 for p in wheel])
    rems = [a for a in range(w) if all(a % p for p in wheel)]
    assert len(rems) == w_phi
    inv = {a:pow(a, w_phi - 1, w) for a in rems}
    try:
        known_p = wheel + rems[1 : rems.index(rems[1]*rems[1])]
    except ValueError:
        known_p = wheel + rems[1:]
    return wheel, w, w_phi, rems, inv, known_p

#Adjust the chunk variable based on your computer's architecture.
#
#Adjust the line with #! if you don't need "true" infinite.  If you don't need
#primes larger than 1<<32, use array('H', []), if 1<<64 use 'L', if 1<<128 (in
#Python3) use 'Q', otherwise use empty list [].
#To save memory, comment out the lines with #*, and uncomment the commented-out
#lines 
import itertools
from itertools import islice, count, compress, izip
chain_f = itertools.chain.from_iterable
from array import array
def prime_gen_inf(chunk=250000, sieve_info = build_sieve([2,3,5,7])):
    """    Indefinitely yields primes    """
    wheel, w, w_phi, rems, inv, known_p = sieve_info
    for p in known_p: yield p
    new_n = 0;
    while True:
        size = min(chunk, (p * p - new_n) / w)
        sieve = bytearray([1]) * size * w_phi
        n, new_n = new_n, new_n + size * w
        if not n:
            zero = bytearray([0])
            seen = len(known_p) - len(wheel) + 1
            sieve[:seen:1] = zero * seen
            p_gen = islice(prime_gen_inf(), len(wheel), None)
            new_p = next(p_gen)
            ps = []                                         #! array('H', [])
            p_invs = bytearray([])                                         #*
        while new_p * new_p < new_n:
            ps.append(new_p)
            p_invs.append(inv[new_p % w])                                  #*
            new_p = next(p_gen)
        for p, p_inv, modp in izip(ps, p_invs, [-n % p for p in ps]):      #*
            s = [(modp + p * (p_inv * (r - modp) % w)) / w for r in rems]  #*
        #for p in ps:
        #    s = [(-n%p + p * (inv[p%w] * (r - -n%p) % w)) / w for r in rems]
            for i, start in enumerate(s):
                slice_size = ((size - start - 1) / p + 1)
                sieve[i + start * w_phi :: p * w_phi] = zero * slice_size
        for p in compress(chain_f(izip(*[count(n+r, w) for r in rems])), sieve):
            yield p
Jason
2

And another answer, more memory-efficient than my erat3 answer here:

import heapq

def heapprimegen():
    hp= []
    yield 2
    yield 3
    cn= 3
    nn, inc= 3, 6
    while 1:
        while cn < nn:
            yield cn
            heapq.heappush(hp, (3*cn, 2*cn))
            cn+= 2
        cn= nn+2
        nn, inc= heapq.heappushpop(hp, (nn+inc, inc))

It maintains a heap (a list) of prime multiples rather than a dictionary. It loses some speed, obviously.

tzot
2

Another way to do it:

import itertools
def primeseq():
    prime = [2]
    num = 0
    yield 2
    for i in itertools.count(3, 2):
        is_prime = True
        for num in prime:
            if i % num == 0:
                is_prime = False
                break
            elif num ** 2 > i: 
                break
        if is_prime:
            prime.append(i)
            yield i
quantum
1

Here is a simple but not terribly slow one using a heap instead of a dict:

import heapq

def heap_prime_gen_squares(): 
    yield 2  
    yield 3  
    h = [(9, 6)]
    n = 5
    while True:
        a, b = h[0]
        while n < a:
            yield n
            heapq.heappush(h, (n * n, n << 1))
            n += 2
        heapq.heapreplace(h, (a + b, b))  # Replace h[0], which is still (a, b).

My speed measurements of user time for the first 1 million primes (smaller numbers are better):

  • postponed_sieve (dict-based): 8.553s
  • erat2b (dict-based): 9.513s
  • erat2a (dict-based): 10.313s
  • heap_prime_gen_smallmem (heap-based): 23.935s
  • heap_prime_gen_squares (heap-based): 27.302s
  • heapprimegen (dict-based): 145.029s

So dict-based approaches seem to be the fastest.
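The exact harness isn't shown above (the figures are user time, presumably via a shell-level time command); a rough wall-clock equivalent can be sketched like this (illustrative only; as the subject it uses the simple dict-based sieve quoted in another answer on this page, not the generators being compared):

```python
import time
from itertools import islice

def gen_primes():                       # the simple dict-based ActiveState sieve
    D = {}
    q = 2
    while True:
        if q not in D:
            yield q
            D[q * q] = [q]
        else:
            for p in D[q]:
                D.setdefault(p + q, []).append(p)
            del D[q]
        q += 1

def time_first_n(gen_factory, n):
    """Wall-clock seconds to produce the first n primes, plus the n-th prime."""
    start = time.perf_counter()
    nth = None
    for nth in islice(gen_factory(), n):
        pass
    return time.perf_counter() - start, nth

elapsed, nth = time_first_n(gen_primes, 10000)
print("%.3fs for the first 10000 primes (10000th = %d)" % (elapsed, nth))
```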

pts
1

Here is a complicated heap-based implementation, which is not much faster than other heap-based implementations (see the speed comparison in another answer of mine), but it uses much less memory.

This implementation uses two heaps (tu and vw), which contain the same number of elements. Each element is an int pair. In order to find all primes up to q**2 (where q is a prime), each heap will contain at most 2*pi(q-1) elements, where pi(x) is the number of positive primes not larger than x. So the total number of integers is at most 4*pi(floor(sqrt(n))). (We could gain a factor of 2 on memory by pushing half as much stuff to the heap, but that would make the algorithm slower.)

Other dict- and heap-based approaches above (e.g. erat2b, heap_prime_gen_squares and heapprimegen) store about 2*pi(n) integers, because they extend their heap or dict every time they find a prime. As a comparison: to find the first 1_000_000 primes, this implementation stores fewer than 4141 integers, while the other implementations store more than 1_000_000 integers.

import heapq

def heap_prime_gen_smallmem():
    yield 2
    yield 3
    f = 5
    fmar3 = 2
    q = 7
    q6 = 7 * 6
    qmar3 = 4
    tu = [(25, 30), (35, 30)]
    vw = [(25, 30), (35, 30)]
    while True:
        qmar3 += 2   
        if qmar3 == 6:  
            qb = q + 4
            q6b = q6 + 24
            qmar3 = 2
        else:
            qb = q + 2
            q6b = q6 + 12
        if q < tu[0][0]:
            d = q * q
            while f < d:
                a, b = vw[0]
                if f < a: 
                    yield f   
                else:
                    a, b = vw[0]
                    heapq.heapreplace(vw, (a + b, b))
                    a, b = vw[0]
                    while f >= a:
                        heapq.heapreplace(vw, (a + b, b))
                        a, b = vw[0]   
                fmar3 += 2
                if fmar3 == 6:
                    f += 4
                    fmar3 = 2
                else:
                    f += 2
            c = q * qb   
            heapq.heappush(tu, (d, q6))
            heapq.heappush(tu, (c, q6))
            heapq.heappush(vw, (d, q6))
            heapq.heappush(vw, (c, q6))
        else:
            a, b = tu[0]
            heapq.heapreplace(tu, (a + b, b))
            a, b = tu[0]  
            while q >= a:
                heapq.heapreplace(tu, (a + b, b))
                a, b = tu[0]
        q = qb
        q6 = q6b
pts
0

Here's a generator that's a little truer to how it's done in Haskell: filtering against composites of known primes, then adding the remaining primes to the list.

def gen_primes():
    primes = []
    i = 2
    while True:
        prime = True
        for p in primes:
            if not (i % p):
                prime = False
                break
        if prime:
            yield i
            primes.append(i)
        i += 1
avpx
  • 2
    This isn't necessarily efficient, but it's much like the Haskell one-line implementation of the Sieve of Eratosthenes. It is my code, and I just wrote it, so it may not work exactly as intended, but a quick test of it *does* generate the right sequence of primes. – avpx Feb 06 '10 at 04:24
  • It did hang for me. What is the code to generate the first 100? – Hamish Grubijan Feb 06 '10 at 04:30
  • 1
    That's weird. Works fine for me. Try this: `primes = gen_primes()` and then `for i in xrange(100): print primes.next()` – avpx Feb 06 '10 at 05:30
  • This is similar to the [answer posted later by quantum](https://stackoverflow.com/a/9151909), except this code tests each candidate `i` against every prime seen so far. It should break out of the inner loop when `p*p > i`. – PM 2Ring Dec 01 '20 at 06:26
0

I wrote an article about an infinite primes generator some time ago:

http://stacktrace.it/2008/01/progetto-eulero-problema-3/

It's in Italian, but you can get a rough translation using Google: http://tinyurl.com/yzpyeom

piro
  • 1
    A quick comment on your article: prime factoring can be done without needing to generate any primes ahead of time: using another "every schoolkid knows" technique, starting with 2, divide out *all* 2's from N, then all 3's, 5's, 7's, 9's, etc. (just odds since all 2's were already divided out). I did this problem too on the euler site and used a prime generating sieve, but it's completely unnecessary. With this technique, any number you find which divides N is prime (a prime factor). – Bogatyr Oct 21 '11 at 16:55
0

I know the post is old, but I came across this question myself... The following code is based on a very simple idea: a growing sieve of Eratosthenes. While this solution is slower than the best ones here, it is easy to grasp and designed to be readable...

I used integers to store the results of the sieve. In binary format, an integer is a list of 0s and 1s, 0 at position i if i is not a prime, 1 if it may be a prime. The requisite infinity is a result of the fact that Python 3 integers are unbounded.

def primes():
    container, size = 1 << 2, 3 # we start with 0b100 (from right to left: 0 and 1 are not primes, 2 is)
    last_prime = 1
    while True:
        prime = next((j for j in range(last_prime+1, size) if container & 1 << j), None) # find the next prime
        while not prime:
            container, size = expand(container, size, 2**16) # add 65536 cells and sieve the container
            prime = next((j for j in range(last_prime+1, size) if container & 1 << j), None)
        yield prime
        last_prime = prime

How to expand the container? Just add a bunch of 1s at the left of the container (in binary format) and sieve them. This is identical to the standard sieve, with a slight difference. In the standard sieve, if we find a prime i, we start to cross the cells at i*i, with a step of i.

Here, this may already have been done for the first part of the container. We just need to start at the beginning of the new part of the container if that is farther than i*i.

def expand(container, size, n):
    new_size = size + n
    container += (1 << (new_size + 1) - 1) - (1 << size) # add n 1's
    for i in range(2, new_size):
        if container & (1 << i): # i is a prime
            t = sum(1 << j for j in range(max(i, size // i)*i, new_size, i)) # set 1 for all multiples
            container &= ~t # cross the cells

    return container, new_size

Test for a million primes:

import itertools
assert 78498 == len(list(itertools.takewhile(lambda p: p<1000000, primes())))
jferard