1

I found this highly optimised implementation of the Sieve of Eratosthenes for Python on Stack Overflow. I have a rough idea of what it's doing but I must admit the details of it's workings elude me.

I would still like to use it for a little project (I'm aware there are libraries to do this but I would like to use this function).

Here's the original:

'''
    Sieve of Eratosthenes 
    Implementation by Robert William Hanks      
    https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n/3035188
'''

def sieve(n):
    """Return an array of the primes below n."""
    prime = numpy.ones(n//3 + (n%6==2), dtype=numpy.bool)
    for i in range(3, int(n**.5) + 1, 3):
        if prime[i // 3]:
            p = (i + 1) | 1
            prime[       p*p//3     ::2*p] = False
            prime[p*(p-2*(i&1)+4)//3::2*p] = False
    result = (3 * prime.nonzero()[0] + 1) | 1
    result[0] = 3
    return numpy.r_[2,result]

What I'm trying to achieve is to modify it to return all primes below n starting at x so that:

primes = sieve(50, 100)

would return primes between 50 and 100. This seemed easy enough, I tried replacing these two lines:

def sieve(x, n):
    ...
    for i in range(x, int(n**.5) + 1, 3):
    ...

But for a reason I can't explain, the value of x in the above has no influence on the numpy array returned!

How can I modify sieve() to only return primes between x and n

Gareth Rees
  • 60,601
  • 9
  • 119
  • 155
Juicy
  • 9,942
  • 32
  • 97
  • 181

2 Answers2

7

The implementation you've borrowed is able to start at 3 because it replaces sieving out the multiples of 2 by just skipping all even numbers; that's what the 2*… that appear multiple times in the code are about. The fact that 3 is the next prime is also hardcoded in all over the place, but let's ignore that for the moment, because if you can't get past the special-casing of 2, the special-casing of 3 doesn't matter.

Skipping even numbers is a special case of a "wheel". You can skip sieving multiples of 2 by always incrementing by 2; you can skip sieving multiples of 2 and 3 by alternately incrementing by 2 and 4; you can skip sieving multiples of 2, 3, 5, and 7 by alternately incrementing by 2, 4, 2, 4, 6, 2, 6, … (there's 48 numbers in the sequence), and so on. So, you could extend this code by first finding all the primes up to x, then building a wheel, then using that wheel to find all the primes between x and n.

But that's adding a lot of complexity. And once you get too far beyond 7, the cost (both in time, and in space for storing the wheel) swamps the savings. And if your whole goal is not to find the primes before x, finding the primes before x so you don't have to find them seems kind of silly. :)

The simpler thing to do is just find all the primes up to n, and throw out the ones below x. Which you can do with a trivial change at the end:

primes = numpy.r_[2,result]
return primes[primes>=x]

Or course there are ways to do this without wasting storage for those initial primes you're going to throw away. They'd be a bit complicated to work into this algorithm (you'd probably want to build the array in sections, then drop each section that's entirely < x as you go, then stack all the remaining sections); it would be far easier to use a different implementation of the algorithm that isn't designed for speed and simplicity over space…

And of course there are different prime-finding algorithms that don't require enumerating all the primes up to x in the first place. But if you want to use this implementation of this algorithm, that doesn't matter.

abarnert
  • 313,628
  • 35
  • 508
  • 596
  • Thanks for the excellent explanation. So I won't be saving any time by only fetching the primes above X. I may look into other algorithms or other implementations! – Juicy Oct 14 '14 at 01:39
  • @Juicy: Exactly; this particular algorithm is all about using the knowledge of all primes below N (or, rather, the largest multiple of each prime up to N) to efficiently determine whether or not N is prime, so even if you don't want those primes, you have to fetch them. – abarnert Oct 14 '14 at 18:35
  • no, no, what you need is the [offset sieve of Eratosthenes](http://stackoverflow.com/a/19641049/849891) - sieve up to the `sqrt` of `n`, while sieving an offset segment from `x` to `n` too. A ***lot*** less work than what you're proposing, when `sqrt(n)` << `x`. – Will Ness Oct 15 '14 at 11:58
  • @WillNess: The question is asking how to find the primes between X and Y with this particular implementation of this particular algorithm. Which you can't, except by finding them all and dropping the ones below X. The answer also says that the right solution is to pick one of the many different prime-finding algorithms that _can_ find the primes between X and Y, and the OP commented that he will look into other algorithms or implementations. – abarnert Oct 15 '14 at 17:32
  • this particular implementation of this particular algorithm is perfectly amenable to the change described in the answer linked in my comment. It already works up to the sqrt of n, all that's needed is to split the sieve array in two. No need to use anything *but* the sieve of Eratosthenes for this. It absolutely does *not* require enumerating all the primes up to `x` to start producing the primes after `x`. – Will Ness Oct 15 '14 at 23:48
  • @WillNess: Either `x <= sqrt(n)`, in which case enumerating all primes up to `sqrt(n)` includes all primes up to `x`, or `x > sqrt(n)`, in which case enumerating all primes up to `sqrt(n)` and discarding the ones below `x` is the exact thing I suggested. (It's doing the sieving and recording in a single array of trues from 0 to n by just marking false as it goes; if you change it to use separate storage for the sieve and the results, then that's just "ways to do this without wasting storage for those initial primes you're going to throw away".) – abarnert Oct 16 '14 at 00:10
  • you suggested *"to build the array in sections, then drop each section that's entirely < x as you go"* but you don't need to *"drop each"* if you just don't make all of the intermediate segments in the first place, and only work with **two**. Then at the very end of the answer you propose using "different prime-finding algorithms that don't require enumerating all the primes up to `x`" but the sieve of Eratosthenes requires no such thing so there's no need to replace it with anything else, in that regard. – Will Ness Oct 16 '14 at 09:43
  • @WillNess: How are you going to avoid "making the intermediate segments" when using an implementation that uses a single array for both the primes and the sieved composites? Are you suggesting a sparse array or something? If you start by constructed an array of `n` flags (or, in this case, `n//3`), there's no way to avoid having an array of `n` values. – abarnert Oct 16 '14 at 18:06
  • @abarnert that's what I'm saying: not the single array, but two arrays. the pseudocode in the answer that I linked to in my first comment is quite clear. – Will Ness Oct 16 '14 at 19:07
  • @WillNess: And your answer is a different implementation of a variation of the same algorithm, so I don't see why you think that fits the "using this specific function" requirement the OP asked for. – abarnert Oct 16 '14 at 19:31
  • you're splitting hairs here. The OP *asked* for the original impl to be modified into returning primes from x to n. The proper modification for case x > sqrt(n) is to split the array in two. Have you looked at the pseudocode? – Will Ness Oct 16 '14 at 19:32
3

Since you're now interested in looking into other algorithms or other implementations, try this one. It doesn't use numpy, but it is rather fast. I've tried a few variations on this theme, including using sets, and pre-computing a table of low primes, but they were all slower than this one.

#! /usr/bin/env python

''' Prime range sieve.

    Written by PM 2Ring 2014.10.15

    For range(0, 30000000) this is actually _faster_ than the 
    plain Eratosthenes sieve in sieve3.py !!!
'''

import sys

def potential_primes():
    ''' Make a generator for 2, 3, 5, & thence all numbers coprime to 30 '''
    s = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
    for i in s:
        yield i
    s = (1,) + s[3:] 
    j = 30
    while True:
        for i in s:
            yield j + i
        j += 30


def range_sieve(lo, hi):
    ''' Create a list of all primes in the range(lo, hi) '''

    #Mark all numbers as prime
    primes = [True] * (hi - lo)

    #Eliminate 0 and 1, if necessary
    for i in range(lo, min(2, hi)):
        primes[i - lo] = False

    ihi = int(hi ** 0.5)
    for i in potential_primes():
        if i > ihi: 
            break

        #Find first multiple of i: i >= i*i and i >= lo
        ilo = max(i, 1 + (lo - 1) // i ) * i

        #Determine how many multiples of i >= ilo are in range
        n = 1 + (hi - ilo - 1) // i

        #Mark them as composite
        primes[ilo - lo : : i] = n * [False]

    return [i for i,v in enumerate(primes, lo) if v]


def main():
    lo = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    hi = int(sys.argv[2]) if len(sys.argv) > 2 else lo + 30
    #print lo, hi

    primes = range_sieve(lo, hi)
    #print len(primes)
    print primes
    #print primes[:10], primes[-10:]


if __name__ == '__main__':
    main()

And here's a link to the plain Eratosthenes sieve that I mentioned in the docstring, in case you want to compare this program to that one.

You could improve this slightly by getting rid of the loop under #Eliminate 0 and 1, if necessary. And I guess it might be slightly faster if you avoided looking at even numbers; it'd certainly use less memory. But then you'd have to handle the cases when 2 was inside the range, and I figure that the less tests you have the faster this thing will run.


Here's a minor improvement to that code: replace

    #Mark all numbers as prime
    primes = [True] * (hi - lo)

    #Eliminate 0 and 1, if necessary
    for i in range(lo, min(2, hi)):
        primes[i - lo] = False

with

    #Eliminate 0 and 1, if necessary
    lo = max(2, lo)

    #Mark all numbers as prime
    primes = [True] * (hi - lo)

However, the original form may be preferable if you want to return the plain bool list rather than performing the enumerate to build a list of integers: the bool list is more useful for testing if a given number is prime; OTOH, the enumerate could be used to build a set rather than a list.

Community
  • 1
  • 1
PM 2Ring
  • 50,023
  • 5
  • 64
  • 150