2

I'm attempting to implement the Sieve of Eratosthenes. The output seems to be correct (minus "2" that needs to be added) but if the input to the function is larger than 100k or so it seems to take an inordinate amount of time. What are ways that I can optimize this function?

def sieveErato(n):
     numberList = range(3,n,2)

     for item in range(int(math.sqrt(len(numberList)))):
            divisor = numberList[item]
            for thing in numberList:
                    if(thing % divisor == 0) and thing != divisor:
                            numberList.remove(thing)
    return numberList
Jason Sundram
  • 10,998
  • 19
  • 67
  • 84
pri0ritize
  • 503
  • 2
  • 6
  • 17
  • possible duplicate of [Sieve of Eratosthenes - Finding Primes Python](http://stackoverflow.com/questions/3939660/sieve-of-eratosthenes-finding-primes-python) – MAK Jan 28 '12 at 07:59
  • we could start by plotting the resolution time as a function of n, this could give us some ideas... – jimifiki Jan 28 '12 at 08:04
  • 1
    Check out http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python – Jason Sundram Jan 28 '12 at 18:21
  • Thanks Jason, I'd read that post a couple days back and had a hard time working through the Eratosthenes function given. I think I understand what I was missing now. – pri0ritize Jan 28 '12 at 19:49

6 Answers6

1

Your algorithm is not the Sieve of Eratosthenes. You perform trial division (the modulus operator) instead of crossing-off multiples, as Eratosthenes did over two thousand years ago. Here is an explanation of the true sieving algorithm, and shown below is my simple, straight forward implementation, which returns a list of primes not exceeding n:

def sieve(n):
    m = (n-1) // 2
    b = [True]*m
    i,p,ps = 0,3,[2]
    while p*p < n:
        if b[i]:
            ps.append(p)
            j = 2*i*i + 6*i + 3
            while j < m:
                b[j] = False
                j = j + 2*i + 3
        i+=1; p+=2
    while i < m:
        if b[i]:
            ps.append(p)
        i+=1; p+=2
    return ps

We sieve only on the odd numbers, stopping at the square root of n. The odd-looking calculations on j map between the integers being sieved 3, 5, 7, 9, ... and indexes 0, 1, 2, 3, ... in the b array of bits.

You can see this function in action at http://ideone.com/YTaMB, where it computes the primes to a million in less than a second.

user448810
  • 16,364
  • 2
  • 31
  • 53
  • 1
    *simple* implementation should look like [this](http://stackoverflow.com/a/193605/4279). – jfs Jan 29 '12 at 08:24
0

You can try the same way Eratosthenes did. Take an array with all numbers you need to check order ascending, go to number 2 and mark it. Now scratch every second number till the end of the array. Then go to 3 and mark it. After that scratch every third number . Then go to 4 - it is already scratched, so skip it. Repeat this for every n+1 which is not already scratched.

In the end, the marked numbers are the prime one. This algorithm is faster, but sometimes need lots of memory. You can optimize it a little by drop all even numbers (cause they are not prime) and add 2 manually to the list. This will twist the logic a little, but will take half the memory.

Here is an illustration of what I'm talking about: http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes

anthares
  • 10,337
  • 3
  • 39
  • 58
  • if you follow that process, you'll check off each number more than once. it is possible to only need to check them off once. see my answer. – Dan D. Jan 28 '12 at 10:17
0

Warning: removing elements from an iterator while iterating on it can be dengerous...

You could make the

    if(thing % divisor == 0) and thing != divisor:

test lighter by splitting it in the loop that breaks when you arrive to the index of 'divisor' and then the test:

for thing in numberList_fromDivisorOn:
    if(thing % divisor == 0):
        numberList.remove(thing)
jimifiki
  • 4,779
  • 1
  • 28
  • 53
0

I followed this link: Sieve of Eratosthenes - Finding Primes Python as suggested by @MAK and I've found that the accepted answer could be improved with an idea I've found in your code:

def primes_sieve2(limit):
    a = [True] * limit               # Initialize the primality list
    a[0] = a[1] = False
    sqrt = int(math.sqrt(limit))+1
    for i in xrange(sqrt):
        isprime = a[i]
        if isprime:
            yield i
            for n in xrange(i*i, limit, i):     # Mark factors non-prime
                a[n] = False
    for (i, isprime) in enumerate(a[sqrt:]):
        if isprime:
            yield i+sqrt
Community
  • 1
  • 1
jimifiki
  • 4,779
  • 1
  • 28
  • 53
  • This returns [2, 3, 5, 7, 1, 3, 7, 9, 13, 19, 21, 27, 31, 33, 37, 43, 49, 51, 57, 61, 63, 69, 73, 79, 87] for limit=100. Not only is there a strange duplication, but many of these terms aren't prime. – DSM Jan 28 '12 at 18:31
0

if given unlimited memory and time, the following code will print all the prime numbers. and it'll do it without using trial division. it is based on the haskell code in the paper: The Genuine Sieve of Eratosthenes by Melissa E. O'Neill

from heapq import heappush, heappop, heapreplace
def sieve():
    w = [2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8,6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10]
    for p in [2,3,5,7]: print p
    n,o = 11,0
    t = []
    l = len(w)
    p = n
    heappush(t, (p*p, n,o,p))
    print p
    while True:
        n,o = n+w[o],(o+1)%l
        p = n
        if not t[0][0] <= p:
            heappush(t, (p*p, n,o,p))
            print p
            continue
        while t[0][0] <= p:
            _, b,c,d = t[0]
            b,c = b+w[c],(c+1)%l
            heapreplace(t, (b*d, b,c,d))
sieve()
Dan D.
  • 67,516
  • 13
  • 93
  • 109
  • Not only doesn't the code print 11, it seems to print many prime multiples of 11 [121, 143, 187, etc.] – DSM Jan 28 '12 at 19:05
  • Oops, i didn't see that, but the fix is simple. – Dan D. Jan 28 '12 at 20:15
  • What if you print out all primes up to 100 only? Did you use your heap at all? Its starting entry is at 121. It isn't needed until you check out 121, nor the following entries of all the primes below 100 which populate your heap when you hit 100 - but none was yet needed. This is a major flaw of the code from the article, which you have faithfully recreated here. An entry for a prime should be added into the heap only when the candidate is seen such that it is equal to that prime's square. The memory requirement of this code is thus O(n) to print n primes, instead of O(sqrt(n/log(n))). – Will Ness May 07 '12 at 19:21
  • *(contd.)* and since it has to manipulate such an enormously bigger heap than is actually needed, it will be slower too. To produce 1,000,000 primes you only need no more than 550 primes to check by; but your code will store 999,999 entries in the heap. Melissa O'Neill fixed that problem in her [code distribution](http://www.cs.hmc.edu/~oneill/code/haskell-primes.zip) ZIP file. [Haskellwiki primes page](http://www.haskell.org/haskellwiki/Prime_numbers#Tree_merging_with_Wheel) also has its take on this. – Will Ness May 07 '12 at 19:32
0

This code takes 2 seconds to generate primes less than 10M (it is not mine, i found it somewer on google)

def erat_sieve(bound):
    if bound < 2:
        return []
    max_ndx = (bound - 1) // 2
    sieve = [True] * (max_ndx + 1)
    #loop up to square root
    for ndx in range(int(bound ** 0.5) // 2):
        # check for prime
        if sieve[ndx]:
            # unmark all odd multiples of the prime
            num = ndx * 2 + 3
            sieve[ndx+num:max_ndx:num] = [False] * ((max_ndx-ndx-num-1)//num + 1)
    # translate into numbers
    return [2] + [ndx * 2 + 3 for ndx in range(max_ndx) if sieve[ndx]]
Luka Rahne
  • 9,760
  • 3
  • 30
  • 55