Could someone help me understand this Sieve of Eratosthenes script? The last couple lines have me stumped

Question

All right, so I get the principle behind the Sieve of Eratosthenes. I tried, and largely failed, to write one myself (I wrote a functional prime number generator, it just wasn't efficient or a sieve). I think I'm more having an issue understanding the math, but the programming has me mixed up too.

import numpy as np

def primesfrom2to(n):
    sieve = np.ones(n/3 + (n%6==2), dtype=np.bool) 
    # print(sieve) for n = 10 returns [True, True, True]

Ok, write off the bat I'm a little confused. It's generating a list of truth values that will progressively be marked false as they are determined to be composite. I think it's saying no more than 1/3 of the values less than n will be prime, but what does adding a modolus operation achieve?

    sieve[0] = False

marks 1 as false?

    for i in range(int(n**0.5)//3+1):
        # print(i) for n = 10 returns 0 and 1 on separate lines

This is setting the range to check. No number has a product greater than its square root. Whats up with the dividing by 3+1?

        if sieve[i]:
            k=3*i+1|1
            #print(k) for n = 10 returns '5' doesn't change till it adds 7 at n = 50

Ok so, if sieve[i]is true (so prime / not yet marked as composite?) then 3*i + 1 or 1? How does this work exactly? It seems to be generating primes that it will be multiplied later to remove all the subsequent products, but I don't know how.

            sieve[      ((k*k)/3)      ::2*k] = False
            sieve[(k*k+4*k-2*k*(i&1))/3::2*k] = False

ok so both of these are taking the primes k and marking all further multiples of them? if n=5 wouldn't that make it sieve[8.33::10] = False? And the other one like sieve[41.3::10]? I just don't get it.

print(np.r_[2,3,((3*np.nonzero(sieve)[0]+1)|1)])

All right, and finally actually generating the list of primes. Again, what's up with multiplying it by three? Obviously there is something fundamental about 3 in this code that I just don't get. Also I'm failing to grasp the |1 concept again.

Oh and just for fun here's my effective and horribly inefficient attempt.

import numpy
def sieve(num):

    master = numpy.array([i for i in range(2, num+1)])
    run = []
    y=2


    while y <= ((num+1)**(1/2)): 
        thing = [x for x in range(y, num+1, y) if x > 5 or x == 4]
        run = run + thing
        y = y+1 
    alist = numpy.in1d(master, run, invert = True)
    blist = (master[alist])
    print(blist)

It took 57s to calculate the primes up to 500,000. I was doing the Euler sum of primes up to 2,000,000 problem.

score 3 · Accepted Answer · edited May 23 '17 at 10:28

Here is a simple sieve algorithm in numpy:

import numpy as np
def sieve(Nmax):
    """Calculate prime numbers between 0 and Nmax-1."""
    is_prime = np.ones(Nmax, dtype=bool)
    Pmax = int(np.sqrt(Nmax)+1)
    is_prime[0] = is_prime[1] = False
    p = 2
    dp = 1
    while p <= Pmax:
        if is_prime[p]:
            is_prime[p*2::p] = False
        p += dp
        dp = 2
    primes = np.argwhere(is_prime)[:,0]
    print(primes)

sieve(100)

If I remove the print statement, it runs for Nmax=10^8 in 2.5 seconds on my modest laptop.

This algorithm is a bit inefficient because it needs to store the prime-ness of even values and multiples of 3. You can save on storage space by filtering out those, so that you keep track of the prime-ness of n*6+1 and n*6+5, for any integer n>=1. That saves you two thirds of the storage space, at the expense of a bit more book-keeping. It seems that you attempted to start with the difficult version. Moreover, all the bookkeeping tends to become expensive if it involves the Python interpreter evaluating if statements and doing the memory management of your lists. Python/numpy's array slicing is a much more natural way to do it, and it's probably fast enough for your purposes.

Regarding your questions:

(n%6==2) is boolean, will be interpreted as 0 or 1 and will add one more element to prevent an "off-by-one" error.
int(n**0.5)//3+1 should be read as int(int(n**0.5)/3) + 1. Division goes before addition. The division by three is so that you only allocate space for 6n+1 and 6n+5 values.
k=3*i+1|1 means k=3*i+1, add one if it is even (it is a bit-wise or). Array element 'i' corresponds to potential prime number (3*i+1)|1. So if the array indices are [0, 1, 2, 3, 4, 5, ...], they represent the numbers [1, 5, 7, 11, 13, 17, ...].
Setting the sieve elements to False is done separately for the elements representing 6n+1 and 6n+5, that's why there are two assignments.
If you're running this in Python 2.7, integer division is always truncated, so 9/2 will result in 4. In Python 3, it would be better to use the // operator for integer divisions, i.e. (k*k)/3 should be (k*k)//3.

Edit: you may wish to attribute the algorithm that you were asking about: Fastest way to list all primes below N.

Thank you! That makes a lot more sense. The bitwise or had me especially confused. Expressing it as 6n+1 and 6n+5 made it click. I had learned about proving there are infinite primes in logic and these were definitely alluded to. Oh, and that link is where I got this code from! — Josh, May 05 '16 at 23:46

Could someone help me understand this Sieve of Eratosthenes script? The last couple lines have me stumped

1 Answers1