7

This very short and simple code in #Python tries to simulate the "Sieve of Eratosthenes" for the first N natural numbers with the constraints of (0) script shortness; (1) minimization of the 'if statements' and 'for/while loops'; (2) efficiency in terms of CPU time.

import numpy as np
N = 10**5
a = np.array(range(3,N,2))
for j in range(0, int(round(np.sqrt(N),0))):
    a[(a!=a[j]) & (a%a[j] == 0)] = 0
    a = a[a!=0]
a = [2]+list(a)

On an Intel Core I5, it returns the prime numbers among the first:

  • N = 100,000 in 0.03 seconds;
  • N = 1,000,000 in 0.63 seconds;
  • N = 10,000,000 in 22.2 seconds.

Would someone like to share more efficient codes in term of CPU time within the aforementioned constraints?

Rob
  • 125
  • 1
  • 6
  • 3
    Like so many people before you, you've attempted to write a sieve and ended up with trial division. – user2357112 supports Monica Apr 20 '18 at 07:30
  • Sorry, are there some mistakes that I do not see? – Rob Apr 20 '18 at 07:31
  • Even this relatively naive [implementation](https://codereview.stackexchange.com/a/174089) tests a lot faster than yours. – juanpa.arrivillaga Apr 20 '18 at 07:31
  • Well, I am not challenging anyone. Anyway, on the same machine the code you are proposing (in the version "alternative implementation") has this performance: N = 1E6 in 0.37s and N = 1E7 in 4.4 seconds. – Rob Apr 20 '18 at 07:39
  • 1
    @Roberto the idea of a sieve is *to avoid divisions*, which your code does not. (A real sieve could have even shorter code without any comparisons.) – kazemakase Apr 20 '18 at 07:50
  • 1
    Yeah, I just wanted to point out that `numpy` isn't a great idea here for performance. And as kazemakase points out, this isn't really The Sieve, since you iteratively check the modulus. Furthermore, `(a!=a[j]) & (a%a[j] == 0)` is a pretty wasteful operation if you really want to squeeze performance out of it. – juanpa.arrivillaga Apr 20 '18 at 07:51
  • Ok. Thank you. I have tried for fun to write some short code for the problem, where can I find some short and elegant codes that simulates a real sieve? – Rob Apr 20 '18 at 07:53
  • I would consider the one I linked to short and elegant. – juanpa.arrivillaga Apr 20 '18 at 07:57
  • 2
    There are a lot of NumPy examples with benchmarks here: https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n/2068548 – Alex Riley Apr 20 '18 at 08:14
  • 1
    See in particular the answers here: https://stackoverflow.com/a/3035188/3923281 – Alex Riley Apr 20 '18 at 08:17
  • @AlexRiley: Not everything there is a sieve of Eratosthenes, though. There are wheel sieves and other algorithms, and for someone unfamiliar with the difference between trial division and the basic sieve of Eratosthenes, it may be hard to tell what's going on in all that code. – user2357112 supports Monica Apr 20 '18 at 08:25

4 Answers4

14

An actual NumPy sieve of Eratosthenes looks like this:

def sieve(n):
    flags = numpy.ones(n, dtype=bool)
    flags[0] = flags[1] = False
    for i in range(2, n):
        # We could use a lower upper bound for this loop, but I don't want to bother with
        # getting the rounding right on the sqrt handling.
        if flags[i]:
            flags[i*i::i] = False
    return numpy.flatnonzero(flags)

It maintains an array of "possibly prime" flags and directly unsets the flags corresponding to multiples of primes, without needing to test divisibility, especially for numbers that aren't divisible by the prime currently being handled.

What you're doing is trial division, where you just go through and test whether numbers are divisible by candidate divisors. Even a good implementation of trial division needs to do more operations, and more expensive operations, than a sieve. Your implementation does even more work than that, because it considers non-prime candidate divisors, and because it keeps performing divisibility tests for numbers it should already know are prime.

user2357112 supports Monica
  • 215,440
  • 22
  • 321
  • 400
2

I decided to play with this and created a further optimized NumPy version posted by @user2357112 supports Monica that uses the Numba JIT to speed things up.

import numba
import numpy
import timeit
import datetime 

@numba.jit(nopython = True, parallel = True, fastmath = True, forceobj = False)
def sieve (n: int) -> numpy.ndarray:
    primes = numpy.full(n, True)
    primes[0], primes[1] = False, False
    for i in numba.prange(2, int(numpy.sqrt(n) + 1)):
        if primes[i]:
            primes[i*i::i] = False
    return numpy.flatnonzero(primes)

if __name__ == "__main__":
    
    timestart = timeit.default_timer()
    print(sieve(1000000000))
    timestop = timeit.default_timer()
    timedelta = (timestop - timestart)
    print(f"Time Elapsed: {datetime.timedelta(seconds = timedelta)}")

else:
    pass

On my laptop, I sieve out the primes in the first 1 billion (1e9) natural numbers in 0:00:10.378686 seconds. The JIT provides at least an order of magnitude of performance here; the next fastest answer at the time of writing took 0:01:27.059963 minutes. Sadly, I don't have an Nvidia GPU and Cuda on this system (Mac), otherwise I'd have used that.

Greenstick
  • 6,744
  • 1
  • 18
  • 26
1

1.94 seconds for 10.000.000

def sieve_eratosthene(limit):

    primes = [True] * (limit+1)
    iter = 0

    while iter < limit**0.5 :
        if iter < 2:
            primes[iter]= False

        elif primes[iter]:
            for i in range(iter*2, limit+1, iter):
                primes[i] = False

        iter+=1

    return(x for x in range(number+1) if primes[x])
andreis11
  • 1,093
  • 1
  • 6
  • 10
1

Here's a simple way to do this using NumPy. It is somewhat similar to the OP's idea of indexing instead of looping again inside the main loop, but without the division check, only slicing.

It is also similar to user2357112 supports Monica answer, but this one only considers odd numbers, which makes it faster.

This is a typical odd only sieve: https://stackoverflow.com/a/20248491/8094047.

  1. Create a bool array of size n.
  2. Loop through odd numbers up to square root of n.
    Note: numbers are used interchangeably as indices
  3. Mark composite (multiples of primes) numbers as false.

In the end we have a bool array that we can use to check if a number below n is prime (except even numbers, you can check for them without the array, using & 1 or so)

Avg. time for n = 20000000: 0.1063s

import numpy as np
n = 20000000

isprime = np.ones(n, dtype=np.bool)

# odd only sieve
i = 3
while (i * i < n):
    if isprime[i]:
        isprime[i * i:n:2 * i] = False
    i += 2

# test
print(isprime[97]) # should be True
Tonechas
  • 11,520
  • 14
  • 37
  • 68
Iyad Ahmed
  • 137
  • 2
  • 7