16

I am trying to optimize further the champion solution in prime number thread by taking out the complex formula for sub-list length. len() of the same subsequence is too slow as len is expensive and generating the subsequence is expensive. This looks to slightly speed up the function but I could not yet take away the division, though I do the division only inside the condition statement. Of course I could try to simplify the length calculation by taking out the optimization of starting marking for n instead of n*n...

I replaced division / by integer division // to be compatible with Python 3 or

from __future__ import division

Also I would be interesting if this recurrence formula could help to speed up the numpy solution, but I have not experience of using numpy much.

If you enable psyco for the code, the story becomes completely different, however and Atkins sieve code becomes faster than this special slicing technique.

import cProfile

def rwh_primes1(n):
    # http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    """ Returns  a list of primes < n """
    sieve = [True] * (n//2)
    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i//2]:
            sieve[i*i//2::i] = [False] * ((n-i*i-1)//(2*i)+1)
    return [2] + [2*i+1 for i in xrange(1,n/2) if sieve[i]]

def primes(n):
    # http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    # recurrence formula for length by amount1 and amount2 Tony Veijalainen 2010
    """ Returns  a list of primes < n """
    sieve = [True] * (n//2)
    amount1 = n-10
    amount2 = 6

    for i in xrange(3,int(n**0.5)+1,2):
        if sieve[i//2]:
             ## can you make recurrence formula for whole reciprocal?
            sieve[i*i//2::i] = [False] * (amount1//amount2+1)
        amount1-=4*i+4
        amount2+=4

    return [2] + [2*i+1 for i in xrange(1,n//2) if sieve[i]]

numprimes=1000000
print('Profiling')
cProfile.Profile.bias = 4e-6
for test in (rwh_primes1, primes):
    cProfile.run("test(numprimes)")

Profiling (not so much difference between versions)

         3 function calls in 0.191 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.006    0.006    0.191    0.191 <string>:1(<module>)
        1    0.185    0.185    0.185    0.185 myprimes.py:3(rwh_primes1)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


         3 function calls in 0.192 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.006    0.006    0.192    0.192 <string>:1(<module>)
        1    0.186    0.186    0.186    0.186 myprimes.py:12(primes)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Interestingly by increasing the limit to 10**8 and putting timing decorator to functions removing the profiling:

rwh_primes1 took 23.670 s
primes took 22.792 s
primesieve took 10.850 s

Interestingly if you do not produce list of primes but return the sieve itself the time is around half from number list version.

Tony Veijalainen
  • 4,921
  • 20
  • 30
  • 1
    just saw your recurrence improvement today, nice ideia if i have time i will pursue a variation of it, did you saw the code for primes2 ? (a pure python version of my fastest numpy solution) http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/ – Robert William Hanks Aug 29 '10 at 06:42
  • 1
    I love python, but if you want to improve speed, rewrite it in C – Ruggero Turra Feb 11 '11 at 23:05
  • 5
    Did you profile your code? I yes, please post the results. If not, how can you know what to optimize? – 9000 Feb 12 '11 at 14:11
  • 1
    Rewriting key components in C is effectively what the numpy-based solutions do - move the raw arithmetic out of Python (which is relatively slow) and into C (which is about as fast as you can get without dropping down into actual assembly code). As far as this specific example code goes, saving the result of the "i//2" calculation to a local variable would be an obvious micro-optimisation. – ncoghlan Feb 18 '11 at 07:10
  • 1
    Another micro-optimization would be to observe that `4*i+4 == 4*(i+i)`, and so by moving the subtraction from `amount1` to the start of the loop, it could become `amount1-=4*i`. Of course you'll also need to change the initial value of `amount1` to `n+2`. – jchl Mar 09 '11 at 00:11
  • jchl: Some typo as `4*(i+i) == 8*i`, not what you are saying. `4*(i+1)` it is, but I could not get your suggestion to function. ncoghlan: `i*i//2` is not same as `i*(i//2)` but `(i*i)//2`. – Tony Veijalainen Mar 13 '11 at 23:17
  • See also http://stackoverflow.com/questions/2897297/speed-up-bitstring-bit-operations-in-python – jfs Mar 14 '11 at 07:23
  • Moving from lists to bytearrays trims about 15% from the total time, for me (MacBook, python2.6). In terms of simplifying the recurrence, I was wondering about making the sieve of length n and then putting a slightly higher limit on the slice, so you get `sieve[i*i//2:n//2+j:i] = ...` and choose j to be small, easy to compute, and such that it simplifies the rhs recurrence. –  Apr 12 '11 at 20:54
  • ... actually, I guess only j//i needs to be small. –  Apr 12 '11 at 21:00
  • Did you try your code with PyPy? This seems exactly like the kind of things pypy should be good at. – static_rtti Apr 13 '11 at 09:43
  • I mentioned about psyco that for it the story is very much different and this code is not fastest but the more primitive versions are faster than slicing utilizing versions. OK, I run the comparison, but should take at least Sieve of Atkins in. – Tony Veijalainen Apr 13 '11 at 23:00

2 Answers2

1

You could do a wheel optimisation. Multiples of 2 and 3 aren't primes, so dont store them at all. You could then start from 5 and skip multiples of 2 and 3 by incrementing in steps of 2,4,2,4,2,4 etc..

Below is a C++ code for it. Hope this helps.

void sieve23()
{
    int lim=sqrt(MAX);
    for(int i=5,bit1=0;i<=lim;i+=(bit1?4:2),bit1^=1)
    {
        if(!isComp[i/3])
        {
            for(int j=i,bit2=1;;)
            {
                j+=(bit2?4*i:2*i);
                bit2=!bit2;
                if(j>=MAX)break;
                isComp[j/3]=1;
            }
        }
    }
}
jack_carver
  • 1,450
  • 2
  • 12
  • 25
  • in one of the programming pearl papers, [The Genuine Sieve of Eratosthenes](http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf), this is described for a wheel of 2,3,5,7. – Dan D. Apr 13 '11 at 09:20
0

If you may decide you are going to C++ to improve the speed, I ported the Python sieve to C++. The full discussion can be found here: Porting optimized Sieve of Eratosthenes from Python to C++.

On Intel Q6600, Ubuntu 10.10, compiled with g++ -O3 and with N=100000000 this takes 415 ms.

#include <vector>
#include <boost/dynamic_bitset.hpp>

// http://vault.embedded.com/98/9802fe2.htm - integer square root
unsigned short isqrt(unsigned long a) {
    unsigned long rem = 0;
    unsigned long root = 0;

    for (short i = 0; i < 16; i++) {
        root <<= 1;
        rem = ((rem << 2) + (a >> 30));
        a <<= 2;
        root++;

        if (root <= rem) {
            rem -= root;
            root++;
        } else root--;

    }

    return static_cast<unsigned short> (root >> 1);
}

// https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
// https://stackoverflow.com/questions/5293238/porting-optimized-sieve-of-eratosthenes-from-python-to-c/5293492
template <class T>
void primesbelow(T N, std::vector<T> &primes) {
    T i, j, k, sievemax, sievemaxroot;

    sievemax = N/3;
    if ((N % 6) == 2) sievemax++;

    sievemaxroot = isqrt(N)/3;

    boost::dynamic_bitset<> sieve(sievemax);
    sieve.set();
    sieve[0] = 0;

    for (i = 0; i <= sievemaxroot; i++) {
        if (sieve[i]) {
            k = (3*i + 1) | 1;
            for (j = k*k/3; j < sievemax; j += 2*k) sieve[j] = 0;
            for (j = (k*k+4*k-2*k*(i&1))/3; j < sievemax; j += 2*k) sieve[j] = 0;
        }
    }

    primes.push_back(2);
    primes.push_back(3);

    for (i = 0; i < sievemax; i++) {
        if (sieve[i]) primes.push_back((3*i+1)|1);
    }

}
Community
  • 1
  • 1
orlp
  • 98,226
  • 29
  • 187
  • 285
  • Maybe little information about the boost library to get it actually to interface with Python? I would like to test it but I am not yet experienced the Boost library. – Tony Veijalainen Apr 13 '11 at 23:12
  • Well, you don't need the Boost library for Python. In fact, you don't even have to link against it! You just compile this file, which includes `boost/dynamic_bitset.hpp` and you're done (you must have boost-dev installed on the compiling system though). No DLL's, no linking, nothing. – orlp Apr 13 '11 at 23:22