I am coming from Java and trying to learn Python. I implemented a Sieve of Eratosthenes algorithm in Java first, and then in Python. My Java implementation runs decently fast: I can find all primes under a billion in about 25 seconds. My Python implementation would probably take about 2 hours to do the same thing.

I have included both implementations here. My questions are:

  1. Why is the Python implementation so much slower? (I know I am doing something wrong.)
  2. Is it possible for Python to do this as fast as Java?

I assume the slowness centers around using a list in the Python implementation, but I am too new to Python to know how to get around this.

JAVA:

/**
 * Creates a boolean array of a specified size with true values at prime indices and
 * false values at composite indices.
 */
private static boolean[] sieve(int size){
    boolean[] array = new boolean[size];

    //Assume all numbers greater than 1 are prime//
    for(int i = 2; i < array.length; i++){ 
        array[i] = true;
    }

    //Execute Sieve of Eratosthenes algorithm//
    for(int p = 2; p < size; p = nextPrimeInArray(array, p)){ 
        for(int i = p + p; i < size; i += p){
            array[i] = false; // i.e., mark as composite
        }
    }

    return array;
}

/**
 * Finds the next index in the array that is not marked composite
 */
public static int nextPrimeInArray(boolean[] array, int p){

    do{
        p++;
    }while(p < array.length && !array[p]);
    return p;
}

PYTHON:

def getPrimeList(limit):
    """returns a list of True/False values, where list[i] is True if i is prime and False otherwise"""
    primes = []

    # Initially assume all numbers in the list are prime
    for i in range(limit):
        primes.append(True)

    # Set 0 and 1 to False
    primes[0] = False
    primes[1] = False

    for p in range(2, limit):
        for i in range(p + p, limit, p):
            primes[i] = False
        p = nextPrimeInList(primes, p)
    return primes

def nextPrimeInList(list, p):
    """Helper method for getPrimeList that finds the next index in list not marked composite"""
    p += 1
    while p < len(list) and not list[p]:
        p += 1
    return p
  • Try running on PyPy and see if it gets faster – Mariusz Jamro Jan 04 '15 at 19:37
  • This is a pretty complex question and would take a long time to explain, but basically read up on interpreted vs compiled language runtimes and see why, in general, compilation is faster. – Kon Jan 04 '15 at 19:39
  • Is this Python 2? You're not taking advantage of Python's well-known idioms here; for example, `primes = [True]*limit` is going to be extremely fast. That said, this question belongs on Codereview.SE IMO. – Ashwini Chaudhary Jan 04 '15 at 19:39
  • Java is extremely fast, thanks to the JIT compiler. Long-running code and loops get translated to native code and optimized. For testing, try disabling Java's JIT with -Djava.compile=NONE. – PeterMmm Jan 04 '15 at 19:39
  • I'd suggest using NumPy for this, because everything in Python is an object, so even integers and booleans take a lot of memory. For comparison, [`primesfrom2to`](http://stackoverflow.com/a/3035188/846892) took just 7 seconds on my system, compared to 27 seconds for the pure-Python [`rwh_primes2`](http://stackoverflow.com/a/2068548/846892). – Ashwini Chaudhary Jan 04 '15 at 19:59

1 Answer

I'm not an expert in Python but I will try to give you a decent answer.

First, the standard Python implementation (CPython) interprets your code, which generally makes tight loops slower than in a JIT-compiled language such as Java. For example, many loop optimisations cannot be performed, and that can slow your code down considerably for very large loops. That said, I know Python source is also compiled to bytecode before it is executed, much like Java, but as far as I know CPython does not go on to compile that bytecode to native code the way the JVM does, so a noticeable gap remains.
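To make the bytecode point concrete, here is a small sketch using the standard-library `dis` module. The helper `mark_multiples` is a hypothetical name introduced only for this illustration (it mirrors the inner loop of the question's sieve), and the exact instructions printed depend on the Python version, so the listing is only meant to show that the loop body is a sequence of interpreted instructions.

```python
import dis


def mark_multiples(flags, p):
    """Mark every multiple of p (other than p itself) as composite."""
    for i in range(p + p, len(flags), p):
        flags[i] = False


# Print the bytecode for the function; the interpreter dispatches these
# instructions one by one on every pass through the loop.
dis.dis(mark_multiples)

# The exact instructions vary between Python versions, so only count them.
instruction_count = len(list(dis.get_instructions(mark_multiples)))
print(instruction_count, "bytecode instructions")
```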

Then, I think you could speed up your Python version by allocating the right size for your list from the beginning (I believe Python lists are actually arrays):

 primes = [True] * limit
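
Putting that together, a minimal sketch of the whole sieve might look like the following. The function name `sieve` is my own, and this is only one way to write it. Besides the preallocation, note that in the question's Python version the assignment `p = nextPrimeInList(primes, p)` has no effect, because the `for` loop rebinds `p` on the next iteration; the inner loop therefore runs for every `p`, composite or not. The sketch skips composites with a plain `if` instead.

```python
def sieve(limit):
    """Return a list where result[i] is True if i is prime, for 0 <= i < limit."""
    # Allocate the whole list at once instead of appending in a loop.
    primes = [True] * limit
    # 0 and 1 are not prime; the slice assignment also copes with limit < 2.
    primes[:2] = [False] * min(limit, 2)

    for p in range(2, limit):
        # Only sieve with p if it is still marked prime; composites are skipped.
        if primes[p]:
            for i in range(p + p, limit, p):
                primes[i] = False  # mark multiples of p as composite
    return primes
```

For example, `[i for i, is_prime in enumerate(sieve(30)) if is_prime]` gives `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]`. I would not expect this to match the Java timing in pure Python, but it should avoid the redundant work.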
Dici