
I was solving some problems on Project Euler and I wrote identical functions for problem 10...

The thing that amazes me is that the C solution runs in about 4 seconds, while the Python solution takes about 283 seconds. I am struggling to explain to myself why the C implementation is so much faster than the Python implementation. What is actually happening to make it so?

C:

#include <stdio.h>
#include <time.h>
#include <math.h>

int is_prime(int num)
{
    int sqrtDiv = lround(sqrt(num));
    while (sqrtDiv > 1) {
        if (num % sqrtDiv == 0) {
            return(0);
        } else {
            sqrtDiv--;
        }
    }
    return(1);
}

int main () 
{
    clock_t start = clock();

    long sum = 0;
    for ( int i = 2; i < 2000000; i++ ) {
        if (is_prime(i)) {
            sum += i;
        }
    }
    printf("Sum of primes below 2,000,000 is: %ld\n", sum);

    clock_t end = clock();
    double time_elapsed_in_seconds = (end - start)/(double)CLOCKS_PER_SEC;
    printf("Finished in %f seconds.\n", time_elapsed_in_seconds);   
}

Python:

from math import sqrt
import time


def is_prime(num):
    div = round(sqrt(num))
    while div > 1:
        if num % div == 0:
            return False
        div -= 1
    return True

start_time = time.clock()

tsum = 0
for i in range(2, 2000000):
    if is_prime(i):
        tsum += i

print tsum
print('finished in:', time.clock() - start_time, 'seconds')
lostAtSeaJoshua
deltaskelta
  • If you're using Python 2.7, `range(2, 2000000)` actually builds an in-memory list of about 2,000,000 integers. You aren't doing the equivalent in C. Try `xrange()` instead, or switch to Python 3, where `range()` is a lazy iterator. – Akshat Mahajan Jul 19 '16 at 00:20
  • Static type declarations, and possibly using a memory-inefficient iterator vs. a generator in Python – Dan Jul 19 '16 at 00:22
  • `div` is a float in your Python code, but `sqrtDiv` is an int in your C code. – Paul Hankin Jul 19 '16 at 00:22
  • `round(sqrt(num)) -> int(sqrt(num) + 1)` gives a 2.5x speed increase. I don't think range vs xrange makes any difference in this case. – Paul Hankin Jul 19 '16 at 00:28
  • @PaulHankin After testing an `xrange()` version, I am forced to agree. – Akshat Mahajan Jul 19 '16 at 00:29
  • Could [this answer](http://stackoverflow.com/a/3033379/4520911) be relevant? – iRove Jul 19 '16 at 00:32
  • Fixing the int/float error I mentioned above leaves the Python version 21 times slower than the C. That's about the right ballpark for code doing lots of sums on small ints. – Paul Hankin Jul 19 '16 at 00:35
  • `int(round(sqrt(num)))` is the correct replacement, not what I wrote above. – Paul Hankin Jul 19 '16 at 00:38
  • You should really use NumPy for numerical code in Python. Also, why would you expect Python to be as fast as C? – s952163 Jul 19 '16 at 00:38
  • @PaulHankin Actually, doing `int(sqrt(num))` is sufficient, as `int` will always end up effectively rounding down, which is what is wanted for primality testing. – Akshat Mahajan Jul 19 '16 at 00:44
  • @AkshatMahajan I don't think it's guaranteed that `math.sqrt(x)` returns an exact int when the input is a square, but perhaps it is. If it's not, then `int(sqrt(p*p))` could evaluate to `p-1`, and cause `p*p` to be identified as prime. – Paul Hankin Jul 19 '16 at 00:53
  • @PaulHankin - Python says " It provides access to the mathematical functions defined by the C standard." As long as the underlying C library conforms to IEEE-754 (which it should), and the output is representable in a double it will. – TLW Jul 19 '16 at 03:43
  • @PaulHankin - Note that even `int(round(sqrt(num)))` can and will break on large numbers. E.g. `int(round(sqrt((10**200))))`. – TLW Jul 19 '16 at 03:47
  • You really should be using numpy arrays instead of lists (better batch performance) and a sieve more like these: http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n?noredirect=1&lq=1 – HAL 9001 Jul 19 '16 at 04:47
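Putting the comments' suggestion together, here is a hedged sketch of the question's `is_prime` with `div` kept as an `int` (same countdown trial division, but integer `%` instead of the much slower float `%`):

```python
from math import sqrt


def is_prime(num):
    # Cast to int so the trial divisor is an integer, as suggested in the
    # comments; otherwise Python 2's round() returns a float and every
    # `num % div` below is a float modulo.
    div = int(round(sqrt(num)))
    while div > 1:
        if num % div == 0:
            return False
        div -= 1
    return True
```

As noted in the comments, this still breaks for very large inputs where `sqrt` loses precision; it is only meant to match the original algorithm.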

1 Answer

It's CPython (the reference implementation) that's slow here, not necessarily Python the language. CPython has to interpret bytecode, which will almost always be slower than compiled C: it simply does more work per operation than the equivalent C code. For example, each call to `sqrt` requires looking the name up at runtime, rather than jumping to a known address.

If you want comparable speed from Python, you could either annotate the source with types and compile it with Cython, or run it under PyPy to get JIT compilation.
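You can see that extra per-operation work directly. This sketch assumes CPython 3 and reuses the question's `is_prime`: disassembling the function shows a `LOAD_GLOBAL` name lookup before every `sqrt` call, plus several bytecode instructions for each iteration of the loop that the C compiler turns into a handful of machine instructions.

```python
import dis
from math import sqrt


def is_prime(num):
    div = int(round(sqrt(num)))
    while div > 1:
        if num % div == 0:
            return False
        div -= 1
    return True


# Print the bytecode: note the LOAD_GLOBAL for 'sqrt' (a runtime name
# lookup on every call) and the separate load/compare/jump instructions
# executed on every iteration of the while loop.
dis.dis(is_prime)
```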

viraptor