2

Why is my algorithm for finding the sum of all prime numbers below 2 million so slow? I'm a fairly new programmer, and this is what I came up with for finding the solution:

import time

sum = 2
start = time.time()

for number in range(3, 2000000):
    prime = True
    for x in range(2, number):
        if number % x == 0:
            prime = False
    if prime:
        sum += number

print "Sum =", sum
end = time.time() - start
print "Runtime =", end

Can someone please help me out? Thanks!

  • Because you're looping through 2 million times, more than twice. Try first filtering it out so you only loop through the primes (hint, first start out with only odd numbers) – TerryA Jul 08 '13 at 10:47

9 Answers

4

Your algorithm uses trial division, which is very slow. A better algorithm uses the Sieve of Eratosthenes:

def sumPrimes(n):
    sum, sieve = 0, [True] * n
    for p in range(2, n):
        if sieve[p]:
            sum += p
            for i in range(p*p, n, p):
                sieve[i] = False
    return sum

print sumPrimes(2000000)

That should run in less than a second. If you're interested in programming with prime numbers, I modestly recommend this essay at my blog.

user448810
  • What does this mean? `sum, sieve = 0, [True] * n` I am new to programming, so can you please explain it to me? – smid_the_best Apr 07 '16 at 01:50
  • That statement initializes variable _sum_ to 0 and creates an array _sieve_ of length _n_ with all values initially set to the boolean value _True_. Did you look at the linked essay? – user448810 Apr 07 '16 at 12:33
  • @user448810 why is the second for loop allowed to start at p*p? Should it not be just p and then you subsequently delete all multiples of p? – Funzies Oct 31 '19 at 10:27
  • @Funzies: All numbers smaller than _p_ * _p_ will have already been eliminated by primes smaller than _p_. – user448810 Oct 31 '19 at 12:21
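To make the last comment concrete, here is a small sketch (the helper name `smallest_prime_factor` is made up for illustration and is not part of the answer). It takes p = 7 and shows that every multiple of 7 below 7 * 7 has a prime factor smaller than 7, so the sieve has already crossed it out before it reaches p:

```python
def smallest_prime_factor(m):
    # Hypothetical helper: return the smallest prime that divides m (m >= 2).
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m  # no divisor found, so m itself is prime


p = 7
# Multiples of p below p * p are k * p with 2 <= k < p.
for k in range(2, p):
    multiple = k * p
    print(multiple, "is already removed by", smallest_prime_factor(multiple))
```

Each such multiple is k * p with k < p, so any prime factor of k is a prime below p that removed it earlier.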
3

There are many optimisations that you could do (and should do, since you will need prime generation for many of the problems in Project Euler, so having a fast implementation simplifies things later on).

Take a look at the sieve of Atkin (and related sieves) (http://en.wikipedia.org/wiki/Sieve_of_Atkin) to get an understanding of how prime generation can be sped up over brute force (algorithmically, that is).

Then take a look at the awesome answer to this S.O. post (Fastest way to list all primes below N), which times a number of prime generation algorithms and implementations.

kamjagin
3

Nobody pointed this out, but using range in Python 2.x is slow here, because it builds the whole list in memory before the loop starts. Use xrange instead; in this case it should give you a considerable performance advantage.
See this question.

Also, you don't have to loop all the way up to the number you are checking; checking up to round(sqrt(n)) + 1 is sufficient. (If a number greater than the square root of n divides n, then the matching quotient is smaller than the square root, and you would already have found it.)
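A minimal sketch of the square-root bound, assuming Python 3 (where range is already lazy, so xrange is not needed). The function name `is_prime` and the small limit of 100 are chosen here for illustration only:

```python
import math


def is_prime(n):
    # Trial division that stops at sqrt(n): a divisor above sqrt(n)
    # always pairs with a quotient below it, which is checked first.
    if n < 2:
        return False
    limit = int(math.sqrt(n))
    for d in range(2, limit + 1):
        if n % d == 0:
            return False
    return True


# Small-scale check of the same idea as the question.
total = sum(n for n in range(2, 100) if is_prime(n))
print(total)
```

For 91 = 7 * 13, for example, the loop only needs to go up to 9: it finds 7 and never has to reach 13.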

unddoch
1

First off, you're looping over too many numbers. You don't need to check whether EVERY number less than a given number is a divisor to decide if that number is prime (I'll let you figure out why this is yourself). You are running about two trillion loop iterations, where hundreds of millions will do.
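The size of that gap can be counted directly rather than guessed. The sketch below does not test any primes; it only adds up how many inner-loop iterations each approach would run (the variable names are made up for this illustration, and the second count is an upper bound with no early exit):

```python
from math import sqrt

LIMIT = 2000000

# Original code: range(2, number) has number - 2 elements and the
# inner loop never stops early.
naive_iterations = sum(number - 2 for number in range(3, LIMIT))

# Odd candidates only, tested against odd divisors up to about sqrt(i).
bounded_iterations = sum(
    len(range(3, int(round(sqrt(i) + 1)), 2))
    for i in range(3, LIMIT, 2)
)

print("naive:  ", naive_iterations)
print("bounded:", bounded_iterations)
```

The first count is on the order of two trillion, the second a few hundred million, which is why the bounded version finishes in a reasonable time.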

Something like this works faster, but is by no means optimal:

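    from math import sqrt

    value = 2  # 2 is the only even prime
    for i in range(3, 2000000):
        prime = True
        if i % 2 != 0:
            # Only odd divisors up to about sqrt(i) need to be tried.
            for j in range(3, int(round(sqrt(i) + 1)), 2):
                if i % j == 0:
                    prime = False
                    break  # one divisor is enough
        else:
            prime = False
        if prime:
            value += i
    print(value)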
Charles Dillon
1

You need to use a prime sieve. Check out the Sieve of Eratosthenes and try to implement it in code.

Trial division, done this way, is very inefficient for finding primes: its complexity is on the order of n squared, so the running time grows very fast as the limit increases. This task is meant to teach you how to find something better.

Pawel Miech
1

First of all, I think you can split your code up by defining a function. However, a regular function has a drawback in this case: every time a normal function returns a value, the next call to it executes the complete code inside the function again. Since you are iterating 2 million times, it would be better to:

  • Have a function that gives you the next prime number and temporarily returns control to the caller. Such functions are known as generators.
  • To define a generator function, just use the yield statement instead of return.
  • When you use a generator, the function expects to be called again; when that happens, execution continues right after the yield statement instead of going over the whole function again.
  • The advantage of this approach is that, over the long run of an iterator, you avoid consuming all of the system's memory.
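The resume-after-yield behaviour described in the list can be seen in isolation with a tiny sketch. The generator `count_up` is a made-up name for illustration and has nothing to do with primes yet:

```python
def count_up(start):
    # Generator: hands back one value at a time and pauses at yield.
    n = start
    while True:
        yield n
        # The next call to next() resumes here, not at the top.
        n += 1


gen = count_up(5)
first = next(gen)   # runs until the first yield
second = next(gen)  # resumes after the yield, increments, yields again
print(first, second)
```

Only the current value of n is kept between calls, so no list of results is ever built up in memory.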

I recommend having a look at this article about generators in Python. It provides a more extensive explanation of this example.

The solution would be something like this:

import math

# Check if a number is prime
def is_prime(number):
    if number > 1:
        if number == 2:
            return True
        if number % 2 == 0:
            return False
        for current in range(3, int(math.sqrt(number) + 1), 2):
            if number % current == 0: 
                return False
        return True
    return False

# Yield successive primes, starting the search at a given number
def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        # Next call to the function will continue here!   
        number += 1 

# Get the sum of all prime numbers under a number
def sum_primes_under(limit):
    total = 2
    for next_prime in get_primes(3):
        if next_prime < limit:
            total += next_prime
        else:
            print(total)
            return

# Call the function
sum_primes_under(2000000)
jdacoello
0

This problem gives its answer very quickly when you use the Sieve of Eratosthenes (link to it). You can make it even faster with a small modification: consider only the odd numbers, so that you iterate over half of the 2 million candidates. This saves a lot of time.

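n = 2000000
ar = [False] * n  # ar[k] becomes True once k is known to be composite
total = 2         # 2 is the only even prime, so start from it

def mul(a):
    # Mark every multiple of a (2a, 3a, ...) below n as composite.
    i = 2
    p = i * a
    while p < n:
        ar[p] = True
        i += 1    # Python has no ++ operator; ++i leaves i unchanged
        p = i * a

x = 3
while x < n:
    if not ar[x]:
        total += x
        mul(x)
    x += 2        # only odd candidates
print(total)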

Here is the same algorithm in C++:

#include<bits/stdc++.h>
using namespace std;
const int n = 2000000;
bool ar[n];
void mul(int a)
{
    int i = 2;int p = i*a;
    while(p < n)
    {
        ar[p] = 1;
        ++i;p = i*a;
    }
}
long long sieve()
{
    long long sum = 2;
    for(int i = 3;i < n;i += 2)
    {
        if(ar[i] == 0)
            sum += i,mul(i);
    }
    return sum;
}
int main()
{
    cout<<sieve();
    return 0;
}

C++ generally runs around 10 times faster than Python anyway, and that holds for this algorithm too.

lethal
0
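total = 2  # 2 is the only even prime; the loop below tests odd numbers only

def isPrime(n):
    # Intended for odd n >= 3, which is all the loop below passes in.
    if n % 2 == 0: return False
    for i in range(3, int(n**0.5)+1, 2):
        if n % i == 0: return False
    return True

if __name__ == "__main__":
    n = 3
    while n < 2000000:
        if isPrime(n): total += n
        n += 2
    print(total)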
Amulya Acharya
0
import time
start = time.time()

def is_prime(num):
    prime = True
    for i in range(2,int(num**0.5)+1):
        if num % i == 0:
            prime = False
            break
    return prime
sum_prime = 0
for i in range(2,2000000):
    if is_prime(i):
        sum_prime += i
print("sum: ",sum_prime)

elapsed = (time.time() - start)
print("This code took: " + str(elapsed) + " seconds")