
The code below takes a very long time to run. Is there a more efficient way to calculate the sum of all prime numbers under 2 million?

primeNumberList = []
previousNumberList = []


for i in range(2,2000000):
    for x in range(2,i):
        previousNumberList.append(x)
    if all(i % n > 0 for n in previousNumberList):
        primeNumberList.append(i)
    previousNumberList = []

print(sum(primeNumberList))
Jack C
  • Does that give you the right answer? – Mad Physicist Jan 21 '19 at 03:43
  • Hi @MadPhysicist. The code in the body gives the correct answer for lower numbers, but for 2 million it takes far too long (over an hour) for me to let it finish. – Jack C Jan 21 '19 at 03:51
  • You can replace the `for` loop entirely. Just use `previousNumberList = range(2, i)`. Not only does this eliminate the expensive item by item append with an O(1) range creation, but it replaces the subsequent O(n) lookup with an O(1) operation as well. – Mad Physicist Jan 21 '19 at 03:56
  • I think it should be enough to check `if all(i % n > 0 for n in primeNumberList):` and omit `previousNumberList` completely. – Michael Butscher Jan 21 '19 at 03:57
  • @Michael. And you only need to check up to sqrt(i), not the whole list. – Mad Physicist Jan 21 '19 at 03:59
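
The two suggestions in the comments above (divide only by primes already found, and stop once the divisor exceeds sqrt(i)) can be combined. The following is a minimal sketch of that idea, not code from the thread; the function name `sum_primes_below` is made up for illustration.

```python
def sum_primes_below(limit):
    """Sum all primes < limit using trial division by earlier primes only.

    Hypothetical helper: it combines the two hints from the comments,
    namely testing against known primes and stopping at sqrt(candidate).
    """
    primes = []  # primes found so far, in increasing order
    total = 0
    for candidate in range(2, limit):
        is_prime = True
        for p in primes:
            if p * p > candidate:
                # No prime factor <= sqrt(candidate) exists, so it is prime.
                break
            if candidate % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(candidate)
            total += candidate
    return total


print(sum_primes_below(100))  # sum of the primes below 100
```

This removes `previousNumberList` entirely and does far fewer divisions per candidate than the original loop, although a sieve is likely to be faster still for a limit of 2 million.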

3 Answers

You can optimize it in a bunch of interesting ways.

First, look at algorithmic optimizations.

  • Use algorithms that find prime numbers faster. (See here).

  • Use something like memoization to prevent unnecessary computation.

  • If memory is not an issue, figure out how to exchange memory for runtime.

Next, look at systems level optimizations.

  • Split the work across multiple processes (multiple threads won't help much because of Python's Global Interpreter Lock). You can do this with gRPC on one host, or with PySpark etc. if you are using multiple hosts.
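
As a rough illustration of splitting the work across processes, here is a sketch that uses the standard library's `concurrent.futures.ProcessPoolExecutor` rather than gRPC or PySpark. That substitution is an assumption made to keep the example self-contained. The helper names, the worker count, and the chunk size are all hypothetical, and `math.isqrt` needs Python 3.8 or later.

```python
from concurrent.futures import ProcessPoolExecutor
from math import isqrt


def small_primes(limit):
    """Return all primes <= limit using a plain sieve."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytes(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def sum_primes_in_segment(bounds):
    """Sum the primes in the half-open interval [lo, hi)."""
    lo, hi = bounds
    lo = max(lo, 2)
    if hi <= lo:
        return 0
    segment = bytearray([1]) * (hi - lo)  # segment[k] stands for lo + k
    for p in small_primes(isqrt(hi - 1)):
        # First multiple of p inside the segment, but never p itself.
        start = max(p * p, (lo + p - 1) // p * p)
        segment[start - lo::p] = bytes(len(range(start, hi, p)))
    return sum(lo + k for k, flag in enumerate(segment) if flag)


def parallel_sum_primes(limit, workers=4, chunk=250_000):
    """Sum primes < limit, giving each worker process one chunk at a time."""
    bounds = [(lo, min(lo + chunk, limit)) for lo in range(2, limit, chunk)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_primes_in_segment, bounds))
```

Call `parallel_sum_primes(2_000_000)` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. For a limit this small the process start-up cost may outweigh the gain, so this is more a pattern for much larger ranges.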

Finally, look at low-level optimizations such as loop unrolling.

Good luck!

ApprenticeHacker

Start with a faster algorithm for calculating prime numbers. A really good survey is here: Fastest way to list all primes below N

This one (taken from one of the answers of that post) calculates in under a second on my year-old iMac:

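def primes(n):
    """Return a list of primes < n."""
    if n < 3:
        return []  # there are no primes below 2
    # Only odd indices are ever consulted; even entries are ignored.
    sieve = [True] * n
    for i in range(3, int(n**0.5) + 1, 2):
        if sieve[i]:
            # Cross off the odd multiples of i, starting at i*i.
            sieve[i*i::2*i] = [False] * ((n - i*i - 1) // (2*i) + 1)
    return [2] + [i for i in range(3, n, 2) if sieve[i]]

print(sum(primes(20000000)))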
dtanabe
  • Great example of trading memory for speed – Mad Physicist Jan 21 '19 at 04:05
  • I imagine numpy, with its masking and slice assignment optimizations, could speed this up even further. Not to mention the space optimization since a boolean is 1 byte, while a PyObject * is 4 or 8 bytes. – Mad Physicist Jan 21 '19 at 04:16
  • I picked the first example I saw that had no external dependencies. I'm sure a Numpy variant could be faster, but being able to sum primes up to 20 million on my (admittedly fast) computer in under a second made me stop looking harder for better alternatives. :-) – dtanabe Jan 21 '19 at 04:19
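
The space point raised in the comments can be sketched without NumPy by storing the flags in a `bytearray`, which uses one byte per entry instead of one pointer per entry as a list does. This is a stand-in for the NumPy idea, not a NumPy implementation, and the function name `sum_primes_bytearray` is hypothetical. `math.isqrt` needs Python 3.8 or later.

```python
from math import isqrt


def sum_primes_bytearray(n):
    """Sum all primes < n using a bytearray-backed sieve of Eratosthenes."""
    if n < 3:
        return 0  # there are no primes below 2
    sieve = bytearray([1]) * n  # sieve[i] == 1 means "i may be prime"
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n - 1) + 1):
        if sieve[i]:
            # Zero every multiple of i from i*i up; bytes(k) is k zero bytes.
            sieve[i * i::i] = bytes(len(range(i * i, n, i)))
    return sum(i for i, flag in enumerate(sieve) if flag)


print(sum_primes_bytearray(2_000_000))  # → 142913828922
```

For the limit in the question this needs roughly 2 MB for the flags, compared with about 16 MB of pointers for a two-million-element list on 64-bit CPython.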

As long as you have the memory for it, the sieve of Eratosthenes is hard to beat when it comes to finding prime numbers:

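def sumPrimes(N):
    """Return the sum of all primes <= N."""
    # Only odd indices are consulted; 2 is added separately at the end.
    prime = [True] * (N + 1)
    for n in range(3, int(N**0.5) + 1, 2):
        if prime[n]:
            # Cross off the odd multiples of n, starting at n*n.
            prime[n*n:N+1:n*2] = [False] * len(range(n*n, N + 1, n*2))
    return sum(n for n in range(3, N + 1, 2) if prime[n]) + 2 * (N > 1)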
Alain T.