0

I have a prime number generator, I was curious to see how small and how fast I could get a prime number generator to be based on optimizations and such:

from math import sqrt
def p(n):  
  if n < 2: return []  
  s = [True]*(((n/2)-1+n%2)+1)  
  for i in range(int(sqrt(n)) >> 1):  
    if not s[i]: continue  
    for j in range( (i**i+(3*i) << 1) + 3, ((n/2)-1+n%2), (i<<1)+3): s[j] = False   
    q = [2]; q.extend([(i<<1) + 3 for i in range(((n/2)-1+n%2)) if s[i]]); return len(q), q  
print p(input())

The generator works great! It is super fast, feel free to try it out. However, if you input numbers greater than 10^9 or 10^10 (i think) it will crash from a memory error. I can't figure out how to expand the memory it uses so that it can take as much as it needs. Any advice would be greatly appreciated!

My question is very similar to this one, but this is Python, not C.

EDIT: This is one of the memory related tracebacks I get for trying to run 10^9.

python prime.py
1000000000
Traceback (most recent call last):
  File "prime.py", line 9, in <module>
    print p(input())
  File "prime.py", line 7, in p
    for j in range( (i**i+(3*i) << 1) + 3, ((n/2)-1+n%2), (i<<1)+3): s[j] = False
MemoryError
Joseph Farah
  • 2,129
  • 1
  • 21
  • 30
  • 1
    *Take as much as it needs*? You must realize `q` is a `list` and it will run out of memory at some point – Moses Koledoye Jun 03 '16 at 15:11
  • @MosesKoledoye I guess that's my question--is there a way to increase the memory it can use before it runs out? At this point, it only uses a couple thousand megabytes. I feel like I could increase it. Should I update my question to include that? – Joseph Farah Jun 03 '16 at 15:12
  • The suggestions in the C-related question you linked should also apply to Python... – agnul Jun 03 '16 at 15:26

2 Answers2

6

The Problem is in line 7.

for j in range( (i**i+(3*i) << 1) + 3, ((n/2)-1+n%2), (i<<1)+3): s[j] = False

especially this part: i**i

1000000000^1000000000 is a 9 * 10^9 digit long number. Storing it takes multiple Gb if not Tb (WolframAlpha couldn't caclulate it anymore). I know that i ist the square root of n (maximal), but at that large numbers that's not a big difference.

You have to split this caclulation into smaller parts if posible and safe it on a hard drive. This makes the process slow, but doable.

frameworker
  • 318
  • 2
  • 9
  • Thank you so much!! I didn't even realize that! – Joseph Farah Jun 03 '16 at 15:27
  • 1
    Here's a nice entry on prime numbers you might want to take a look at: http://stackoverflow.com/questions/2211990/how-to-implement-an-efficient-infinite-generator-of-prime-numbers-in-python?lq=1 – frameworker Jun 03 '16 at 15:50
2

First of all, there is a problem since the generator says that numbers like 33, 35 and 45 are prime.

Other than that, there are several structures taking up memory here:

  • s = [True]*(((n/2)-1+n%2)+1)

A list element takes up several bytes per element. For n = 1 billion the s array is consuming gigabytes.

  • range(...) creates a list and then iterates over the elements. Use xrange(...) instead where possible.

Converting range() to xrange() has pitfalls - e.g. see this SO answer:

OverflowError Python int too large to convert to C long

A better implementation of s is to use a Python integer as a bit-array which has a density of 8 elements per byte. Here is a translation between using a list and a integer:

 s = [True]*(((n/2)-1+n%2)+1)        t = (1 << (n/2)+1)-1

 s[i]                                (t & (1<<i))
 not s[i]                            not (t & (1<<i))

 s[j] = False                        m = 1<<j
                                     if (t & m): t ^= m

Update

Here's an unoptimized version which uses yield and xrange. For larger values of n take care of the limitations of xrange as noted above.

def primes(n):
  if n < 2: return
  yield 2
  end = int( sqrt(n) )
  t = (1 << n) -1

  for p in xrange(3, end, 2):
    if not (t & (1 << p)): continue
    yield p
    for q in xrange(p*p, n, p):
      m = t & (1<<q)
      if (t&m): t ^= m
      continue

  for p in xrange(end - (end%2) +1, n, 2):
    if not (t & (1 << p)): continue
    yield p


def test(n):
  for p in primes(n): print p

test(100000)
Community
  • 1
  • 1
ErikR
  • 50,049
  • 6
  • 66
  • 121
  • Thanks very much for your answer! I accidentally pasted in a slightly older version of the code, but the optimization problems are unaffected. The actual program works perfectly. Thanks for checking and noticing! If I replace the sections of my code with your bit-array replacements, will it be the same? – Joseph Farah Jun 03 '16 at 21:40
  • It should be - I ran some test before posting. Let me know if it isn't. I would also use `yield` the primes as they are found instead of storing them in a list. If the caller of your routine only needs to print out the primes there is no need to compute and store all of them in a list. The caller also decide to store them in a list if that's needed. – ErikR Jun 03 '16 at 22:20
  • Thank you so much!! This is incredibly helpful. I feel bad not marking your answer as correct (the other one immediately solved my memory issue) but I have literally used every single one of your suggestions. Thank you so damn much!!!!!! – Joseph Farah Jun 03 '16 at 23:05