5

for the past few weeks i have been working on making a program that can make prime spirals as efficient as possible. I looked into multithreading to increase the speed of the program, and now i've run into a new problem. My list of primes has a length of 64 million zero's and ones, and this list takes up 240MB of ram. because i use multiprocessing (5 processes in total) my script uses a total of 1.1GB of ram at most, and if it reaches this point it returns a memory error.

Some background information on how i am storing the primes: The primes are stored in a list, and every time i find a prime, i set the value to 1 (example: Primes[13] = 1 (because it's prime), and Primes[14] = 0). To me this seemed like the best solution, as the list wouldn't use a lot of memory

After some basic math i concluded that every zero or one in my primes list takes up 4 bytes (32 bits) of information. This seems logical, but i was wondering if there's a way to store the zero's and ones as single bits, so it won't take up as much memory.

Thanks in advance for any answers, Regards, Harm

  • FWIW, you can condense your prime table even further. Every prime > 30 is congruent to one of {±1, ±7, ±11, ±13} mod 30. I.e., in each block of 30 numbers only 8 of them (at most) can possibly be prime, and the primes can only be in those positions. So the bits of a byte can hold the prime data for a block of 30 numbers. And you can use a byte [array](https://docs.python.org/2/library/array.html) to hold that info efficiently in Python. – PM 2Ring Feb 05 '16 at 11:03
  • I don't exactly understand what you meant, could you provide a link/clarify? –  Feb 05 '16 at 17:37
  • Take a look at this [table of primes](http://oeis.org/wiki/Sieve_of_Eratosthenes#Table_of_sieved_integers), which has 30 columns. See how all the primes > 5 are in 8 columns? So to condense our prime table we only need to save bits for those 8 columns. – PM 2Ring Feb 06 '16 at 14:51
  • It's pretty complicated to only send those columns, but i used this information to make the calculation process even more efficient (only uses 75% of the time now). Thanks :) –  Feb 09 '16 at 08:00
  • Sure, using only those 8 columns does make the code a little more complicated. And bit-fiddling in Python isn't as efficient as it is in C, but it's not _too_ bad, as my code below illustrates. BTW, I've added a faster version of `isprime_bin.py` to my answer. – PM 2Ring Feb 09 '16 at 10:00
  • BTW, if one of the answers below fixes your issue, you should accept it (click the check mark next to the appropriate answer). That does two things. It lets everyone know your issue has been resolved to your satisfaction, and it gives the person that helps you credit for the assist. [See here](http://meta.stackexchange.com/a/5235) for a full explanation. – PM 2Ring Feb 09 '16 at 10:01
  • I see, i've been using stackoverflow for a long time, but this is my first post. thanks for telling me :) –  Feb 22 '16 at 19:49

2 Answers2

3

If every 0 or 1 takes 32 bits, it means it's character (maybe integer?) array. You should use boolean type (bool) instead. The simplest way to do that is to use bitarray. One of implementations:

https://pypi.python.org/pypi/bitarray/0.8.1

Flash Thunder
  • 10,029
  • 6
  • 39
  • 82
  • Thanks, that was exactly what i was looking for, i just couldn't find it. I was indeed using integers to store zero's and ones, which is obviously not the right solution. –  Feb 05 '16 at 11:07
0

Here's some Python 2 code that creates a file of primes packed into bytes, with each byte encoding the primality of a block of 30 numbers, utilising the fact that all primes > 5 are coprime to 30 and hence are congruent to one of (1, 7, 11, 13, 17, 19, 23, 29) mod 30.

sieve_bin.py

#! /usr/bin/env python

''' Prime sieve.

    Save primes to a file with the primes in each block of 30 numbers 
    packed into a byte, utilising the fact that all primes > 5 are
    coprime to 30 and hence are congruent to one of 
    (1, 7, 11, 13, 17, 19, 23, 29) mod 30

    Written by PM 2Ring 2016.02.06
    Prime sieve by Robert William Hanks
    See http://stackoverflow.com/q/35222244/4014959
'''

import sys
from array import array

def rwh_primes(n):
    ''' Returns a boolean list of odd primes < n
        Adapted from code by Robert William Hanks
        See http://stackoverflow.com/a/3035188/4014959
    '''
    #Each `sieve` item represents an odd number, starting at 1
    sieve = [True] * (n//2)
    for i in xrange(3, int(n**0.5) + 1, 2):
        if sieve[i//2]:
            sieve[i*i//2::i] = [False] * ((n - i*i - 1) // (2*i) + 1)
    return sieve

def main():
    if len(sys.argv) != 2:
        print '''Generate a file of primes packed into bits.
Usage:
%s hi
to generate a file of primes < `hi`
If `hi` isn't a multiple of 30 it will be rounded down to the nearest multiple of 30
''' % sys.argv[0]
        exit()

    hi = int(sys.argv[1]) // 30 * 30
    fname = 'primebits'

    print 'Generating primes less than %d...' % hi
    odd_primes = rwh_primes(hi)

    print 'Packing primes into bytes...'
    prime_residues = (1, 7, 11, 13, 17, 19, 23, 29)
    bitmasks = [(1<<j, (u - 1) // 2) for j, u in enumerate(prime_residues)]
    packer = (sum(mask for mask, r in bitmasks if odd_primes[i + r])
        for i in xrange(0, hi//2, 15))

    primebytes = array('B', packer)
    with open(fname, 'wb') as f:
        primebytes.tofile(f)
    print 'Saved to', fname


if __name__ == '__main__':
    main()

And here's a program that reads the primebits file created by sieve_bin.py. The file is read into an array of unsigned bytes, so it's efficient in RAM usage: one byte per data byte read from the file plus a small overhead for the array object itself (28 bytes on a 32 bit machine).

The main function of this program creates a list of primes using the isprime function. This is about 4 times slower than using a sieve, but it has far less memory overhead. It can probably be sped up a little, but such optimizations will be left as an exercise for the reader. :)

isprime_bin.py

#! /usr/bin/env python

''' Test if a number is prime, using a table read from disk
    The table contains the primes packed into bytes, with
    each byte encoding the primality of a block of 30 numbers,
    utilising the fact that all primes > 5 are coprime to 30 and
    hence are congruent to one of (1, 7, 11, 13, 17, 19, 23, 29) mod 30

    See http://stackoverflow.com/q/35222244/4014959
'''

import sys
import os.path
from array import array

def load_primes(fname):
    primebytes = array('B')
    filesize = os.path.getsize(fname)  
    with open(fname, 'rb') as f:
        primebytes.fromfile(f, filesize)

    return primebytes

prime_residues = (1, 7, 11, 13, 17, 19, 23, 29)
#Build a dict for fast conversion of residue to bitmask
bitmasks = dict((v, 1<<i) for i, v in enumerate(prime_residues))

def isprime(n, primebytes):
    if n < 2:
        return False
    if n in (2, 3, 5):
        return True
    r = n % 30
    if r not in bitmasks:
        return False
    b = primebytes[n // 30]
    return b & bitmasks[r]

#Test
def main():
    m = int(sys.argv[1]) if len(sys.argv) > 1 else 300

    fname = 'primebits'
    primebytes = load_primes(fname)
    primes = [i for i in xrange(m) if isprime(i, primebytes)]
    print primes


if __name__ == '__main__':
    main()

On my old single core 2GHz machine with 2GB of RAM it takes around 26 seconds for sieve_bin.py to create a primebits file for the numbers < 64000020 (file size = 2133334 bytes); roughly half of that time is spent sieving the primes. isprime_bin.py takes around 124 seconds to produce the list of primes from that file (sending the output to /dev/null).

The output was verified by comparing it to output generated by a conventional Eratosthenes sieve program.


The code in isprime_bin.py is designed to test arbitrary positive integers for primality. To simply generate a list of primes it can be sped up considerably, since the first two if tests are only applicable to numbers <= 5. Here's a modified version that takes around 47 seconds to produce the list of all primes from the 2133334 byteprimebits file.

#! /usr/bin/env python

import sys
import os.path
from array import array

def load_primes(fname):
    primebytes = array('B')
    filesize = os.path.getsize(fname)  
    with open(fname, 'rb') as f:
        primebytes.fromfile(f, filesize)

    return primebytes

prime_residues = (1, 7, 11, 13, 17, 19, 23, 29)
#Build a dict for fast conversion of residue to bitmask
bitmasks = dict((v, 1<<i) for i, v in enumerate(prime_residues))

def isprime(n, primebytes):
    r = n % 30
    if r not in bitmasks:
        return False
    b = primebytes[n // 30]
    return b & bitmasks[r]

def primes(primebytes):
    s = (2, 3, 5) + prime_residues[1:]
    for i in s:
        yield i
    s = prime_residues
    j = 30
    while True:
        try:
            for i in s:
                p = j + i
                if isprime(p, primebytes):
                    yield p
            j += 30
        except IndexError:
            return

#Test
def main():
    m = int(sys.argv[1]) if len(sys.argv) > 1 else 300

    fname = 'primebits'
    primebytes = load_primes(fname)
    primelist = []
    for p in primes(primebytes):
        if p > m:
            break
        primelist.append(p)
    print primelist


if __name__ == '__main__':
    main()
Community
  • 1
  • 1
PM 2Ring
  • 50,023
  • 5
  • 64
  • 150