10

I have implemented the Sieve of Atkin and it works great up to primes nearing 100,000,000 or so. Beyond that, it breaks down because of memory problems.

In the algorithm, I want to replace the memory-based array with a hard-drive-based array. Python's binary file modes ("wb") and seek() may do the trick. Before I go off inventing new wheels, can anyone offer advice? Two issues appear at the outset:

  1. Is there a way to "chunk" the Sieve of Atkin to work on segments in memory, and
  2. is there a way to suspend the activity and come back to it later, suggesting I could serialize the memory variables and restore them?

Why am I doing this? An old geezer looking for entertainment and to keep the noodle working.

Will Ness
  • 62,652
  • 8
  • 86
  • 167
WyomingGeezer
  • 157
  • 11
  • 1
    You are "inventing the wheel" regardless of the storage issue. BTW, I'm not sure how you implemented it because you have not provided your code, but given that you want to check all numbers up to `max`, it is sufficient to use an array of `(max-1)/24+1` bytes. So for 100,000,000 that's about 4MB of memory. – barak manos Aug 21 '15 at 07:18
  • 100M is not very large, it is not even "large" -- 9 digits is nothing e.g., *commonly used* RSA keys have hundreds to a thousand digits. Here's a [question that discusses numbers with 100M digits](http://stackoverflow.com/q/28420541/4279). The [sieve algorithm is `O(sqrt(N))` in memory](http://stackoverflow.com/a/175956/4279) i.e., ~10KB should be enough. The issue is your particular implementation. Show the code and ask how its memory usage could be improved. – jfs Aug 21 '15 at 08:40
  • 1
    A [quick search](https://stackoverflow.com/search?tab=relevance&q=segmented%20atkin%20is%3aq) reveals this was [already asked](https://stackoverflow.com/questions/10429747/segmented-sieve-of-atkin-possible) on SO (and answered) -- about the segmentation, I mean. – Will Ness Aug 21 '15 at 11:52
  • 2
    Several answers and some good comments have been posted for my question -- enough to cause me to go back and peel the onion a little more before asking more questions. It would be hard for me to select any one answer as "the best" since each seems to deal with a different aspect of my original indistinct set of questions. Being new to SO, I'll thank all that responded and post my own answer in order to close the question and let folks get on with better paying customers. – WyomingGeezer Aug 21 '15 at 16:49

4 Answers

5

Implementing the SoA in Python sounds fun, but note it will probably be slower than the SoE in practice. For some good monolithic SoE implementations, see RWH's StackOverflow post. These can give you some idea of the speed and memory use of very basic implementations. The numpy version will sieve to over 10,000M on my laptop.
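For a concrete point of reference (a minimal sketch in the spirit of those posts, not RWH's exact code), an odds-only numpy SoE looks something like this, using about one byte per two candidates:

import numpy as np

def primes_below(limit):
    # Odds-only Sieve of Eratosthenes: returns all primes < limit (assumes limit >= 3).
    sieve = np.ones(limit // 2, dtype=np.bool_)   # sieve[i] stands for the odd number 2*i + 1
    sieve[0] = False                              # 1 is not prime
    for i in range(1, int(limit ** 0.5) // 2 + 1):
        if sieve[i]:
            p = 2 * i + 1
            sieve[p * p // 2::p] = False          # cross off p*p, p*p + 2p, p*p + 4p, ...
    return np.concatenate(([2], 2 * np.nonzero(sieve)[0] + 1))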

What you really want is a segmented sieve. This lets you constrain memory use to some reasonable limit (e.g. 1M + O(sqrt(n)), and the latter can be reduced if needed). A nice discussion and code in C++ is shown at primesieve.org. You can find various other examples in Python. primegen, Bernstein's implementation of SoA, is implemented as a segmented sieve (Your question 1: Yes the SoA can be segmented). This is closely related (but not identical) to sieving a range. This is how we can use a sieve to find primes between 10^18 and 10^18+1e6 in a fraction of a second -- we certainly don't sieve all numbers to 10^18+1e6.
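To make the segmentation idea concrete, here is a rough sketch (a segmented SoE rather than Bernstein's SoA, reusing the primes_below sketch above): only the base primes up to sqrt(hi) and one block of booleans are ever in memory.

def segmented_primes(lo, hi, block=1 << 20):
    # Yield the primes in [lo, hi), one block at a time.
    # Memory: the base primes below sqrt(hi) plus `block` booleans.
    base = primes_below(int(hi ** 0.5) + 1)
    for start in range(max(lo, 2), hi, block):
        stop = min(start + block, hi)
        seg = np.ones(stop - start, dtype=np.bool_)
        for p in base:
            # first multiple of p inside [start, stop), but never below p*p
            first = max(p * p, (start + p - 1) // p * p)
            if first < stop:
                seg[first - start::p] = False
        for n in np.nonzero(seg)[0] + start:
            yield int(n)

Listing the primes in [10**12, 10**12 + 10**6), say, then touches only the 78,498 base primes below 10**6 and a single block, never an array of 10**12 entries.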

Involving the hard drive is, IMO, going the wrong direction. We ought to be able to sieve faster than we can read values from the drive (at least with a good C implementation). A ranged and/or segmented sieve should do what you need.

There are better ways to do storage, which will help some. My SoE, like a few others, uses a mod-30 wheel so has 8 candidates per 30 integers, hence uses a single byte per 30 values. It looks like Bernstein's SoA does something similar, using 2 bytes per 60 values. RWH's python implementations aren't quite there, but are close enough at 10 bits per 30 values. Unfortunately it looks like Python's native bool array is using about 10 bytes per bit, and numpy is a byte per bit. Either you use a segmented sieve and don't worry about it too much, or find a way to be more efficient in the Python storage.
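To make the mod-30 packing concrete (a sketch of the general idea, not the code from my sieve): above 5, only the residues 1, 7, 11, 13, 17, 19, 23 and 29 mod 30 can be prime, so each span of 30 integers fits in one byte.

# the 8 residues mod 30 that are coprime to 2, 3 and 5
WHEEL = (1, 7, 11, 13, 17, 19, 23, 29)
BIT = {r: i for i, r in enumerate(WHEEL)}      # residue -> bit position within the byte

def set_composite(table, n):
    # mark n composite; multiples of 2, 3 and 5 are simply never stored
    q, r = divmod(n, 30)
    if r in BIT:
        table[q] |= 1 << BIT[r]

def is_candidate(table, n):
    # True while n (coprime to 2, 3, 5) is still unmarked
    q, r = divmod(n, 30)
    return r in BIT and not table[q] & (1 << BIT[r])

table = bytearray(100000000 // 30 + 1)          # all flags up to 10**8 in about 3.3 MB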

DanaJ
  • 714
  • 6
  • 9
3

First of all you should make sure that you store your data in an efficient manner. You could easily store the data for all numbers up to 100,000,000 in 12.5 MB of memory by using a bitmap, and by skipping obvious non-primes (even numbers and so on) you could make the representation even more compact. This also helps when storing the data on the hard drive. That you're getting into trouble at 100,000,000 suggests that you're not storing the data efficiently.
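For instance (a minimal sketch, not necessarily how your sieve is laid out), a plain bytearray bitmap stores one flag per number in limit/8 bytes, which for 100,000,000 is exactly that 12.5 MB; skipping the even numbers would halve it again.

limit = 100000000
bitmap = bytearray(limit // 8 + 1)      # one bit per number: ~12.5 MB

def mark_composite(n):
    bitmap[n >> 3] |= 1 << (n & 7)      # set bit n

def is_composite(n):
    return bool(bitmap[n >> 3] & (1 << (n & 7)))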

Some hints if you don't receive a better answer.

1. Is there a way to "chunk" the Sieve of Atkin to work on segments in memory?

Yes. For the Eratosthenes-like part, what you could do is process multiple elements of the sieve list in "parallel" (one block at a time) and in that way minimize the disk accesses.
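A sketch of that block-wise pass (hypothetical names: flags is the in-memory block indexed by n - lo, and small_primes holds the primes found so far):

def squarefree_pass(flags, lo, hi, small_primes):
    # The Eratosthenes-like stage of Atkin restricted to the block [lo, hi):
    # clear every multiple of p*p that falls inside the block.
    for p in small_primes:
        p2 = p * p
        first = ((lo + p2 - 1) // p2) * p2      # first multiple of p*p that is >= lo
        for n in range(first, hi, p2):
            flags[n - lo] = False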

The first part is somewhat trickier: what you would want to do is process the quadratic forms 4*x**2+y**2, 3*x**2+y**2 and 3*x**2-y**2 in a more sorted order. One way is to first compute them and then sort the numbers; there are sorting algorithms that work well on disk storage (still O(N log N)), but that would hurt the time complexity. A better way would be to iterate over x and y one block at a time: since a block is determined by an interval, you could for example simply iterate over all x and y such that lo <= 4*x**2+y**2 <= hi.
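For the 4*x**2 + y**2 case, for example, a block-restricted iteration could look like this sketch (toggle(n) is assumed to flip the candidate bit for n, however it is stored):

import math

def toggle_4x2_y2(lo, hi, toggle):
    # Toggle n = 4*x*x + y*y for all x, y >= 1 with lo <= n < hi and n % 12 in (1, 5).
    x = 1
    while 4 * x * x + 1 < hi:
        base = 4 * x * x
        if base + 1 >= lo:
            y = 1
        else:
            y = max(1, int(math.sqrt(lo - base)) - 1)   # conservative start; n >= lo is re-checked
        while base + y * y < hi:
            n = base + y * y
            if n >= lo and n % 12 in (1, 5):
                toggle(n)
            y += 1
        x += 1

The 3*x**2 + y**2 and 3*x**2 - y**2 cases follow the same pattern with their own residue tests.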

2. Is there a way to suspend the activity and come back to it later, suggesting I could serialize the memory variables and restore them?

In order to achieve this (no matter how and when the program is terminated) you first have to journal your disk accesses (e.g. use a SQL database to keep the data, though with care you could do it yourself).

Second, since the operations in the first part (the bit toggles) are not idempotent, you have to make sure that you don't repeat them. However, since you would be running that part block by block, you could simply detect which block was processed last and resume there (if you can end up with a partially processed block, you'd just discard it and redo that block). The Eratosthenes part is idempotent, so you could just run through all of it, but to increase speed you could store a list of produced primes after they have been sieved (so you would resume sieving after the last produced prime).
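One simple way to get that (a sketch with made-up file names): after finishing a block, write its output and then record the block number through an atomic rename, so a crash can never leave a half-written marker.

import os, json

PROGRESS = "progress.json"

def save_progress(last_block):
    tmp = PROGRESS + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_block": last_block}, f)
    os.rename(tmp, PROGRESS)        # atomic on POSIX: the marker is either old or new, never partial

def load_progress():
    if os.path.isfile(PROGRESS):
        with open(PROGRESS) as f:
            return json.load(f)["last_block"]
    return -1                       # nothing processed yet

start_block = load_progress() + 1   # a partially processed block after the marker is simply redone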

As a by-product you should even be able to construct the program in a way that keeps the data from the first step even while the second step is running, and thereby extend the limit at a later moment by continuing the first step and then running the second step again. Perhaps even have two programs, where you terminate the first when you've grown tired of it and then feed its output to the Eratosthenes part (thereby not having to define a limit up front).

skyking
  • 12,561
  • 29
  • 47
1

You could try using a signal handler to catch when your application is terminated. This could then save your current state before terminating. The following script shows a simple counter that continues from where it left off when it is restarted.

import signal, os, cPickle

class MyState:
    """Holds everything that needs to survive a restart."""
    def __init__(self):
        self.count = 1

def stop_handler(signum, frame):
    # Ctrl-C (SIGINT) no longer kills the process outright; it just asks the
    # main loop to finish its current iteration and fall through.
    global running
    running = False

signal.signal(signal.SIGINT, stop_handler)
running = True
state_filename = "state.txt"

# Resume from the previous run if a saved state exists, otherwise start fresh.
if os.path.isfile(state_filename):
    with open(state_filename, "rb") as f_state:
        my_state = cPickle.load(f_state)
else:
    my_state = MyState()

while running:
    print my_state.count
    my_state.count += 1

# Persist the state so the next run picks up where this one stopped.
with open(state_filename, "wb") as f_state:
    cPickle.dump(my_state, f_state)

As for improving disk writes, you could try experimenting with increasing Python's own file buffering to 1 MB or more, e.g. open('output.txt', 'w', 2**20). Using a with statement should also ensure your file gets flushed and closed.
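For example (hypothetical file name), something along these lines keeps writes in a 1 MB buffer and guarantees the flush and close even if an exception is raised:

# the third argument to open() is the buffer size in bytes
with open('primes.txt', 'w', 2**20) as out:
    for p in (2, 3, 5, 7, 11):
        out.write('%d\n' % p)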

Martin Evans
  • 37,882
  • 15
  • 62
  • 83
  • Note that this and similar solutions don't handle situations where the program is terminated in a way the software can't intercept; a power outage, for example, wouldn't be handled. – skyking Aug 21 '15 at 07:48
0

There is a way to compress the array. It may cost some efficiency depending on the Python interpreter, but you'll be able to keep more in memory before having to resort to disk. If you search online, you'll probably find other sieve implementations that use compression.
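As one illustration of the idea (not a tuned scheme), finished sieve blocks are highly repetitive and compress well with the standard zlib module before being written out:

import zlib

segment = bytes(bytearray(10 ** 6))   # a finished block of sieve flags, mostly zero bytes here
packed = zlib.compress(segment, 9)    # repetitive data shrinks to a small fraction of its size
restored = zlib.decompress(packed)    # the identical bytes come back when the block is needed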

Neglecting compression though, one of the easier ways to persist memory to disk would be through a memory-mapped file. Python has an mmap module that provides the functionality. You would have to encode to and from raw bytes, but that is fairly straightforward using the struct module, as the examples below show.

>>> import struct
>>> struct.pack('H', 0xcafe)
b'\xfe\xca'
>>> struct.unpack('H', b'\xfe\xca')
(51966,)
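And a minimal mmap sketch (hypothetical file name and size) that treats a file on disk as one flat byte array which the operating system pages in and out for you:

import mmap

SIZE = 12500000                             # ~12.5 MB: one bit per number below 10**8

with open('sieve.bin', 'wb') as f:          # create and pre-size the backing file
    f.truncate(SIZE)

with open('sieve.bin', 'r+b') as f:
    flags = mmap.mmap(f.fileno(), SIZE)     # behaves like a mutable byte buffer backed by the file
    flags[100:101] = b'\x01'                # mark byte 100; the OS writes the page back as needed
    print(flags[100:101])
    flags.flush()                           # force dirty pages out to disk
    flags.close()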
Jason
  • 3,677
  • 12
  • 24