5

I'm writing a recursive infinite prime number generator, and I'm almost sure I can optimize it better.

Right now, aside from a lookup table of the first dozen primes, each call to the recursive function receives a list of all previous primes.

Since it's a lazy generator, right now I'm just filtering out any number that has a remainder of 0 modulo any of the previous primes, and taking the first unfiltered result. (The check I'm using short-circuits, so the first time a previous prime divides the current number evenly it aborts with that information.)

Right now, my performance degrades around the 4,000th prime (37,813). I'm looking for ways to use the unique fact that I have a list of all prior primes, and am only searching for the next, to improve my filtering algorithm. (Most information I can find offers non-lazy sieves to find primes under a limit, or ways to find pₙ given pₙ₋₁, not optimizations to find pₙ given all the primes 2…pₙ₋₁.)

For example, I know that the next prime pₙ must reside in the range (pₙ₋₁ + 1)…(pₙ₋₁ + pₙ₋₂). Right now I start my filtering of integers at pₙ₋₁ + 2 (since pₙ₋₁ + 1 can only be prime for pₙ₋₁ = 2, which is precomputed). But since this is a lazy generator, knowing the terminal bound of the range (pₙ₋₁ + pₙ₋₂) doesn't help me filter anything.

What can I do to filter more effectively given all previous primes?

Code Sample

  @doc """
  Creates an infinite stream of prime numbers.

    iex> Enum.take(primes, 5)
    [2, 3, 5, 7, 11]

    iex> Enum.take_while(primes, fn(n) -> n < 25 end)
    [2, 3, 5, 7, 11, 13, 17, 19, 23]

  """
  @spec primes :: Stream.t
  def primes do
    Stream.unfold( [], fn primes ->
      next = next_prime(primes)
      { next, [next | primes] }
    end )
  end

  defp next_prime([]),      do: 2
  defp next_prime([2 | _]), do: 3
  defp next_prime([3 | _]), do: 5
  defp next_prime([5 | _]), do: 7
  # ... etc

  defp next_prime(primes) do
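    # primes is newest-first, so its head is the largest prime found so far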
    start = Enum.first(primes) + 2
    Enum.first(
      Stream.drop_while(
        Integer.stream(from: start, step: 2),
        fn number ->
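          # drop the candidate while some known prime divides it evenly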
          Enum.any?(primes, fn prime ->
            rem(number, prime) == 0
          end )
        end
      )
    )
  end

The primes function starts with an empty list, gets the next prime for it (2 initially), and then 1) emits it from the Stream and 2) adds it to the top of the primes stack used in the next call. (I'm sure this stack is the source of some slowdown.)

The next_prime function takes in that stack. Starting from the last known prime + 2, it creates an infinite stream of integers, drops each integer that is evenly divisible by any known prime in the list, and then returns the first occurrence.

This is, I suppose, something similar to a lazy incremental Eratosthenes's sieve.

You can see some basic attempts at optimization: I start checking at pₙ₋₁ + 2, and I step over even numbers.

I tried a more literal sieve of Eratosthenes by just passing the Integer.stream through each calculation and, after finding a prime, wrapping the Integer.stream in a new Stream.drop_while that filtered out just that prime's multiples. But since Streams are implemented as anonymous functions, that mutilated the call stack.

It's worth noting that I'm not assuming you need all prior primes to generate the next one. I just happen to have them around, thanks to my implementation.

Will Ness
  • 62,652
  • 8
  • 86
  • 167
Chris Keele
  • 3,138
  • 3
  • 27
  • 51
  • Your question is not clear. Do you want to keep generating the prime numbers till eternity? If you post some code, it may clarify a bit. – Abhishek Bansal Dec 18 '13 at 12:17
  • see http://stackoverflow.com/questions/2211990/how-to-implement-an-efficient-infinite-generator-of-prime-numbers-in-python/10733621#10733621 – Will Ness Dec 18 '13 at 14:13
  • 1
    What do you mean by "bogs down"? I just wrote a little prime number generator in C# that produces the first 2500 primes in about 1.8 milliseconds. Total. It generates all the primes up to one million in about 150 milliseconds. And this code isn't highly optimized. I suspect your implementation isn't what it should be. Show us your code. Otherwise we really can't help you out. – Jim Mischel Dec 18 '13 at 14:22
  • @JimMischel BTW e.g. [this simple C++ code](http://ideone.com/fapob) runs much faster, on Ideone, 60 ms to get to 15.5 mln (1mln primes); that's about 4 ms for up to 1 mln (too small for Ideone to measure). But maybe that's because C# is just generally slower than C++? Your code is more or less fine, judging from your timings, logBase (78500 / 2500) (150/1.8) = 1.28, it runs at ~ n^1.28, in `n` primes produced (though true sieve should run at ~ n^1.1..1.05). n^1.28 is half-decent (and maybe the measurement at 2500 primes has too much noise in it; if it were 2.8 ms, it would be ~ n^1.15, ~OK). – Will Ness Dec 18 '13 at 16:49
  • @WillNess: Yes, it could be optimized. Not only that, I timed it in debug mode and didn't take JIT time into account. My point was that if a simple program that I just dashed off without any thought towards optimization can compute the first 75,000 primes in 150 ms, then a program that bogs down after 2,500 primes has some serious implementation issues. – Jim Mischel Dec 18 '13 at 17:11
  • @JimMischel yes, yes. The true test, for me, always is to test the [empirical run time orders of growth](https://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth), from several problem sizes, not just one. It looks like the OP's code runs at something like ~n^2, or worse - which is typical for the attempted use of *all* preceding primes to get the next one. – Will Ness Dec 18 '13 at 17:14
  • 1
    @JimMischel 55,555 !!!!!! :) – Will Ness Dec 18 '13 at 17:32
  • @WillNess: See my [simple C# code](http://ideone.com/sA0DU7). On my system in release mode it does the first 2500 in 0.8 ms, and the first 78,500 in 83 ms. The timings on ideone are ... much slower. As you can see, I used a different method of calculating the primes. Rather than an array from 0 to N, I add primes to a dynamic list as they're found. That explains most of the discrepancy in our timings. – Jim Mischel Dec 18 '13 at 19:27
  • @JimMischel yeah, your code is trial division. [it runs at `~n^1.43`](http://ideone.com/weY3dx), in `n` primes produced. SoE should run much, much faster. :) – Will Ness Dec 18 '13 at 20:09
  • no true Sieve of Eratosthenes uses `rem` to filter the numbers. The whole point is to *generate* them by counting in steps of that prime: *2p, 3p, 4p ...*. Or for odds: *p^2, p^2+2p, p^2+4p, ....*. These numbers are *by construction* the prime *p*'s multiples. – Will Ness Dec 18 '13 at 21:11
  • what language is this, please? I've just answered what seems like a very similar question in Scheme/streams today; see if it might help. – Will Ness Dec 18 '13 at 21:44
  • It's Elixir, a functional homoiconic language built on the BEAM VM. – Chris Keele Dec 18 '13 at 22:53

3 Answers

7

For any number k you only need to try division by primes up to and including √k. This is because any prime factor larger than √k would have to be multiplied by a cofactor smaller than √k.

Proof:

√k * √k = k, so (√k + a) * √k > k for every a > 0. It follows that if (√k + a) divides k, the complementary divisor k / (√k + a) must be smaller than √k; so any divisor larger than √k implies a divisor smaller than √k.

This is commonly used to speed up finding primes tremendously.
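
For illustration, here is a minimal Python sketch of trial division with this cutoff; next_prime and known_primes are hypothetical names, and the list is assumed to already contain at least 2 and 3 so the candidates stay odd. The same idea carries over to the question's Elixir code by walking the known primes in ascending order and stopping once prime * prime exceeds the candidate.

def next_prime(known_primes):           # hypothetical helper, not the question's code
    candidate = known_primes[-1] + 2    # first odd number after the largest known prime
    while True:
        # only primes p with p*p <= candidate can be the smallest factor
        if all(candidate % p != 0 for p in known_primes if p * p <= candidate):
            return candidate
        candidate += 2                  # skip even numbers

For example, next_prime([2, 3, 5, 7, 11]) returns 13 after trial-dividing only by 2 and 3.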

Emil Vikström
  • 84,510
  • 15
  • 132
  • 168
  • 2
    no, not O(log n). Producing n primes with this method is O( n^1.5 / (log n)^0.5). Producing n primes with sieve of Eratosthenes is O( n log n log log n). Checking if k is prime runs in ~ pi( sqrt(k) ) ~ 2 sqrt(k) / log(k) where k ~= n log n (n is number of primes below k). i.e. ~ 2 sqrt(n/log n). – Will Ness Dec 18 '13 at 14:26
  • Will Ness, you are of course correct! I totally forgot that there's a non-linear correlation between k and n. Not even sure if I would be correct otherwise either, but the point is still that it's a totally different asymptotic runtime. Thank you! – Emil Vikström Dec 18 '13 at 16:20
6

You don't need all prior primes: when generating composites from primes by the sieve of Eratosthenes algorithm, just those below the square root of your current production point are enough.

This greatly reduces the memory requirements. The primes are then simply those odd numbers which are not among the composites.

Each prime p produces a chain of its multiples, starting from its square, enumerated with the step of 2p (because we work only with odd numbers). These multiples, each with its step value, are stored in a dictionary, thus forming a priority queue. Only the primes up to the square root of the current candidate are present in this priority queue (the same memory requirement as that of a segmented sieve of E.).

Symbolically, the sieve of Eratosthenes is

     P = {3, 5, 7, 9, ...} \ ⋃ { {p², p²+2p, p²+4p, p²+6p, ...} | p in P }

Each odd prime generates a stream of its multiples by repeated addition; all these streams merged together give us all the odd composites; and primes are all the odd numbers without the composites (and the one even prime number, 2).

In Python (which can hopefully be read as executable pseudocode),

def postponed_sieve():      # postponed sieve, by Will Ness,   
    yield 2; yield 3;       #   https://stackoverflow.com/a/10733621/849891
    yield 5; yield 7;       # original code David Eppstein / Alex Martelli 
    D = {}                  #   2002, http://code.activestate.com/recipes/117119
    ps = (p for p in postponed_sieve())  # a separate Primes Supply:
    p = next(ps) and next(ps)            #   (3) a Prime to add to dict
    q = p*p                              #   (9) when its sQuare is 
    c = 9                                #   the next Candidate
    while True:
        if c not in D:                # not a multiple of any prime seen so far:
            if c < q: yield c         #   a prime, or
            else:   # (c==q):         #   the next prime's square:
                add(D,c + 2*p,2*p)    #     (9+6,6 : 15,21,27,33,...)
                p = next(ps)          #     (5)
                q=p*p                 #     (25)
        else:                         # 'c' is a composite:
            s = D.pop(c)              #   step of increment
            add(D,c + s,s)            #   next multiple, same step
        c += 2                        # next odd candidate

def add(D,x,s):                          # make no multiple keys in Dict
    while x in D: x += s                 # increment by the given step
    D[x] = s  

Once a prime is produced, it can be forgotten. A separate prime supply is taken from a separate invocation of the same generator, recursively, to maintain the dictionary. And the prime supply for that one is taken from another, recursively as well. Each needs to be supplied only up to the square root of its production point, so very few generators are needed overall (on the order of log log N generators), and their sizes are asymptotically insignificant (sqrt(N), sqrt( sqrt(N) ), etc).
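
With the ps.next() calls written as next(ps), as above, the same code also runs on Python 3. A small usage sketch, mirroring the examples in the question:

from itertools import islice, takewhile

print(list(islice(postponed_sieve(), 5)))                   # [2, 3, 5, 7, 11]
print(list(takewhile(lambda p: p < 25, postponed_sieve())))
# [2, 3, 5, 7, 11, 13, 17, 19, 23]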

Will Ness
  • 62,652
  • 8
  • 86
  • 167
  • This is pretty clever. But isn't it true that we, for an infinite lazy generator, have to keep all previous primes? I'm not entirely sure what you mean about that we can save memory by forgetting the produced prime, then. – Emil Vikström Dec 18 '13 at 16:22
  • no, not all primes, just those below the sqrt of the current candidate. And they are not held just as primes, but as entries in dictionary, of the form (multiple, 2*prime), to produce the next multiple of that prime, when needed, as next_mult = mult+2*p. The produced prime we can forget, i.e. just, say, print it out into some file, or add into a sum variable, etc. – Will Ness Dec 18 '13 at 16:33
  • yes, correct; and so the dictionary is asked for the next multiple i.e. next composite. And if it's greater than the candidate, then the candidate is a prime. The size of the dictionary will be O( pi(sqrt(k))), for the candidate k, i.e. ~ 2sqrt(k)/log(k). – Will Ness Dec 18 '13 at 16:37
  • 1
    Of the lot of fascinating answers, optimizations, and algorithms posted here this one seems most applicable to my approach. Thanks! – Chris Keele Dec 18 '13 at 22:56
4

I wrote a program that generates the prime numbers in order, without limit, and used it to sum the first billion primes at my blog. The algorithm uses a segmented Sieve of Eratosthenes; additional sieving primes are calculated at each segment, so the process can continue indefinitely, as long as you have space to store the sieving primes. Here's pseudocode:

function init(delta) # Sieve of Eratosthenes
  m, ps, qs := 0, [], []
  sieve := makeArray(2 * delta, True)
  for p from 2 to delta
    if sieve[p]
      m := m + 1; ps.insert(p)
      qs.insert(p + (p-1) / 2)
      for i from p+p to 2 * delta - 1 step p
        sieve[i] := False
  return m, ps, qs, sieve

function advance(m, ps, qs, sieve, bottom, delta)
  for i from 0 to delta - 1
    sieve[i] := True
  for i from 0 to m - 1
    qs[i] := (qs[i] - delta) % ps[i]
  p := ps[0] + 2
  while p * p <= bottom + 2 * delta
    if isPrime(p) # trial division
      m := m + 1; ps.insert(p)
      qs.insert((p*p - bottom - 1) / 2)
    p := p + 2
  for i from 0 to m - 1
    for j from qs[i] to delta step ps[i]
      sieve[j] := False
  return m, ps, qs, sieve

Here ps is the list of sieving primes less than the current maximum and qs is the offset of the smallest multiple of the corresponding ps in the current segment. The advance function clears the bitarray, resets qs, extends ps and qs with new sieving primes, then sieves the next segment.

function genPrimes()
  bottom, i, delta := 0, 1, 50000
  m, ps, qs, sieve := init(delta)
  yield 2
  while True
    if i == delta # reset for next segment
      i, bottom := -1, bottom + 2 * delta
      m, ps, qs, sieve := \
        advance(m, ps, qs, sieve, bottom, delta)
    else if sieve[i] # found prime
      yield bottom + 2*i + 1
    i := i + 1

The segment size 2 * delta is arbitrarily set to 100000. This method requires O(sqrt(n)) space for the sieving primes plus constant space for the sieve.

A slower but more space-efficient alternative is to generate candidates with a wheel and test each candidate for primality.

function genPrimes()
  w, wheel := 0, [1,2,2,4,2,4,2,4,6,2,6,4,2,4, \
       6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8,6,4,6, \
       2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10]
  p := 2; yield p
  repeat
    p := p + wheel[w]
    if w == 51 then w := 4 else w := w + 1
    if isPrime(p) yield p

It may be useful to begin with a sieve and switch to a wheel when the sieve grows too large. Even better is to continue sieving with some fixed set of sieving primes, once the set grows too large, then report only those values bottom + 2*i + 1 that pass a primality test.
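
For illustration, here is a rough Python sketch of that last idea (hypothetical code, not the program from the blog post): each segment of odd numbers is presieved with a fixed set of small primes, and only the survivors are handed to a deterministic Miller-Rabin test. The names gen_primes, small_limit and delta are made up for the sketch.

def is_prime(n):
    # Miller-Rabin; with these fixed witnesses the result is exact for all 64-bit n
    if n < 2:
        return False
    witnesses = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in witnesses:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in witnesses:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def gen_primes(small_limit=1000, delta=50000):
    small = [p for p in range(2, small_limit) if is_prime(p)]
    yield from small                                # the fixed sieving primes themselves
    # first odd candidate not already covered by `small`
    bottom = small_limit if small_limit % 2 else small_limit + 1
    while True:
        sieve = [True] * delta                      # sieve[i] stands for bottom + 2*i
        for p in small[1:]:                         # presieve with the odd fixed primes
            # first index i with (bottom + 2*i) divisible by p
            i = ((-bottom % p) * ((p + 1) // 2)) % p
            for j in range(i, delta, p):
                sieve[j] = False
        for i in range(delta):
            if sieve[i] and is_prime(bottom + 2 * i):   # confirm the survivors
                yield bottom + 2 * i
        bottom += 2 * delta

With the assumed defaults, the fixed presieve removes the bulk of the composites cheaply and the primality test confirms what remains, so memory stays bounded by the fixed set of sieving primes, as described above.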

user448810
  • 16,364
  • 2
  • 31
  • 53