When Generating Primes in F#, why is the "Sieve of Erosthenes" so slow in this particular implementatIon?

Question

IE,

What am I doing wrong here? Does it have to to with lists, sequences and arrays and the way the limitations work?

So here is the setup: I'm trying to generate some primes. I see that there are a billion text files of a billion primes. The question isn't why...the question is how are the guys using python calculating all of the primes below 1,000,000 in milliseconds on this post...and what am I doing wrong with the following F# code?

let sieve_primes2 top_number = 
    let numbers = [ for i in 2 .. top_number do yield i ]
    let sieve (n:int list) = 
        match n with
        | [x] -> x,[]
        | hd :: tl -> hd, List.choose(fun x -> if x%hd = 0 then None else Some(x)) tl
        | _ -> failwith "Pernicious list error."
    let rec sieve_prime (p:int list) (n:int list) =  
        match (sieve n) with
        | i,[] -> i::p
        | i,n'  -> sieve_prime (i::p) n'
    sieve_prime [1;0] numbers

With the timer on in FSI, I get 4.33 seconds worth of CPU for 100000... after that, it all just blows up.

Standard advice is to use arrays instead of lists/seqs - for a very fast sieve see here: http://stackoverflow.com/a/8371684/124259 — John Palmer, Aug 17 '12 at 23:13
I would have to say `List.choose` is the major performance killer here but you have so much object creation going on here its not even funny. — ChaosPandion, Aug 17 '12 at 23:36
well, geeze. Its kinda funny :D in my defense, I've have only been programming windows stuff for a few months — cdonlan, Aug 18 '12 at 00:20
@cdonlan Well, the main problem is that you're not implementing the algorithm correctly. http://www.cs.tufts.edu/~nr/comp150fp/archive/melissa-oneill/Sieve-JFP.pdf — J D, Mar 23 '13 at 12:07

pad · Accepted Answer · 2012-08-18T00:15:00.403

5

Your sieve function is slow because you tried to filter out composite numbers up to top_number. With Sieve of Eratosthenes, you only need to do so until sqrt(top_number) and remaining numbers are inherently prime. Suppose we havetop_number = 1,000,000, your function does 78498 rounds of filtering (the number of primes until 1,000,000) while the original sieve only does so 168 times (the number of primes until 1,000).

You can avoid generating even numbers except 2 which cannot be prime from the beginning. Moreover, sieve and sieve_prime can be merged into a recursive function. And you could use lightweight List.filter instead of List.choose.

Incorporating above suggestions:

let sieve_primes top_number = 
    let numbers = [ yield 2
                    for i in 3..2..top_number -> i ]
    let rec sieve ns = 
        match ns with
        | [] -> []
        | x::xs when x*x > top_number -> ns
        | x::xs -> x::sieve (List.filter(fun y -> y%x <> 0) xs)
    sieve numbers

In my machine, the updated version is very fast and it completes within 0.6s for top_number = 1,000,000.

edited Aug 18 '12 at 00:15

answered Aug 17 '12 at 23:58

pad

39,834
7
83
159

This will be even faster if you use an array instead of a list – John Palmer Aug 18 '12 at 00:16
Alright, looks good. Didn't realize it was up to the sqrt(top_number)...mustave missed that sentence. but that makes complete sense when I think about the illustration for the sieve applied to 100 on the wiki article. – cdonlan Aug 18 '12 at 00:18
@JohnPalmer: considering OP is writing recursive functions, I think using list is more natural and fast enough. – pad Aug 18 '12 at 00:23
@pad, this is not a Sieve of Eratosthenes because it divides (modulo of) all prime candidates by all found primes up to the square root of the top number making it a trial division sieve; a true SoE determines composite numbers by adding prime spans to the square of each found prime and removes them as candidate primes. Further optimizations could include not dividing by two and using some kind of lazy processing to avoid large amounts of memory used in the lists (and rebuilding of lists). As well it is limited in range of primes it can process as it isn't a true tail recursion. – GordonBGood Aug 03 '13 at 02:41
@pad, cont'd, in F# lists aren't like the lists of Haskell as they don't have the lazy component and thus the capability of "looping" indefinitely for the construct "head::f tail" as in your above code. To do this in F#, one must either use the PowerPack LazyList or "roll-your-own" laziness as in using CoInductive Streams with continuations as their "tail". Your code works well enough here as long as the range of primes is small enough. – GordonBGood Aug 03 '13 at 14:40

John Palmer · Answer 2 · 2012-08-18T01:02:05.797

4

Based on my code here: stackoverflow.com/a/8371684/124259

Gets the first 1 million primes in 22 milliseconds in fsi - a significant part is probably compiling the code at this point.

#time "on"

let limit = 1000000
//returns an array of all the primes up to limit
let table =
    let table = Array.create limit true //use bools in the table to save on memory
    let tlimit = int (sqrt (float limit)) //max test no for table, ints should be fine
    let mutable curfactor = 1;
    while curfactor < tlimit-2 do
        curfactor <- curfactor+2
        if table.[curfactor]  then //simple optimisation
            let mutable v = curfactor*2
            while v < limit do
                table.[v] <- false
                v <- v + curfactor
    let out = Array.create (100000) 0 //this needs to be greater than pi(limit)
    let mutable idx = 1
    out.[0]<-2
    let mutable curx=1
    while curx < limit-2 do
        curx <- curx + 2
        if table.[curx] then
            out.[idx]<-curx
            idx <- idx+1
    out

edited Aug 18 '12 at 01:02

answered Aug 18 '12 at 00:54

John Palmer

24,880
3
45
66

Nice! Use the boolean array to assist the initialization of the prime array. I guess there will be one more Array.filter for zeros, cuz of the overshoot in the Array initialization. Have to say, I tend to avoid the Arrays cuz the line-by-line stuff in awk was always so much faster (figured thats what lists/sequences were) at sorting large text files (like facet files), and powershell was so slow when I read it all into an array...course that was at work, where half my ram had been co-opted by some double-E major hacker jerk, now that I think about it. – cdonlan Aug 18 '12 at 15:28
I get several errors when I try to run your code in a brand new fsx file. Are there specific references that need to be added or open statements that are needed? – Alexander Ryan Baggett May 20 '14 at 19:45
@AlexanderRyanBaggett - just double checked and the code works fine for me. Can you post any specific error messages. – John Palmer May 20 '14 at 23:04
I apologize, when I pasted it visual studio indented the #time. Which created problems all over the place. After I fixed that it worked just fine. – Alexander Ryan Baggett May 21 '14 at 05:36

GordonBGood · Answer 3 · 2013-08-04T03:54:41.980

There have been several good answers both as to general trial division algorithm using lists (@pad) and in choice of an array for a sieving data structure using the Sieve of Eratosthenes (SoE) (@John Palmer and @Jon Harrop). However, @pad's list algorithm isn't particularly fast and will "blow up" for larger sieving ranges and @John Palmer's array solution is somewhat more complex, uses more memory than necessary, and uses external mutable state so is not different than if the program were written in an imperative language such as C#.

EDIT_ADD: I've edited the below code (old code with line comments) modifying the sequence expression to avoid some function calls so as to reflect more of an "iterator style" and while it saved 20% of the speed it still doesn't come close to that of a true C# iterator which is about the same speed as the "roll your own enumerator" final F# code. I've modified the timing information below accordingly. END_EDIT

The following true SoE program only uses 64 KBytes of memory to sieve primes up to a million (due to only considering odd numbers and using the packed bit BitArray) and still is almost as fast as @John Palmer's program at about 40 milliseconds to sieve to one million on a i7 2700K (3.5 GHz), with only a few lines of code:

open System.Collections
let primesSoE top_number=
  let BFLMT = int((top_number-3u)/2u) in let buf = BitArray(BFLMT+1,true)
  let SQRTLMT = (int(sqrt (double top_number))-3)/2
  let rec cullp i p = if i <= BFLMT then (buf.[i] <- false; cullp (i+p) p)
  for i = 0 to SQRTLMT do if buf.[i] then let p = i+i+3 in cullp (p*(i+1)+i) p
  seq { for i = -1 to BFLMT do if i<0 then yield 2u
                               elif buf.[i] then yield uint32(3+i+i) }
//  seq { yield 2u; yield! seq { 0..BFLMT } |> Seq.filter (fun i->buf.[i])
//                                          |> Seq.map (fun i->uint32 (i+i+3)) }
primesSOE 1000000u |> Seq.length;;

Almost all of the elapsed time is spent in the last two lines enumerating the found primes due to the inefficiency of the sequence run time library as well as the cost of enumerating itself at about 28 clock cycles per function call and return with about 16 function calls per iteration. This could be reduced to only a few function calls per iteration by rolling our own iterator, but the code is not as concise; note that in the following code there is no mutable state exposed other than the contents of the sieving array and the reference variable necessary for the iterator implementation using object expressions:

open System
open System.Collections
open System.Collections.Generic
let primesSoE top_number=
  let BFLMT = int((top_number-3u)/2u) in let buf = BitArray(BFLMT+1,true)
  let SQRTLMT = (int(sqrt (double top_number))-3)/2
  let rec cullp i p = if i <= BFLMT then (buf.[i] <- false; cullp (i+p) p)
  for i = 0 to SQRTLMT do if buf.[i] then let p = i+i+3 in cullp (p*(i+1)+i) p
  let nmrtr() =
    let i = ref -2
    let rec nxti() = i:=!i+1;if !i<=BFLMT && not buf.[!i] then nxti() else !i<=BFLMT
    let inline curr() = if !i<0 then (if !i= -1 then 2u else failwith "Enumeration not started!!!")
                        else let v = uint32 !i in v+v+3u
    { new IEnumerator<_> with
        member this.Current = curr()
      interface IEnumerator with
        member this.Current = box (curr())
        member this.MoveNext() = if !i< -1 then i:=!i+1;true else nxti()
        member this.Reset() = failwith "IEnumerator.Reset() not implemented!!!"
      interface IDisposable with
        member this.Dispose() = () }
  { new IEnumerable<_> with
      member this.GetEnumerator() = nmrtr()
    interface IEnumerable with
      member this.GetEnumerator() = nmrtr() :> IEnumerator }
primesSOE 1000000u |> Seq.length;;

The above code takes about 8.5 milliseconds to sieve the primes to a million on the same machine due to greatly reducing the number of function calls per iteration to about three from about 16. This is about the same speed as C# code written in the same style. It's too bad that F#'s iterator style as I used in the first example doesn't automatically generate the IEnumerable boiler plate code as C# iterators do, but I guess that is the intention of sequences - just that they are so damned inefficient as to speed performance due to being implemented as sequence computation expressions.

Now, less than half of the time is expended in enumerating the prime results for a much better use of CPU time.

score 1 · Answer 4 · answered Feb 10 '13 at 21:02

What am I doing wrong here?

You've implemented a different algorithm that goes through each possible value and uses % to determine if it needs to be removed. What you're supposed to be doing is stepping through with a fixed increment removing multiples. That would be asymptotically.

You cannot step through lists efficiently because they don't support random access so use arrays.

When Generating Primes in F#, why is the "Sieve of Erosthenes" so slow in this particular implementatIon?

4 Answers4

Linked