11

This code was taken from the book "Haskell Road to Logic, Math and Programming". It implements sieve of eratosthenes algorithm and solves Project Euler Problem 10.

sieve :: [Integer] -> [Integer]
sieve (0 : xs) = sieve xs
sieve (n : xs) = n : sieve (mark xs 1 n)
  where
    mark :: [Integer] -> Integer -> Integer -> [Integer]
    mark (y:ys) k m | k == m = 0 : (mark ys 1 m)
                    | otherwise = y : (mark ys (k+1) m)

primes :: [Integer]
primes = sieve [2..]

-- Project Euler #10
main = print $ sum $ takeWhile (< 2000000) primes

Actually it runs even slower then the naive prime test. Can someone explain this behaivour?

I suspect it has something to do with iterating each element in the list in the mark function.

Thanks.

Zero Piraeus
  • 47,176
  • 24
  • 135
  • 148
anatoly
  • 270
  • 2
  • 8
  • Somewhat relevant (but not an answer) - [The Genuine Seive of Eratosthenes](http://lambda-the-ultimate.org/node/3127) - there's already a direct link in a comment to the top answer, but this is via a Lambda The Ultimate page. – Steve314 Aug 10 '12 at 07:51

4 Answers4

12

You are building up a quadratic number of unevaluated thunks using this algorithm. The algorithm relies on laziness so heavily, that this also is the reason why it doesn't scale.

Let's walk through how it works, which hopefully should make the problem apparent. For simplicitly, let's say that we want to print the elements of primes ad infinitum, i.e. we want to evaluate each cell in the list one after the other. primes is defined as:

primes :: [Integer]
primes = sieve [2..]

Since 2 isn't 0, the second definition of sieve applies, and 2 is added to the list of primes, and the rest of the list is an unevaluated thunk (I use tail instead of the pattern match n : xs in sieve for xs, so tail isn't actually being called, and doesn't add any overhead in the code below; mark is actually the only thunked function):

primes = 2 : sieve (mark (tail [2..]) 1 2)

Now we want the second primes element. So, we walk through the code (exercise for the reader) and end up with:

primes = 2 : 3 : sieve (mark (tail (mark (tail [2..]) 1 2)) 1 3)

Same procedure again, we want to evaluate the next prime...

primes = 2 : 3 : 5 : sieve (mark (tail (tail (mark (tail (mark (tail [2..]) 1 2)) 1 3))) 1 5)

This is starting to look like LISP, but I digress... Starting to see the problem? For each element in the primes list, an increasingly large thunk of stacks of mark applications have to be evaluated. In other words, for each element in the list, there has to be a check for whether that element is marked by any of the preceding primes, by evaluating each mark application in the stack. So, for n~=2000000, the Haskell runtime has to call functions resulting in a call stack with a depth of about ... I don't know, 137900 (let n = 2e6 in n / log n gives a lower bound)? Something like that. This is probably what causes the slow-down; maybe vacuum can tell you more (I don't have a computer with both Haskell and a GUI right now).

The reason why the sieve of Eratosthenes works in languages like C is that:

  1. You aren't using an infinite list.
  2. Because of (1), you can mark the whole array before continuing with the next n, resulting in no call stack overheads at all.
dflemstr
  • 25,186
  • 5
  • 66
  • 102
  • 2
    (For alternative algorithms, read [this paper](http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf).) – Louis Wasserman Jul 30 '12 at 10:08
  • 1
    ... and [this one](http://en.literateprograms.org/Sieve_of_Eratosthenes_(Haskell)) – Simon Bergot Jul 30 '12 at 10:10
  • "The reason why the sieve of Eratosthenes works in languages like C" - fwiw you can do strict array programming in Haskell as well, so I suppose in this regard Haskell is "a language like C". – Dan Burton Jul 31 '12 at 02:38
  • A = "(1) and (2) apply to languages like C", B = "(1) and (2) apply to languages like Haskell", P = "Sieve of Eratosthenes works". A → P; B → P. Does that mean that A = B? – dflemstr Jul 31 '12 at 03:03
  • you seem to suggest "an increasingly large thunk" is growing there of nested `mark` applications, but each `mark` works alone, separately, actually producing a stream of numbers, each getting its supply from its predecessor, precisely because each `mark` application *is* getting *forced* by pattern matching - creating an actual list in memory. In Haskell, storage is persistent. So there just isn't such one growing thunk there as you describe. – Will Ness Aug 02 '12 at 16:44
  • Well, by "large thunk" I mean a thunk that depends on a lot of other thunks. As in "`foldl (+) 0 xs` will create a large thunk." It's the same sort of thunk dependency, although the in-memory implications are different as you say. – dflemstr Aug 02 '12 at 16:54
  • Exactly my point. That kind of thunk built up by `foldl` is nowhere to be found in the OP code. `(k+1)` is forced immediately by the next invocation comparing `k==m`. I don't see any other thunks there. Each marking thunk is shallow. – Will Ness Aug 02 '12 at 21:30
  • Exactly my point. The in-memory representation is different, but it's the same kind of thunk dependency as in the `foldl` situation. I.e., the `mark` stack doesn't run to gradually resolve a single value; instead, the first mark needs to produce an element which forces the next mark to traverse its input to produce an element etc. So, forcing an element out of the total stack implies the work of all the `mark`s in the stack, instead of being able to e.g. have an escape condition three layers deep for overlapping pseudo-primes (so `mark x 3` escapes before `mark x 7` triggers when checking 21). – dflemstr Aug 03 '12 at 09:08
  • no, it's not. the inner thunks in the foldl situation remain unforced. That situation is commonly referred to as "unevaluated thunk buildup", just like you have done here. but *here*, there are no unevaluated thunks. each is forced by its consumer, the receiver of its output. *here* they are not nested, but chained *through* the data streams - output of one is input to another. each forced, *shallow*. No nested thunks buildup whatsoever. *here* the implementational detail of using no-arg lambdas aka *thunks* is irrelevant. they could be realized as generators with mutable state just as well. – Will Ness Aug 03 '12 at 09:47
  • and, no traversing is performed. each output element is immediately consumed. Each mark producer works in separation. There is no escaping in postponed filters situation either, so it is not the main issue. The main issue is, there are too many of them. (well, with *filters* there is sieving involved, some numbers are rejected early - here there are no sieving - I mention that in my answer as well - but that is secondary concern). – Will Ness Aug 03 '12 at 09:48
  • (ok, by "escaping" you probably meant what I mean by "sieving" ...) perhaps I misread the question too broadly, as in "why is it so much slow?". In the narrow sense it indeed asks why the OP code is slower than Turners. That is indeed because of non-sieving (but also because each `mark` producer has to constantly update its internal state, - increment its counter - unlike `nomult` in Turner's sieve). There are still no "unevaluated thunks" IMO - precisely because each `tail` is *forced* by pattern matching. You mention it in the answer itself. – Will Ness Aug 03 '12 at 10:02
  • I think you are over-analyzing both the problem and how I chose to describe it. I look at the `vacuum` analysis of the `primes` value, and I personally think my description matches what I see. If there's some detail that makes my description have a completely different meaning to you, I'm sorry, but it also doesn't really matter all that much, either; this is after all a SO question, so there are always other answers offering different perspectives; it's not the end of the world if one doesn't cover 100% of the answer :) – dflemstr Aug 03 '12 at 10:21
8

It's not only the thunks that make it horrendously slow, that algorithm would also be very slow if implemented in C on a finite bit array.

sieve :: [Integer] -> [Integer]
sieve (0 : xs) = sieve xs
sieve (n : xs) = n : sieve (mark xs 1 n)
  where
    mark :: [Integer] -> Integer -> Integer -> [Integer]
    mark (y:ys) k m | k == m = 0 : (mark ys 1 m)
                    | otherwise = y : (mark ys (k+1) m)

For each prime p, this algorithm checks all numbers from p+1 to the limit whether they are multiples of p. It doesn't do so by dividing, as Turner's sieve does, but by comparing a counter to the prime. Now, comparing two numbers is much faster than dividing, but the price paid for that is that each number n is now inspected for each prime < n, and not only for the primes up to n's smallest prime factor.

The result is that the complexity of this algorithm is O(N^2 / log N) versus O( (N/log N)^2 ) for the Turner sieve (and O(N*log (log N)) for a real sieve of Eratosthenes).

The nesting¹ stacking of thunks mentioned by dflemstr exacerbates the problem², but even without that the algorithm would be worse than Turner's. I'm appalled and fascinated at the same time.


¹ "Nesting" may not be the correct word. Although each of the mark thunks is reachable only via the one above it, they don't reference anything from the enclosing thunk's scope.

² There is nothing quadratic in either size or depth of the thunks, though, and the thunks are fairly well-behaved. For the illustration, let's pretend mark were defined with reverse argument order. Then, when 7 is found to be prime, the situation is

sieve (mark 5 2 (mark 3 1 (mark 2 1 [7 .. ])))
~> sieve (mark 5 2 (mark 3 1 (7 : mark 2 2 [8 .. ])))
~> sieve (mark 5 2 (7 : mark 3 2 (mark 2 2 [8 .. ])))
~> sieve (7 : mark 5 3 (mark 3 2 (mark 2 2 [8 .. ])))
~> 7 : sieve (mark 7 1 (mark 5 3 (mark 3 2 (mark 2 2 [8 .. ]))))

and the next pattern-match by sieve forces the mark 7 1 thunk, which forces the mark 5 3 thunk, which forces the mark 3 2 thunk, which forces the mark 2 2 thunk, which forces the [8 .. ] thunk and replaces the head with 0, and wraps the tail in a mark 2 1 thunk. That bubbles up to the sieve, which discards the 0 and then forces the next stack of thunks.

So for every number from p_k + 1 to p_(k+1) (inclusive), the pattern-match in sieve forces a stack/chain of k thunks of the form mark p r. Each of these takes the (y:ys) received from the enclosed thunk ([y ..] for the innermost mark 2 r) and wraps the tail ys in a new thunk, either leaving y unchanged or replacing it with 0, thus building up a new stack/chain of thunks in what will be the tail of the list reaching sieve.

For each found prime, sieve adds another mark p r thunk on top, so at the end, when the first prime larger than 2000000 is found and takeWhile (< 2000000) finishes, there will be 148933 levels of thunks.

The stacking of thunks here doesn't influence the complexity, it just influences the constant factors. In the situation we're dealing with, a lazily generated infinite immutable list, there's not much one can do to reduce the time spent transferring control from one thunk to the next. If we were dealing with a finite mutable list or array that is not lazily generated, as one would in a language like C or Java, it would be much better to let each mark p finish its complete job (that would be a simple for loop with less overhead than function calling/transfer of control) before examining the next number, so there'd never be more than one marker active and less control-passing.

Community
  • 1
  • 1
Daniel Fischer
  • 174,737
  • 16
  • 293
  • 422
  • appalled? be really appalled: `[n | n – Will Ness Jul 31 '12 at 00:14
  • Will, I'm not appalled by the bad algorithm as such. The appalling thing is that the code is from a book - a supposedly good one - and apparently comes without a warning. – Daniel Fischer Jul 31 '12 at 00:23
  • appalled and *fascinated* - I should've quoted you more fully. :) That algo I quoted also *fascinates* me, as it is also a true transcription of the definition, just like Turner sieve is. After all, the primes are indeed all integers above 1 which are non-composite. *Haskell isn't "Maths" after all.* – Will Ness Jul 31 '12 at 01:23
  • 1
    Yes, that algorithm is amazing. It has the slight flaw that the lists should start at 2 to truly reflect the definition of a prime, but apart from that glitch it's like Turner's sieve a neat and concise declaration hiding a terrible complexity - only more so. – Daniel Fischer Jul 31 '12 at 01:46
  • even with `1`s it produces primes for some reason. :) but it is really something *special*, isn't it. A thing to behold. – Will Ness Jul 31 '12 at 03:25
  • Since the upper bound is `n-1`, no product `1*k` can be `n`, so it works. But for `n > 2`, there are primes in the list of `j*k`, although conceptually it should contain only composites. I agree about the specialty, it's the first cubic prime generator I have seen, awesome. – Daniel Fischer Jul 31 '12 at 03:43
  • I don't know if letting each `mark` work to the end at its turn would change much. It'd still be `O(n)` total ops as it is now because `mark` counts by ones, instead of `O(n/p)`. Whether working "by rows" or "by columns", it's the same table that gets scanned through, cell by cell. – Will Ness Aug 04 '12 at 16:10
  • @Will If you have strict mutable data, letting each `mark` do its job in one go eliminates the creation of `thunk { when demanded, do that }` and the control passing, each would just be a loop with little overhead. The complexity would still be the same, but `for(i = p+1, m = 1; i < N; ++i) { if (p == m) { .. }}` is much less work than calling functions and creating thunks. – Daniel Fischer Aug 04 '12 at 16:35
  • it depends how well thunk handling is optimized by the compiler. if it blindly creates new full-blown closures each time just to be discarded later, then what you saying makes a lot of sense. – Will Ness Aug 04 '12 at 16:44
  • I don't see how it could do anything less than write a structure containing the two numbers (prime and remainder) and the index next to process (the pointer to the next element in case of a linked list) plus an indication of which thunk to evaluate before if any. These structures could be reused, updating the appropriate members, but it's still more work than just looping. Not dramatically, I guess, but significant. – Daniel Fischer Aug 04 '12 at 17:04
  • true. though such reified closures would indeed turn into kind of generators, so would be "pulled" rather than "evaluated". the chain of such structures could be then reified as an array, and processed in a loop. that loop could then get unrolled, structures compiled out into a bunch of global variables, etc.; optimizers would kick in, collapsing chains of assignments into mutating just one var, ... who knows where it would end. :) – Will Ness Aug 04 '12 at 21:53
5

Ok, you are definitely right, it is slower than the naive implementation. I took this one from Wikipedia and compared it to your code with GHCI thus:

-- from Wikipedia
sieveW [] = [] 
sieveW (x:xs) = x : sieveW remaining 
  where 
    remaining = [y | y <- xs, y `mod` x /= 0]

-- your code
sieve :: [Integer] -> [Integer]
sieve (0 : xs) = sieve xs
sieve (n : xs) = n : sieve (mark xs 1 n)
  where
    mark :: [Integer] -> Integer -> Integer -> [Integer]
    mark (y:ys) k m | k == m = 0 : (mark ys 1 m)
                    | otherwise = y : (mark ys (k+1) m)

Running gives

[1 of 1] Compiling Main             ( prime.hs, interpreted )
Ok, modules loaded: Main.
*Main> :set +s
*Main> sum $ take 2000 (sieveW [2..])
16274627
(1.54 secs, 351594604 bytes)
*Main> sum $ take 2000 (sieve [2..])
16274627
(12.33 secs, 2903337856 bytes)

To try and understand what was happening and how exactly the mark code works, I tried expanding the code manually:

  sieve [2..]
= sieve 2 : [3..]
= 2 : sieve (mark [3..] 1 2)
= 2 : sieve (3 : (mark [4..] 2 2))
= 2 : 3 : sieve (mark (mark [4..] 2 2) 1 3)
= 2 : 3 : sieve (mark (0 : (mark [5..] 1 2)) 1 3)
= 2 : 3 : sieve (0 : (mark (mark [5..] 1 2) 1 3))
= 2 : 3 : sieve (mark (mark [5..] 1 2) 1 3)
= 2 : 3 : sieve (mark (5 : (mark [6..] 2 2)) 1 3)
= 2 : 3 : sieve (5 : mark (mark [6..] 2 2) 2 3)
= 2 : 3 : 5 : sieve (mark (mark (mark [6..] 2 2) 2 3) 1 5)
= 2 : 3 : 5 : sieve (mark (mark (0 : (mark [7..] 1 2)) 2 3) 1 5)
= 2 : 3 : 5 : sieve (mark (0 : (mark (mark [7..] 1 2) 3 3)) 1 5)
= 2 : 3 : 5 : sieve (0 : (mark (mark (mark [7..] 1 2) 3 3)) 2 5))
= 2 : 3 : 5 : sieve (mark (mark (mark [7..] 1 2) 3 3)) 2 5)
= 2 : 3 : 5 : sieve (mark (mark (7 : (mark [8..] 2 2)) 3 3)) 2 5)

I think I may have made a slight mistake toward the end there, as the 7 looks like it is about to be turned into a 0 and deleted, but the mechanism is clear. This code is merely creating a set of counters counting up to each prime, emitting the next prime at the correct moment and passing it up the list. This is equivalent to merely checking for division by each previous prime as in the naive implementation, with the additional overhead of passing the 0s or primes between the thunks.

There may be some further subtlety here that I am missing. There is a very detailed treatment of The Sieve of Eratosthenes in Haskell along with various optimisations here.

Vic Smith
  • 3,387
  • 1
  • 16
  • 28
  • I don't see that any of the `mod` calculations in `sieveW` are being memoised, partly because each ``y `mod` x`` is only asked for once (elements of `xs` are unique) and partly because even if `xs` contained duplicates, the previously calculated ``y `mod` x`` would not be accessible. – dave4420 Jul 30 '12 at 10:42
  • Yes, you are right — should have thought that fully through. Will edit answer. – Vic Smith Jul 30 '12 at 12:39
1

short answer: the counting sieve is slower than Turner's (a.k.a. "naive") sieve because it emulates direct RAM access with sequential counting, which forces it to pass along the streams unsieved between the marking stages. This is ironic because counting makes it a "genuine" sieve of Eratosthenes, as opposed to the Turner's trial division sieve. Actually removing the multiples, like the Turner's sieve is doing, would mess up the counting.

Both algorithms are extremely slow because they start up multiples elimination work too early from each found prime instead of its square, thus creating too many unneeded stream processing stages (whether filtering or marking) - O(n) of them, instead of just ~ 2*sqrt n/log n, in producing primes up to n in value. No checking for the multiples of 7 is required until 49 is seen in the input.

This answer explains how sieve can be seen as building a pipeline of stream-processing "transducers" behind itself, as it is working:

[2..] ==> sieve --> 2
[3..] ==> mark 1 2 ==> sieve --> 3
[4..] ==> mark 2 2 ==> mark 1 3 ==> sieve 
[5..] ==> mark 1 2 ==> mark 2 3 ==> sieve --> 5
[6..] ==> mark 2 2 ==> mark 3 3 ==> mark 1 5 ==> sieve 
[7..] ==> mark 1 2 ==> mark 1 3 ==> mark 2 5 ==> sieve --> 7
[8..] ==> mark 2 2 ==> mark 2 3 ==> mark 3 5 ==> mark 1 7 ==> sieve
[9..] ==> mark 1 2 ==> mark 3 3 ==> mark 4 5 ==> mark 2 7 ==> sieve
[10..]==> mark 2 2 ==> mark 1 3 ==> mark 5 5 ==> mark 3 7 ==> sieve
[11..]==> mark 1 2 ==> mark 2 3 ==> mark 1 5 ==> mark 4 7 ==> sieve --> 11

The Turner sieve uses nomult p = filter ((/=0).(`rem`p)) in place of mark _ p entries but otherwise looks the same:

[2..] ==> sieveT --> 2
[3..] ==> nomult 2 ==> sieveT --> 3
[4..] ==> nomult 2 ==> nomult 3 ==> sieveT 
[5..] ==> nomult 2 ==> nomult 3 ==> sieveT --> 5
[6..] ==> nomult 2 ==> nomult 3 ==> nomult 5 ==> sieveT 
[7..] ==> nomult 2 ==> nomult 3 ==> nomult 5 ==> sieveT --> 7
[8..] ==> nomult 2 ==> nomult 3 ==> nomult 5 ==> nomult 7 ==> sieveT 

Each such transducer can be implemented as a closure frame (a.k.a. "thunk"), or a generator with mutable state, that is unimportant. The output of each such producer goes directly as input into its successor in a chain. There are no unevaluated thunks here, each is forced by its consumer, to produce its next output.

So, to answer your question,

I suspect it has something to do with iterating each element in the list in the mark function.

yes, exactly. They both run non-postponed schemes otherwise.


The code can be thus improved by postponing the start of stream-marking:

primes = 2:3:filter (>0) (sieve [5,7..] (tail primes) 9)

sieve (x:xs) ps@ ~(p:t) q 
   | x < q = x:sieve xs ps q
   | x==q  = sieve (mark xs 1 p) t (head t^2)
  where
    mark (y:ys) k p 
       | k == p    = 0 : (mark ys 1 p)      -- mark each p-th number in supply
       | otherwise = y : (mark ys (k+1) p)

It now runs at just above O(k^1.5), empirically, in k primes produced. But why count by ones when we can count by an increment. (Each 3rd odd number from 9 can be found by adding 6, again and again.) And then instead of marking, we can weed out the numbers right away, getting ourselves a bona fide sieve of Eratosthenes (even if not the most efficient one):

primes = 2:3:sieve [5,7..] (tail primes) 9

sieve (x:xs) ps@ ~(p:t) q 
   | x < q = x:sieve xs ps q
   | x==q  = sieve (weedOut xs (q+2*p) (2*p)) t (head t^2)
  where
    weedOut i@(y:ys) m s 
       | y < m = y:weedOut ys m s
       | y==m  = weedOut ys (m+s) s
       | y > m = weedOut i (m+s) s

This runs at above O(k^1.2) in k primes produced, quick-n-dirty testing compiled-loaded into GHCi, producing upto 100k - 150k primes, deteriorating to O(k^1.3) at about 0.5 mil primes.


So what kind of speedups are achieved by this? Comparing the OP code with the "Wikipedia"'s Turner sieve,

primes = sieve [2..] :: [Int]
  where
    sieve (x:xs) = x : sieve [y | y <- xs, rem y x /= 0]

there was an 8x speedup of W/OP at 2k (i.e. producing 2000 primes). But at 4k it was a 15x speedup. The Turner sieve seems to run at about O(k^1.9 .. 2.3) empirical complexity in producing k = 1000 .. 6000 primes, and the counting sieve at O(k^2.3 .. 2.6) for the same range.

For the two versions here in this answer, v1/W was an additional 20x faster at 4k, and 43x at 8k. v2/v1 was 5.2x at 20k, 5.8x at 40k and 6.5x faster at producing 80,000 primes.

(for comparison, the priority-queue code by Melissa O'Neill runs at about O(k^1.2) empirical complexity, in k primes produced. It scales much better than the code here, of course).


Here is the sieve of Eratosthenes's definition:

P = {3,5, ...} \ &bigcup; { {p*p, p*p + 2*p, ...} | p in P }

The key to Sieve of Eratosthenes's efficiency is direct generation of multiples of primes, by counting in increments of (twice) prime's value from each prime; and their direct elimination, made possible by conflation of value and address much like in integer sorting algorithms (possible only with mutable arrays). It is immaterial whether it must produce pre-set number of primes or work indefinitely because it can always work by segments.

Will Ness
  • 62,652
  • 8
  • 86
  • 167
  • "The Turner sieve uses `nomult p = filter ((/=0).(\`rem\`p))` ...". Having troubles with the markup in the answer body. – Will Ness Aug 03 '12 at 08:30
  • Fixed that for you, double-backtick code-with-backticks double-backtick also works in question bodies. – Daniel Fischer Aug 03 '12 at 22:15
  • @DanielFischer thanks a lot, I tried double backquotes around "rem" on the inside, not around the whole thing... If we're here already, what about that mythical *unevaluated* thunk buildup (re "increasingly large thunk of ..." in dflemstr answer - see also in comments). I say it ***does not exist*** here. The chain of dependent producers, each *forced* by its consumer, is not a nested thunk like `foldl` would make. Shouldn't we be able to use language in unambiguous manner? You too refer to this in your answer. I think it is erroneous. – Will Ness Aug 04 '12 at 05:36
  • Well, "stacked thunks" may be the better formulation. But you do have after the k-th prime thunks (switching arg order) `mark p_k x (mark ... (mark 3 x (mark 2 x list))...)`, nested or stacked `k` layers deep. So every pattern match in `sieve` tells the outermost thunk "Hey, gimme a constructor", that passes the message down, then the `(_:_)` is passed up again, a couple of thunks replacing the head with 0 on the way (usually). You have a growing nest/stack of thunks (not quadratic in either size or depth, though - proportional to the number of primes found) that slows things down. – Daniel Fischer Aug 04 '12 at 15:04
  • *chained*, not nested. Each existing in its own right in memory. It is forced through the *storage*, not from inside the function above it. It doesn't know about the function/thunk/producer/generator/transducer above it, all it knows is how to produce its *next* output. Same way, all it has is its input "(y:ys)" - storage - it doesn't know where it comes from, from another "mark" down the chain or whatever. There is *no* `mark (mark (mark ... )))` at run-time. Each list-producing function `mark` is best seen as generator, IMO. – Will Ness Aug 04 '12 at 15:41
  • As so often, I don't understand what you're trying to say. A `mark list` thunk, when evaluated, produces a result of the form `(:) x' (mark xs)` if `list` is of the form `(x:xs)`, where `x'` is `x` or 0. It is only evaluated when the enclosing thunk needs to know whether its argument is `y:ys` or not. That happens only when that thunk is evaluated, its evaluation then produces `(:) x'' (mark (mark xs))` [or, if you prefer `(:) x'' thunk where thunk = mark thunk' where thunk' = mark xs` ]. Each of these thunks is only reachable from the thunk above it, looks nested to me. – Daniel Fischer Aug 04 '12 at 16:26
  • Unless there's a technical definition of nested thunks that means something else, if so, I'd appreciate a link. And yes, there is a `mark (mark (mark ...))` at runtime. Even if you think of them as generators, each takes the output of the one below as input, so has a reference to that. – Daniel Fischer Aug 04 '12 at 16:30
  • "nested" implies nested scope. `f x = g where g = z x where z = p q where p = .... ` defines nested thunks. each referring to the enclosing environment. so I thought. – Will Ness Aug 04 '12 at 16:37
  • With that definition, of course these are indeed not nested. Okay, will change wording to avoid misunderstandings. – Daniel Fischer Aug 04 '12 at 16:40