Let's ask the profiler.

We will compile the following example program, which should do approximately what your GHCi session did. It's important that we print the results, as GHCi did, since this forces the computations.
```haskell
f x = (-x)

xs = map f [0..]

main = do
    print (take 50 xs)
    print (xs !! 99)
```
I saved mine as `example.hs`. We'll compile it with options to enable profiling:

```
ghc -prof -fprof-auto -rtsopts example.hs
```
Time profile
We can find out how many times `f` was applied with a time profile:

```
example +RTS -p
```

This produces an output file named `example.prof`; the following is the interesting portion:

```
COST CENTRE MODULE  no.  entries
...
f           Main    78   51
```
We can see that `f` was evaluated 51 times: 50 times for `print (take 50 xs)` and once for `print (xs !! 99)`. Therefore we can rule out your third possibility: since `f` was evaluated only 51 times, there can't be results for all of the indices 0-99.
- Keep results for indexes 0 - 99, thunk for indexes 100+
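If you'd rather not take the profiler's entry count on faith, a rough cross-check is to count applications of `f` directly. This is a sketch, not part of the original program: the `applications` counter and the `unsafePerformIO` instrumentation are hypothetical, and the count is only reliable when compiled without optimization.

```haskell
import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical global counter; NOINLINE keeps it a single shared IORef.
applications :: IORef Int
applications = unsafePerformIO (newIORef 0)
{-# NOINLINE applications #-}

-- Instrumented f: bump the counter each time the result is demanded.
f :: Integer -> Integer
f x = unsafePerformIO (modifyIORef' applications (+ 1)) `seq` (-x)

main :: IO ()
main = do
    let xs = map f [0..]
    print (take 50 xs)               -- forces the first 50 applications
    print (xs !! 99)                 -- the list is shared, so only one more
    readIORef applications >>= print -- 51, when compiled without optimization
```

Indexing with `!!` walks the cons cells that `map` built, but leaves the elements at indices 50-98 as unevaluated thunks, which is why only one more application happens.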
Heap profile of results
Profiling the memory on the heap is a bit trickier. The heap profiler takes samples, by default once every 0.1 seconds. Our program runs so fast that the heap profiler won't take any samples while it's running. We'll add a spin to our program so that the heap profiler gets a chance to take a sample. The following will spin for a number of seconds:
```haskell
import Data.Time.Clock

spin :: Real a => a -> IO ()
spin seconds = do
    startTime <- getCurrentTime
    let endTime = addUTCTime (fromRational (toRational seconds)) startTime
    let go = do
            now <- getCurrentTime
            if now < endTime then go else return ()
    go
```
We don't want the garbage collector to collect the data while the program is running, so we'll add another usage of `xs` after the spin.
```haskell
main = do
    print (take 50 xs)
    print (xs !! 99)
    spin 1
    print (xs !! 0)
```
We'll run this with the default heap profiling option, which groups memory usage by cost centre:

```
example +RTS -h
```

This produces the file `example.hp`. We'll pull a sample out of the middle of the file, where the numbers are stable (while the program was in a `spin`):
```
BEGIN_SAMPLE 0.57
  (42)PINNED                     32720
  (48)Data.Time.Clock.POSIX.CAF     48
  (85)spin.endTime/spin/mai...      56
  (84)spin.go/spin/main/Mai...      64
  (81)xs/Main.CAF                 4848
  (82)f/xs/Main.CAF                816
  (80)main/Main.CAF                160
  (64)GHC.IO.Encoding.CAF           72
  (68)GHC.IO.Encoding.CodeP...     136
  (57)GHC.IO.Handle.FD.CAF         712
  (47)Main.CAF                      96
END_SAMPLE 0.57
```
We can see that `f` has produced 816 bytes of memory. A "small" `Integer` consumes 2 words of memory. On my system a word is 8 bytes, so a "small" `Integer` takes 16 bytes. Therefore 816/16 = 51 of the `Integer`s produced by `f` are probably still in memory.
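That arithmetic can be checked mechanically. A small sketch, assuming a 64-bit system where `sizeOf` on a `Word` reports the machine word size:

```haskell
import Foreign.Storable (sizeOf)

main :: IO ()
main = do
    let wordBytes = sizeOf (0 :: Word)    -- 8 on a 64-bit system
        -- a "small" Integer: 1 constructor word + 1 payload word
        smallIntegerBytes = 2 * wordBytes
    print smallIntegerBytes               -- 16 on a 64-bit system
    print (816 `div` smallIntegerBytes)   -- 51 retained Integers
```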
We can check that all of this memory is actually being used for "small" `Integer`s by asking for a profile by closure description with `-hd`. We can't group memory usage by both closure description and cost centre, but we can restrict profiling to a single cost centre with `-hc`; in this case, we are interested in the `f` cost centre:

```
example +RTS -hd -hcf
```
This says that all 816 bytes due to `f` are being used by `S#`, the constructor for "small" `Integer`s:

```
BEGIN_SAMPLE 0.57
  S#   816
END_SAMPLE 0.57
```
We can certainly remove the following, since 51 `Integer` results are being kept, while it expects only 50 `Integer`s to be kept:
- Keep results for indexes 0 - 49, thunk for indexes 50+
Heap profile of structure and thunks
This leaves us with the remaining option:
- Keep results for indexes 0 - 49, thunks for indexes 50 - 98, result for index 99, thunk for indexes 100+
Let's estimate how much memory this situation would consume.
In general, Haskell `data` types require 1 word of memory for the constructor and 1 word for each field. The list type's `(:)` constructor has two fields, so a cons cell should take 3 words of memory, or 24 bytes. 100 `(:)` cells would then take 2400 bytes of memory. We will see that this is exactly correct when we ask for the closure description for `xs`.
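Written out explicitly, the cons-cell arithmetic is just this sketch (assuming 8-byte words on a 64-bit system):

```haskell
main :: IO ()
main = do
    let wordBytes = 8              -- assuming a 64-bit system
        -- (:) has two fields: 1 constructor word + 2 field words
        consBytes = 3 * wordBytes
    print consBytes                -- 24 bytes per cons cell
    print (100 * consBytes)        -- 2400 bytes for 100 cons cells
```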
It's difficult to reason about the size of thunks, but we'll give it a try. There would be 49 thunks for the values at indices 50-98. Each of these thunks is probably holding the `Integer` from the generator `[0..]`, and it also needs the structure of a thunk itself, which unfortunately changes when profiling. There will also be a thunk for the remainder of the list; it will need the `Integer` from which the remainder of the list is generated, plus the structure of a thunk.
Asking for a memory breakdown by closure description for the `xs` cost centre:

```
example +RTS -hd -hcxs
```

gives us the following sample:
```
BEGIN_SAMPLE 0.60
  <base:GHC.Enum.sat_s34b>     32
  <base:GHC.Base.sat_s1be>     32
  <main:Main.sat_s1w0>         16
  S#                          800
  <base:GHC.Base.sat_s1bd>   1568
  :                          2400
END_SAMPLE 0.60
```
We were exactly correct about there being 100 `(:)` cells requiring 2400 bytes of memory. There are 49+1 = 50 "small" `Integer`s (`S#`) occupying 800 bytes: the `Integer`s held by the thunks for the 49 uncalculated values and by the thunk for the remainder of the list. The 1568 bytes are probably the 49 thunks for the uncalculated values, which would then be 32 bytes, or 4 words, each. That leaves another 80 bytes we can't exactly account for, left over for the thunk for the remainder of the list.
Both the memory and time profiles are consistent with our belief that the program will:
- Keep results for indexes 0 - 49, thunks for indexes 50 - 98, result for index 99, thunk for indexes 100+