
How can I find the actual amount of memory required to store a value of some data type in Haskell (mostly with GHC)? Is it possible to evaluate it at runtime (e.g. in GHCi) or is it possible to estimate memory requirements of a compound data type from its components?

In general, if memory requirements of types a and b are known, what is the memory overhead of algebraic data types such as:

data Uno a = Uno a
data Due a b = Due a b

For example, how many bytes in memory do these values occupy?

1 :: Int8
1 :: Integer
2^100 :: Integer
\x -> x + 1
(1 :: Int8, 2 :: Int8)
[1] :: [Int8]
Just (1 :: Int8)
Nothing

I understand that actual memory allocation is higher due to delayed garbage collection. It may be significantly different due to lazy evaluation (and thunk size is not related to the size of the value). The question is, given a data type, how much memory does its value take when fully evaluated?

I found there is a :set +s option in GHCi to see memory stats, but it is not clear how to estimate the memory footprint of a single value.

Boann
sastanin

2 Answers


(The following applies to GHC; other compilers may use different storage conventions.)

Rule of thumb: a constructor costs one word for a header, and one word for each field. Exception: a constructor with no fields (like Nothing or True) takes no space, because GHC creates a single instance of these constructors and shares it amongst all uses.

A word is 4 bytes on a 32-bit machine, and 8 bytes on a 64-bit machine.
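To check the word size on a particular machine, here is a minimal sketch using only base. It relies on GHC's Int being exactly one machine word wide, which holds for GHC but is an assumption for other compilers. `wordBytes` and `wordsToBytes` are hypothetical helper names for turning the word counts in this answer into bytes.

```haskell
import Data.Bits (finiteBitSize)
import Foreign.Storable (sizeOf)

-- Bytes per machine word, assuming (as in GHC) that Int is one word wide.
wordBytes :: Int
wordBytes = sizeOf (0 :: Int)

-- Convert a size measured in words into bytes on this machine.
wordsToBytes :: Int -> Int
wordsToBytes n = n * wordBytes

main :: IO ()
main = do
  putStrLn ("Word size: " ++ show (finiteBitSize (0 :: Int)) ++ " bits, "
            ++ show wordBytes ++ " bytes")
  putStrLn ("A 3-word object takes " ++ show (wordsToBytes 3) ++ " bytes")
```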

So e.g.

data Uno a = Uno a
data Due a b = Due a b

an Uno takes 2 words, and a Due takes 3.
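The rule of thumb can be written down as a pair of small functions. This is only a sketch of the arithmetic: `constructorWords` and `constructorBytes` are hypothetical helpers, and they count just the constructor object itself, not the objects its boxed fields point to, which have to be added separately.

```haskell
-- Rule of thumb (GHC, no profiling): one header word plus one word per field.
-- A constructor with no fields is a single shared closure, so it costs nothing per use.
constructorWords :: Int -> Int
constructorWords 0 = 0
constructorWords fields = 1 + fields

-- The same size in bytes, given the word size (4 on 32-bit, 8 on 64-bit).
constructorBytes :: Int -> Int -> Int
constructorBytes wordSize fields = wordSize * constructorWords fields

main :: IO ()
main = do
  print (constructorWords 1, constructorBytes 8 1)  -- Uno x on 64-bit: (2,16)
  print (constructorWords 2, constructorBytes 8 2)  -- Due x y on 64-bit: (3,24)
```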

The Int type is defined as

data Int = I# Int#

Now, Int# takes one word, so Int takes 2 in total. Most unboxed types take one word, the exceptions being Int64#, Word64#, and Double#, which take 2 on a 32-bit machine. GHC actually has a cache of small values of type Int and Char, so in many cases these take no heap space at all. A String only requires space for the list cells, unless it contains Chars with code points above 255.
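Applied to String, this gives a simple estimate. In the sketch below (`stringWords` is a hypothetical helper), each (:) cell has two fields and so costs 3 words, the terminating [] is shared, and a Char only needs its own 2-word C# box when it falls outside the cached range. Treat the result as a rough figure: it assumes a fully evaluated string, no profiling, and no sharing between strings.

```haskell
-- Estimated heap words for a fully evaluated String:
--   3 words per (:) cell (header + head pointer + tail pointer),
--   0 for the shared [] at the end,
--   plus a 2-word C# box for every Char outside the cached range '\0'..'\255'.
stringWords :: String -> Int
stringWords s = 3 * length s + 2 * length (filter (> '\255') s)

main :: IO ()
main = do
  print (stringWords "hello")     -- 15: five cons cells, all Chars cached
  print (stringWords "h\955llo")  -- 17: the lambda character needs its own box
```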

An Int8 has identical representation to Int. Integer is defined like this:

data Integer
  = S# Int#                            -- small integers
  | J# Int# ByteArray#                 -- large integers

so a small Integer (S#) takes 2 words, but a large integer takes 3 words for the J# constructor plus a ByteArray# whose size grows with the number of digits, i.e. logarithmically in the value. A ByteArray# takes 2 words (header + size) plus space for the array itself.
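The Integer case can be estimated the same way. This sketch follows the integer-gmp definition shown above; newer GHC releases changed the Integer representation, so exact counts there may differ. `integerWords` and `bitLength` are hypothetical helpers. The sketch makes the growth explicit: the number of word-sized limbs, and hence the size, grows with the number of bits in the value, not with the value itself.

```haskell
import Data.Bits (finiteBitSize, shiftR)

-- Bits in a machine word (GHC's Int is one word wide).
wordBits :: Int
wordBits = finiteBitSize (0 :: Int)

-- Number of bits needed for the magnitude of n (0 for n == 0).
bitLength :: Integer -> Int
bitLength = go 0 . abs
  where
    go acc 0 = acc
    go acc m = go (acc + 1) (m `shiftR` 1)

-- Estimated words for a fully evaluated Integer in the integer-gmp layout:
--   S# i    : header + Int#                          = 2 words
--   J# s ba : header + Int# + pointer to ByteArray#  = 3 words,
--             plus the ByteArray#: header + size (2) + one word per limb
integerWords :: Integer -> Int
integerWords n
  | n >= toInteger (minBound :: Int) && n <= toInteger (maxBound :: Int) = 2
  | otherwise = 3 + 2 + limbs
  where
    limbs = (bitLength n + wordBits - 1) `div` wordBits

main :: IO ()
main = do
  print (integerWords 1)          -- 2
  print (integerWords (2 ^ 100))  -- 7 on a 64-bit machine (two limbs)
```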

Note that a constructor defined with newtype is free. newtype is purely a compile-time idea, and it takes up no space and costs no instructions at run time.
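A small illustration of newtype being free at run time, using coerce from Data.Coerce, which converts between types that share a representation. `Age` and `BoxedAge` are made-up example types: an `Age` is represented exactly like the Int it wraps, while each `BoxedAge` adds a one-field constructor (2 words) on top of its Int.

```haskell
import Data.Coerce (coerce)

-- At run time an Age is just the Int inside it: no extra header or field.
newtype Age = Age Int deriving Show

-- Since [Int] and [Age] have the same representation, coerce reuses the
-- existing list as-is; the conversion itself compiles to a no-op.
ages :: [Age]
ages = coerce [30, 41, 57 :: Int]

-- A data wrapper, by contrast, allocates a 2-word BoxedAge per element.
data BoxedAge = BoxedAge Int deriving Show

main :: IO ()
main = do
  print ages
  print (map BoxedAge [30, 41, 57])
```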

See "The Layout of Heap Objects" in the GHC Commentary for more details.
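To estimate a compound value from its components, the rule of thumb can be applied recursively. The sketch below uses a hypothetical `Shape` type to describe the heap layout of a fully evaluated value by hand. It assumes no profiling, one word per unboxed field, and no sharing between fields (a shared object would be counted twice). It covers the simpler values from the question; 2^100 needs the Integer calculation, and the lambda is a function closure rather than a constructor, so it falls outside this rule. Constants in compiled code may also be static closures that never occupy the dynamic heap.

```haskell
-- Hypothetical description of a fully evaluated value's heap layout.
data Shape
  = Con [Shape]  -- a boxed constructor; each field occupies one word in it
  | Unboxed      -- an unboxed one-word field such as Int#

-- Header word + one word per field + whatever the boxed fields point to.
heapWords :: Shape -> Int
heapWords Unboxed = 0       -- already counted as a field of its parent
heapWords (Con []) = 0      -- nullary constructors (Nothing, []) are shared
heapWords (Con fields) = 1 + length fields + sum (map heapWords fields)

-- A boxed Int8: the I8# constructor with one unboxed field.
int8 :: Shape
int8 = Con [Unboxed]

estimates :: [(String, Int)]
estimates =
  [ ("1 :: Int8",              heapWords int8)
  , ("1 :: Integer",           heapWords (Con [Unboxed]))       -- S# 1#
  , ("(1 :: Int8, 2 :: Int8)", heapWords (Con [int8, int8]))    -- (,) + 2 boxes
  , ("[1] :: [Int8]",          heapWords (Con [int8, Con []]))  -- (:) box []
  , ("Just (1 :: Int8)",       heapWords (Con [int8]))
  , ("Nothing",                heapWords (Con []))
  ]

main :: IO ()
main = mapM_ (\(name, w) -> putStrLn (name ++ ": " ++ show w ++ " words")) estimates
```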

Simon Marlow
  • Thank you, Simon. This is exactly what I wanted to know. – sastanin Jul 15 '10 at 15:19
  • Isn't the header two words? One for the tag, and one for the forwarding pointer for use during GC or evaluation? So wouldn't that add one word to your total? – Edward KMETT Jul 15 '10 at 17:11
  • Proportional to its value or proportional to the logarithm thereof? – solidsnack Jul 15 '10 at 17:25
  • @Edward: Thunks are overwritten by indirections (which are later removed by the GC), but those are only 2 words, and every heap object is guaranteed to be at least 2 words in size. Without any profiling or debugging features turned on, the header really is only one word. In GHC, that is; other implementations may do things differently. – nominolo Jul 15 '10 at 17:32
  • nominolo: yes, but from Closure.h: /* A thunk has a padding word to take the updated value. This is so that the update doesn't overwrite the payload, so we can avoid needing to lock the thunk during entry and update. Note: this doesn't apply to THUNK_STATICs, which have no payload. Note: we leave this padding word in all ways, rather than just SMP, so that we don't have to recompile all our libraries for SMP. */ The payload doesn't get overwritten during an indirection. The indirection is written in a separate location in the header. – Edward KMETT Jul 15 '10 at 22:39
  • typedef struct { const StgInfoTable* info; #ifdef PROFILING StgProfHeader prof; #endif StgSMPThunkHeader smp; } StgThunkHeader; typedef struct { StgWord pad; } StgSMPThunkHeader; // Note both info and smp are always present in the header – Edward KMETT Jul 15 '10 at 22:41
  • Yes, but note this is for *thunks* only. It does not apply to constructors. Estimating the size of a thunk is a bit difficult anyway: you have to count the free variables. – nominolo Jul 15 '10 at 22:55
  • Ah, found it: StgClosurePtr uses StgHeader, not StgThunkHeader. – Edward KMETT Jul 15 '10 at 23:08
  • What role does the character `#` play in expressions such as `data Int = I# Int#`? – Nordlöw Oct 19 '11 at 21:36
  • @Nordlöw: It's the [magic hash](http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#magic-hash), and it's mostly just used for naming primitives and unboxed types so that they are easy to distinguish from other names. – hammar Mar 01 '12 at 02:42
  • Do you really get the header overhead for one-constructor data types? Can't that be optimised away? – Lii Feb 16 '14 at 11:28
  • @Lii, no, it cannot be optimized away. For example, the GC does not know in advance the type of a heap object it reaches via a pointer from another heap object, so there needs to be some indication in each object of its size and which words in it are pointers to other objects. Also, regular Haskell evaluation (the "mutator") needs to be able to represent both evaluated values and thunks, which it couldn't do if the representation of an evaluated value could be arbitrary (having no header word). – Reid Barton Apr 28 '14 at 04:16
  • How are strings of small `Char`s optimized? I never heard about that before. – dfeuer Sep 21 '15 at 16:18

The ghc-datasize package provides the recursiveSize function to calculate the size of a GHC object. However...

A garbage collection is performed before the size is calculated, because the garbage collector would make heap walks difficult.

...so it wouldn't be practical to call this often!

Also see "How to find out GHC's memory representations of data types?" and "How can I determine size of a type in Haskell?".

mhwombat