
Say I have a data type with a lot of constructors.

data ManyValues
  = Value0
  | Value1
  | Value2
  ...
  | Value255
  | Value256
  deriving (Show,Eq)

What's the memory footprint of any one value of this data type? My original understanding was that each constructor is an 8-bit word in memory, but what if the data type has more constructors than 8 bits can represent? Does the constructor get bumped up to 16 bits, and so on, until it can address the full range of constructors in the data type? Or do I have this all mixed up?

carpemb
  • This may help you: https://stackoverflow.com/questions/3254758/memory-footprint-of-haskell-data-types – Sibi Aug 17 '16 at 06:25
  • Thanks, I saw that before posting. It makes an interesting point about object sharing when it comes to zero-field constructors, but it doesn't address what happens when there are more constructors (even zero-field constructors) than can be addressed with 8 bits. This is assuming that's what the 8-bit header is being used for. – carpemb Aug 17 '16 at 06:38
  • Ah, but in that answer, a header "word" is definitely at least 32 bits. Of course, the question still stands in principle (e.g., one approach might be to use the first 32 bits merely to narrow the choice down), but if your datatype has 2^32 constructors, you may face other engineering difficulties. – pigworker Aug 17 '16 at 07:24
  • The distinction between the various kinds of `ManyValues` is erased during compilation; you only need the information for as long as it takes to type-check the code. At runtime, each value is just a single word that basically indicates the value exists. – chepner Aug 17 '16 at 12:29
  • It looks as if my question came from being confused about what a "word" is. I was thinking about 8-bit words and wondering, "How's it going to address all those memory locations?" But yes, on 32-bit and 64-bit machines, that "word" would be 32 and 64 bits respectively, which was also mentioned in the SO question linked above. So my concern would only be legitimate once I was also hitting pretty fundamental limitations of virtual memory, as pigworker said. I should have gotten a good night's sleep before posting this question. :) – carpemb Aug 17 '16 at 13:42

1 Answer


As I understand it, a nullary constructor takes 1 machine word of storage (i.e., it's a pointer to statically allocated data). So whether your data structure has 1 such constructor or 1,000,000, it's still 1 machine word.

Constructors with fields take more space, but GHC special-cases nullary constructors to share a single static singleton between all instances of that value. (E.g., there is only ever one True in the entire program.)

Of course, when a thunk evaluates to an already-existing value (any value), GHC overwrites the thunk with a "redirection" node, which takes up some space. Periodically the garbage collector removes the redirections.
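
To put the "1 machine word" point in concrete terms, here is a minimal sketch. It assumes a hypothetical cut-down type `Small` standing in for `ManyValues`, and it does not measure heap usage; it only reports how wide a word is on the current platform and shows that each constructor has a position in the type, whatever the total count.

```haskell
import Data.Bits (finiteBitSize)

-- Hypothetical stand-in for ManyValues, cut down to four nullary constructors.
data Small = A | B | C | D
  deriving (Show, Eq, Enum, Bounded)

-- Width of a machine word on this platform, in bits.
-- GHC's Word has the same size as Int, i.e. one native machine word.
wordBits :: Int
wordBits = finiteBitSize (0 :: Word)

-- Number of constructors in the type, counted via Bounded and Enum.
constructorCount :: Int
constructorCount = length [minBound .. maxBound :: Small]

main :: IO ()
main = do
  putStrLn ("machine word: " ++ show wordBits ++ " bits")
  putStrLn ("constructors: " ++ show constructorCount)
  -- fromEnum recovers the zero-based position of each constructor.
  print (map fromEnum [A, B, C, D])
```

Since a word is 32 or 64 bits wide, a type with 257 constructors is nowhere near the point where a single word would be too small to refer to them all.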

MathematicalOrchid
  • That makes a lot of sense, so I'm marking this question answered. It turns out the core confusion I had was about what "1 machine word" actually means, but I've educated myself on that matter now. This does open up a further question about memory fragmentation, however. – carpemb Aug 17 '16 at 13:52
  • If nullary constructors are just pointers to shared objects, wouldn't that mean I could be referencing an object that is very far away in memory from the actual call site (up until GC moves it closer)? And in such a case, would it be more efficient to represent low-level data as smart-constructed `Word32` or `Word64` values in an unboxed linear data structure, giving up space efficiency for fewer allocations and better memory locality? – carpemb Aug 17 '16 at 13:57
  • Why would "being very far away in memory" be a problem? That's not how memory locality works. All the nullary constructors in the program are in a single contiguous block of memory that never moves. That should be quite cache-friendly. – MathematicalOrchid Aug 17 '16 at 13:59
  • I was not aware they were kept contiguous like that either, great to know. – carpemb Aug 17 '16 at 14:01
  • Worth noting is that the first few constructors (3 on 32-bit or 7 on 64-bit platforms) get special treatment. For those, the constructor number is stored in the pointer to the node, saving a memory load. – augustss Aug 18 '16 at 02:35
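
The alternative raised in the comments above, smart-constructed word values in an unboxed linear structure, could be sketched roughly as follows. This is only an illustration under assumptions: `Code`, `mkCode`, `codeCount`, and `packCodes` are hypothetical names, `Word16` is chosen because 257 constructors do not fit in 8 bits, and `UArray` comes from the `array` package that ships with GHC. Whether this is actually faster than the ordinary data type would need measuring.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Word (Word16)

-- Hypothetical compact encoding: the constructor index stored in a Word16.
newtype Code = Code Word16
  deriving (Show, Eq)

-- Number of valid codes; 257 matches Value0 .. Value256 in the question.
codeCount :: Int
codeCount = 257

-- Smart constructor: rejects indices outside the valid range.
mkCode :: Int -> Maybe Code
mkCode n
  | n >= 0 && n < codeCount = Just (Code (fromIntegral n))
  | otherwise               = Nothing

-- Pack codes into an unboxed array: raw 16-bit elements stored contiguously,
-- with no per-element pointers.
packCodes :: [Code] -> UArray Int Word16
packCodes cs = listArray (0, length cs - 1) [w | Code w <- cs]

main :: IO ()
main = do
  print (mkCode 256)
  print (mkCode 257)
  -- Build a small array and read back its last element.
  case traverse mkCode [0, 128, 256] of
    Just cs -> print (packCodes cs ! 2)
    Nothing -> putStrLn "invalid code"
```

The trade-off is that the type checker no longer knows about the individual constructors, so pattern matching and exhaustiveness checks are lost unless the codes are converted back to the original type at the boundaries.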