
It's written that Haskell tuples are simply a different syntax for algebraic data types. Similarly, there are examples of how to redefine value constructors with tuples.

For example, a Tree data type in Haskell might be written as

data Tree a = EmptyTree | Node a (Tree a) (Tree a)

which could be converted to "tuple form" like this:

data Tree a = EmptyTree | Node (a, Tree a, Tree a)

What is the difference between the Node value constructor in the first example, and the actual tuple in the second example? i.e. Node a (Tree a) (Tree a) vs. (a, Tree a, Tree a) (aside from just the syntax)?

Under the hood, is Node a (Tree a) (Tree a) just a different syntax for a 3-tuple of the appropriate types at each position?

I know that you can partially apply a value constructor, such as Node 5 which will have type: (Node 5) :: Num a => Tree a -> Tree a -> Tree a

You sort of can partially apply a tuple too, using (,,) as a function ... but this doesn't know about the potential types for the un-bound entries, such as:

Prelude> :t (,,) 5
(,,) 5 :: Num a => b -> c -> (a, b, c)

unless, I guess, you explicitly declare a type with ::.
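A minimal sketch of that last point, assuming the first `Tree` definition above. The names `partialNode` and `partialTuple` are made up for illustration; the explicit signature is what pins down the remaining positions of the tuple.

```haskell
data Tree a = EmptyTree | Node a (Tree a) (Tree a)
  deriving (Show, Eq)

-- Partially applied value constructor: the types of the remaining
-- arguments are already fixed by the declaration of Node.
partialNode :: Tree Int -> Tree Int -> Tree Int
partialNode = Node 5

-- Partially applied tuple constructor: without this signature the
-- remaining positions would stay fully polymorphic (b and c).
partialTuple :: Tree Int -> Tree Int -> (Int, Tree Int, Tree Int)
partialTuple = (,,) 5

main :: IO ()
main = do
  print (partialNode EmptyTree EmptyTree)
  print (partialTuple EmptyTree EmptyTree)
```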

Aside from syntactical specialties like this, plus this last example of the type scoping, is there a material difference between whatever a "value constructor" thing actually is in Haskell, versus a tuple used to store positional values of the same types as the value constructor's arguments?

ely

2 Answers


Well, conceptually there is indeed no difference, and in fact other languages (OCaml, Elm) present tagged unions exactly that way, i.e., as tags over tuples or first-class records (which Haskell lacks). I personally consider this to be a design flaw in Haskell.

There are some practical differences though:

  1. Laziness. Haskell's tuples are lazy and you can't change that. You can however mark constructor fields as strict:

    data Tree a = EmptyTree | Node !a !(Tree a) !(Tree a)
    
  2. Memory footprint and performance. Circumventing intermediate types reduces the footprint and raises the performance. You can read more about it in this fine answer.

    You can also mark the strict fields with the UNPACK pragma to reduce the footprint even further. Alternatively, you can use the -funbox-strict-fields compiler option. As for the latter, I simply prefer to have it on by default in all my projects. See Hasql's Cabal file for an example.
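The laziness difference from point 1 can be observed directly. This is only a sketch: `LazyPair`, `StrictPair` and `throwsOnWhnf` are hypothetical names, and the check evaluates each value just to weak head normal form.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Lazy fields, like the components of a tuple.
data LazyPair = LazyPair Int Int

-- Strict fields: both are forced when the constructor is evaluated.
data StrictPair = StrictPair !Int !Int

isError :: Either SomeException a -> Bool
isError (Left _)  = True
isError (Right _) = False

-- True if evaluating the argument to weak head normal form throws.
throwsOnWhnf :: a -> IO Bool
throwsOnWhnf x = isError <$> try (evaluate x)

main :: IO ()
main = do
  -- The tuple and the lazy constructor never touch the undefined field.
  throwsOnWhnf (undefined, 1 :: Int) >>= print
  throwsOnWhnf (LazyPair undefined 1) >>= print
  -- The strict constructor forces it as soon as the value is built.
  throwsOnWhnf (StrictPair undefined 1) >>= print
```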


Given the above, if it's a lazy type that you're looking for, then the following snippets should compile to the same thing:

data Tree a = EmptyTree | Node a (Tree a) (Tree a)

data Tree a = EmptyTree | Node {-# UNPACK #-} !(a, Tree a, Tree a)

So I guess you can say that it's possible to use tuples to store lazy fields of a constructor without a penalty. It should be mentioned, though, that this pattern is kinda unconventional in the Haskell community.
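Here is a sketch of the two forms side by side, renamed `TreeA` and `TreeB` (hypothetical names) so that both fit in one module. It only shows how construction and pattern matching differ; it does not verify the memory layout, and the UNPACK pragma only takes effect when compiling with optimization.

```haskell
-- Conventional form: three lazy fields directly on the constructor.
data TreeA a = EmptyA | NodeA a (TreeA a) (TreeA a)

-- Tuple form: one strict, unpacked tuple field. The tuple's own
-- components remain lazy.
data TreeB a = EmptyB | NodeB {-# UNPACK #-} !(a, TreeB a, TreeB a)

sizeA :: TreeA a -> Int
sizeA EmptyA        = 0
sizeA (NodeA _ l r) = 1 + sizeA l + sizeA r

-- The only visible difference is the extra tuple pattern.
sizeB :: TreeB a -> Int
sizeB EmptyB            = 0
sizeB (NodeB (_, l, r)) = 1 + sizeB l + sizeB r

main :: IO ()
main = do
  print (sizeA (NodeA 'x' EmptyA (NodeA 'y' EmptyA EmptyA)))
  -- The element is never forced, so an undefined payload is harmless.
  print (sizeB (NodeB (undefined, EmptyB, EmptyB)))
```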

If it's the strict type and footprint reduction that you're after, then there's no other way than to denormalize your tuples directly into constructor fields.

Nikita Volkov
  • Based on point 2, it seems that the introduction of a data type (as opposed to functions that work off of a tuple convention) is mostly for the semantics it introduces for consumers of the module: a way to organize thinking and reading of the code. If the performance hit for this isn't large (which I presume is almost always, given the ubiquity of creating data types in Haskell) then this extra semantic gain wins out. But if something is very performance critical, or if it's a private part of a module that few consumers should need, then it's good to err on the side of tuple conventions? – ely Dec 15 '14 at 02:46
  • When I say "tuple convention" I mean functions designed to work on a tuple that is not otherwise named or used in a data type. I say it this way because (and I might be confused about this) it looks like you can't have a recursive data type made purely from `tuple` without using either the `data` or `newtype` keywords, which would then hit those memory considerations, right? – ely Dec 15 '14 at 02:48
  • `newtype` is a compile-time only concept and gets erased during compilation. Unlike `data` it introduces no memory overhead over the type it wraps. The updates to my answer should explain the rest. – Nikita Volkov Dec 15 '14 at 03:02
  • @prpl.mnky.dshwshr There's no reason to prefer tuples to custom data types for performance reasons. `data CustomTuple a b c = CustomTuple a b c` will have identical representation (in GHC at least) to `(a, b, c)`. Try compiling and running `main = print (unsafeCoerce (CustomTuple "hi" 32 True) :: (String, Integer, Bool))` to verify this claim (though compiling is important -- runhaskell and ghci will not work here). – Daniel Wagner Dec 16 '14 at 01:57
  • @DanielWagner What about recursion within tuples? It seems like you can only achieve that through the value constructor approach. You can't give a shorthand type synonym like `Foo` to a tuple on the right hand side which has `Foo` in it. (Or maybe you can with some kind of mutual recursion?) – ely Dec 16 '14 at 12:39
  • @prpl.mnky.dshwshr Correct, there are reasons to prefer custom data types over tuples. I claimed *only* that there are no reasons to prefer tuples over custom data types for performance. – Daniel Wagner Dec 16 '14 at 17:22

They're what's called isomorphic, meaning "to have the same shape". You can write something like

data Option a = None | Some a

And this is isomorphic to

data Maybe a = Nothing | Just a

meaning that you can write two functions

f :: Maybe a -> Option a
g :: Option a -> Maybe a

Such that f . g == id == g . f for all possible inputs. We can then say that (,,) is a data constructor isomorphic to the constructor

data Triple a b c = Triple a b c

Because you can write

f :: (a, b, c) -> Triple a b c
f (a, b, c) = Triple a b c

g :: Triple a b c -> (a, b, c)
g (Triple a b c) = (a, b, c)
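The `Option`/`Maybe` conversions declared earlier can be filled in the same way. In this sketch, `toOption` and `fromOption` are illustrative names for the roles of `f` and `g`, and `Maybe` is the one from the Prelude.

```haskell
data Option a = None | Some a
  deriving (Show, Eq)

toOption :: Maybe a -> Option a
toOption Nothing  = None
toOption (Just x) = Some x

fromOption :: Option a -> Maybe a
fromOption None     = Nothing
fromOption (Some x) = Just x

main :: IO ()
main = do
  -- Both round trips give back the value they started from.
  print (map (fromOption . toOption) [Nothing, Just (1 :: Int)])
  print (map (toOption . fromOption) [None, Some 'x'])
```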

And Node as a constructor is a special case of Triple, namely Triple a (Tree a) (Tree a). In fact, you could even go so far as to say that your definition of Tree could be written as

newtype Tree' a = Tree' (Maybe (a, Tree' a, Tree' a))

The newtype is required since you can't have a type alias be recursive. All you have to do is say that EmptyTree == Tree' Nothing and Node a l r == Tree' (Just (a, l, r)). You could pretty simply write functions that convert between the two.
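One possible sketch of those conversion functions, assuming the original `Tree` definition from the question. The names `toTree'`, `fromTree'` and `sample` are made up here.

```haskell
data Tree a = EmptyTree | Node a (Tree a) (Tree a)
  deriving (Show, Eq)

newtype Tree' a = Tree' (Maybe (a, Tree' a, Tree' a))
  deriving (Show, Eq)

-- EmptyTree corresponds to Nothing, Node to Just of a 3-tuple.
toTree' :: Tree a -> Tree' a
toTree' EmptyTree    = Tree' Nothing
toTree' (Node x l r) = Tree' (Just (x, toTree' l, toTree' r))

fromTree' :: Tree' a -> Tree a
fromTree' (Tree' Nothing)          = EmptyTree
fromTree' (Tree' (Just (x, l, r))) = Node x (fromTree' l) (fromTree' r)

sample :: Tree Int
sample = Node 1 (Node 2 EmptyTree EmptyTree) EmptyTree

main :: IO ()
main = do
  print (toTree' sample)
  -- Converting there and back returns the original tree.
  print (fromTree' (toTree' sample) == sample)
```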

Note that this is all from a mathematical point of view. The compiler can add extra metadata and other information to be able to identify a particular constructor making them behave slightly differently at runtime.

bheklilr
  • Yeah, I'm not talking about mathematical isomorphism .. I'm interested in the actual in-memory representation, and whether there is a material difference. You could say that C structs implemented with `Py_Object` are ostensibly isomorphic to Python classes, but there is clearly a difference between writing your own C types versus using the Python `class` or `type` facility. – ely Dec 15 '14 at 02:40
  • @prpl.mnky.dshwshr Then Nikita's answer is more what you're looking for. – bheklilr Dec 15 '14 at 02:41
  • Yes, but this one is also good to keep around, in case less math-inclined folks wonder the same thing and happen upon this question. – ely Dec 15 '14 at 02:42