Haskell's algebraic data types

Question

I'm trying to fully understand all of Haskell's concepts.

In what ways are algebraic data types similar to generic types, e.g., in C# and Java? And how are they different? What's so algebraic about them anyway?

I'm familiar with universal algebra and its rings and fields, but I only have a vague idea of how Haskell's types work.

See also http://stackoverflow.com/questions/5911267/what-are-sums-and-products-data-structures/5914867 — Don Stewart, May 06 '11 at 20:34

score 101 · Answer 1 · edited Jun 20 '20 at 09:12

Haskell's algebraic data types are so named because they correspond to an initial algebra in category theory, giving us some laws, some operations and some symbols to manipulate. We may even use algebraic notation for describing regular data structures, where:

+ represents sum types (disjoint unions, e.g. Either).
• represents product types (e.g. structs or tuples)
X for the singleton type (e.g. data X a = X a)
1 for the unit type ()
and μ for the least fixed point (e.g. recursive types), usually implicit.

with some additional notation:

X² for X•X

In fact, you might say (following Brent Yorgey) that a Haskell data type is regular if it can be expressed in terms of 1, X, +, •, and a least fixed point.

With this notation, we can concisely describe many regular data structures:

Units: data () = ()

1

Options: data Maybe a = Nothing | Just a

1 + X

Lists: data [a] = [] | a : [a]

L = 1 + X•L

Binary trees: data BTree a = Empty | Node a (BTree a) (BTree a)

B = 1 + X•B²
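As a rough illustration of the notation above, the sketch below spells out 1 + X and L = 1 + X•L using only Either (sum), pairs (product) and () (unit). The names Option, toSum, fromSum, List, toHaskellList and fromHaskellList are invented for this example and are not library functions.

```haskell
-- 1 + X: a sum of the unit type and a single element.
type Option a = Either () a

-- The two directions of the correspondence between Maybe a and 1 + X.
toSum :: Maybe a -> Option a
toSum Nothing  = Left ()
toSum (Just x) = Right x

fromSum :: Option a -> Maybe a
fromSum (Left ()) = Nothing
fromSum (Right x) = Just x

-- L = 1 + X*L: either nothing, or an element paired with another list.
-- The recursion in the newtype plays the role of the implicit fixed point.
newtype List a = List (Either () (a, List a))

-- Convert to and from the built-in list type.
toHaskellList :: List a -> [a]
toHaskellList (List (Left ()))       = []
toHaskellList (List (Right (x, xs))) = x : toHaskellList xs

fromHaskellList :: [a] -> List a
fromHaskellList []       = List (Left ())
fromHaskellList (x : xs) = List (Right (x, fromHaskellList xs))
```

Round-tripping through either pair of functions should give back the original value, which is what "can be expressed in terms of 1, X, + and •" amounts to here.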

Other operations hold (taken from Brent Yorgey's paper, listed in the references):

Expansion: unfolding the fixed point can be helpful for thinking about lists. L = 1 + X + X² + X³ + ... (that is, lists are either empty, or they have one element, or two elements, or three, or ...)
Composition (◦): given types F and G, the composition F ◦ G is a type which builds “F-structures made out of G-structures” (e.g. R = X • (L ◦ R), where L is lists, is a rose tree).
Differentiation: the derivative of a data type D (written D′) is the type of D-structures with a single “hole”, that is, a distinguished location not containing any data. Amazingly, it satisfies the same rules as differentiation in calculus:

1′ = 0

X′ = 1

(F + G)′ = F′ + G′

(F • G)′ = F • G′ + F′ • G

(F ◦ G)′ = (F′ ◦ G) • G′
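To make the "hole" reading concrete, here is a small sketch for lists. Applying the rules above to L = 1 + X•L gives L′ = L + X•L′, which is solved by L′ = L•L: a list with one hole is a pair of lists, the elements before the hole and the elements after it. The names ListHole, plug and holes are made up for this illustration.

```haskell
-- A one-hole context for a list: the elements before the hole
-- (nearest to the hole first) and the elements after it.
data ListHole a = ListHole [a] [a]
  deriving (Eq, Show)

-- Fill the hole with an element, recovering an ordinary list.
plug :: a -> ListHole a -> [a]
plug x (ListHole before after) = reverse before ++ x : after

-- Every way of removing one element from a list, paired with the
-- context (the list with a hole) that it leaves behind.
holes :: [a] -> [(a, ListHole a)]
holes = go []
  where
    go _      []       = []
    go before (x : xs) = (x, ListHole before xs) : go (x : before) xs
```

Plugging each removed element back into its own context should rebuild the original list, and a list of n elements has n such contexts.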

References:

Species and Functors and Types, Oh My!, Brent A. Yorgey, Haskell’10, September 30, 2010, Baltimore, Maryland, USA
Clowns to the left of me, jokers to the right (Dissecting Data Structures), Conor McBride POPL 2008

I found chapter 3 of the book "Real World Haskell" (which you co-authored) explains algebraic data types very nicely, especially if you are really new to Haskell and you don't have a comp-sci background. — rzetterberg, Nov 04 '11 at 13:35
https://www.haskell.org/haskellwiki/Abstract_data_type lists a binary parameterized tree as an example for an abstract data type and https://www.haskell.org/haskellwiki/Algebraic_data_type states that abstract DT and algebraic DT are mutually exclusive categories. Or is the binary tree here not really assumed to be algebraic (despite the question) as you actually just label it "regular"!? — Raffael, Jan 09 '15 at 12:26

score 22 · Accepted Answer · edited Oct 16 '11 at 21:41

"Algebraic Data Types" in Haskell support full parametric polymorphism, which is the more technically correct name for generics. As a simple example, take the list data type:

 data List a = Cons a (List a) | Nil

This is equivalent (as far as possible, and ignoring non-strict evaluation, etc.) to

 class List<a> {
     class Cons : List<a> {
         a head;
         List<a> tail;
     }
     class Nil : List<a> {}
 }

Of course Haskell's type system allows more ... interesting use of type parameters, but this is just a simple example. With regard to the "Algebraic Type" name, I've honestly never been entirely sure of the exact reason for it, but have assumed that it's due to the mathematical underpinnings of the type system. I believe the reason boils down to the theoretical definition of an ADT being the "product of a set of constructors"; however, it's been a couple of years since I escaped university, so I can no longer remember the specifics.

[Edit: Thanks to Chris Conway for pointing out my foolish error. ADTs are of course sum types, with the constructors providing the product/tuple of fields.]
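A brief sketch of what parametric polymorphism buys for the List type above: functions can be written once for every element type. The function names len and mapList are chosen for this example only.

```haskell
data List a = Cons a (List a) | Nil

-- Works for every element type a, because it never inspects the elements.
len :: List a -> Int
len Nil         = 0
len (Cons _ xs) = 1 + len xs

-- Apply a function to each element, changing the element type from a to b.
mapList :: (a -> b) -> List a -> List b
mapList _ Nil         = Nil
mapList f (Cons x xs) = Cons (f x) (mapList f xs)
```

The type variable a plays roughly the role of the generic parameter in the class version.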

Generics has been used in so many different ways that the only real common ground is "the kind of polymorphism my language doesn't (or didn't) have, but that we're planning on adding (or have added)". — wnoise, Sep 26 '08 at 21:07
This answer doesn't explain in what sense Haskell's data types are *algebraic*. — Don Stewart, May 06 '11 at 21:24
Actually, I think the analogy isn't quite correct - `data List a` is a type constructor, but Cons and Nil are data constructors - they denote the values that are of type List a (the distinction is important because they live in separate namespaces, so you can and often do have type and data constructors of the same name). — Martin, Aug 29 '12 at 10:31

score 19 · Answer 3 · answered Mar 13 '09 at 21:23

In universal algebra an algebra consists of some sets of elements (think of each set as the set of values of a type) and some operations, which map elements to elements.

For example, suppose you have a type of "list elements" and a type of "lists". As operations you have the "empty list", which is a 0-argument function returning a "list", and a "cons" function which takes two arguments, a "list element" and a "list", and produces a "list".

At this point there are many algebras that fit the description, as two undesirable things may happen:

There could be elements in the "list" set which cannot be built from the "empty list" and the "cons operation", so-called "junk". This could be lists starting from some element that fell from the sky, or loops without a beginning, or infinite lists.
The results of "cons" applied to different arguments could be equal, e.g. consing an element to a non-empty list could be equal to the empty list. This is sometimes called "confusion".

An algebra which has neither of these undesirable properties is called initial, and this is the intended meaning of the abstract data type.

The name initial derives from the property that there is exactly one homomorphism from the initial algebra to any given algebra. Essentially you can evaluate the value of a list by applying the operations in the other algebra, and the result is well-defined.
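The "exactly one homomorphism" property can be sketched in Haskell as a fold. Below, ListAlgebra is a made-up record describing any other algebra with an "empty list" operation and a "cons" operation, and eval is the map out of the initial algebra that replaces each constructor with the corresponding operation. All names here are assumptions for the sake of the example.

```haskell
data List a = Nil | Cons a (List a)

-- Another algebra with the same signature: a carrier type r, a constant
-- for the "empty list" operation, and a binary function for "cons".
data ListAlgebra a r = ListAlgebra
  { nilOp  :: r
  , consOp :: a -> r -> r
  }

-- The homomorphism out of the initial algebra: replace each constructor
-- by the corresponding operation of the target algebra.
eval :: ListAlgebra a r -> List a -> r
eval alg Nil         = nilOp alg
eval alg (Cons x xs) = consOp alg x (eval alg xs)

-- Two example target algebras on the carrier Int.
sumAlgebra :: ListAlgebra Int Int
sumAlgebra = ListAlgebra { nilOp = 0, consOp = (+) }

lengthAlgebra :: ListAlgebra a Int
lengthAlgebra = ListAlgebra { nilOp = 0, consOp = \_ n -> n + 1 }
```

Because a list value is built only from Nil and Cons (no junk) and distinct constructions give distinct values (no confusion), eval has no choice about what to return, so the result is well-defined.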

It gets more complicated for polymorphic types ...

score 12 · Answer 4 · answered Mar 16 '09 at 01:28

A simple reason why they are called algebraic: there are both sum (logical disjunction) and product (logical conjunction) types. A sum type is a discriminated union, e.g.:

data Bool = False | True

A product type is a type whose constructor has multiple fields:

data Pair a b = Pair a b

In OCaml the "product" is made more explicit:

type ('a, 'b) pair = Pair of 'a * 'b
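One way to see why the words "sum" and "product" fit is to count values. In the sketch below, the type Colour and the helper allValues are invented for illustration; it enumerates the values of a sum and of a product built from Bool (2 values) and Colour (3 values).

```haskell
-- A small sum type with three alternatives.
data Colour = Red | Green | Blue
  deriving (Show, Eq, Enum, Bounded)

-- All values of a small enumerable, bounded type.
allValues :: (Enum a, Bounded a) => [a]
allValues = [minBound .. maxBound]

-- Sum: a value is either a Bool or a Colour, so the counts add (2 + 3).
sumValues :: [Either Bool Colour]
sumValues = map Left allValues ++ map Right allValues

-- Product: a value holds both a Bool and a Colour, so the counts multiply (2 * 3).
productValues :: [(Bool, Colour)]
productValues = [(b, c) | b <- allValues, c <- allValues]
```

The same counting applies to Pair a b from the answer, which holds exactly the same information as the tuple (a, b).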

score 9 · Answer 5 · answered Aug 19 '08 at 20:53

Haskell's datatypes are called "algebraic" because of their connection to categorical initial algebras. But that way lies madness.

@olliej: ADTs are actually "sum" types. Tuples are products.

answered by Chris Conway

ADTs are not (merely) sum types. – Richard Simões Jul 25 '14 at 17:38
Abstract data types are product types. They are structurally isomorphic to tuples, except their members are labeled. – Mark Cidade May 30 '16 at 14:34

score 3 · Answer 6 · answered Aug 30 '08 at 07:21

@Timbo:

You are basically right about it being sort of like an abstract Tree class with three derived classes (Empty, Leaf, and Node), but you would also need to enforce the guarantee that someone using your Tree class can never add any new derived classes, since the strategy for using the Tree data type is to write code that switches at runtime based on the type of each element in the tree (and adding new derived types would break existing code). You can sort of imagine this getting nasty in C# or C++, but in Haskell, ML, and OCaml, this is central to the language design and syntax, so the coding style supports it in a much more convenient manner, via pattern matching.

ADT (sum types) are also sort of like tagged unions or variant types in C or C++.
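A minimal sketch of "switching on the kind of node" via pattern matching, using a Tree type with the three alternatives mentioned above. The functions sumLeaves and depth are illustrative names, not taken from any library.

```haskell
data Tree = Empty
          | Leaf Int
          | Node Tree Tree

-- One equation per constructor. The set of constructors is closed,
-- so these three cases cover every possible Tree.
sumLeaves :: Tree -> Int
sumLeaves Empty      = 0
sumLeaves (Leaf n)   = n
sumLeaves (Node l r) = sumLeaves l + sumLeaves r

-- A second operation, added without touching the type definition.
depth :: Tree -> Int
depth Empty      = 0
depth (Leaf _)   = 1
depth (Node l r) = 1 + max (depth l) (depth r)
```

Adding a new operation means writing a new function; adding a new constructor means revisiting every function that matches on Tree, which is the trade-off the answer describes.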

answered by Jared Updike

I'm confused by "since the strategy for using the Tree data type is to ..." as being the reason why a sum type can't be modelled as an inheritance (sub)tree. Your operations are spread across the various classes (type-based switching is a subset of pattern matching), so new nodes will have whatever default behaviour you defined in your TreeNode class (or that your language defines) - perhaps an error indicating that TreeNode's abstract, and you need to implement the appropriate method. – Frank Shearar Jul 27 '11 at 20:42
You certainly can add the operations required on each type of node as a (virtual) function member to it. That's what OOP is about, essentially... – vonbrand Jan 08 '16 at 22:26

score 2 · Answer 7 · answered Dec 19 '08 at 22:49

Old question, but no one's mentioned nullability, which is an important aspect of algebraic data types, perhaps the most important aspect. Since each value must be one of the alternatives, exhaustive case-based pattern matching is possible.
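As a small sketch of that point, assuming "nullability" here refers to modelling a possibly missing value with Maybe: the absent case is an ordinary alternative, so a function over Maybe has to say what happens for it. The function name describe is hypothetical.

```haskell
-- A possibly missing Int is either Nothing or Just n; there is no third case
-- and no hidden null value.
describe :: Maybe Int -> String
describe Nothing  = "no value"
describe (Just n) = "value " ++ show n

-- If the Nothing equation were deleted, compiling with
-- `ghc -Wincomplete-patterns` would report the match as non-exhaustive.
```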

answered by ja.

score 0 · Answer 8 · answered Aug 19 '08 at 19:58

For me, the concept of Haskell's algebraic data types always looked like polymorphism in OO-languages like C#.

Look at the example from http://en.wikipedia.org/wiki/Algebraic_data_types:

data Tree = Empty 
          | Leaf Int 
          | Node Tree Tree

This could be implemented in C# as a TreeNode base class, with a derived Leaf class and a derived TreeNodeWithChildren class, and if you want even a derived EmptyNode class.

(OK I know, nobody would ever do that, but at least you could do it.)
