What is the correct definition of `unfold` for an untagged tree?

Question

I've been thinking in how to implement the equivalent of unfold for the following type:

data Tree a = Node (Tree a) (Tree a) | Leaf a | Nil

It was not immediately obvious since the standard unfold for lists returns a value and the next seed. For this datatype, it doesn't make sense, since there is no "value" until you reach a leaf node. This way, it only really makes sense to return new seeds or stop with a value. I'm using this definition:

data Drive s a = Stop | Unit a | Branch s s deriving Show

unfold :: (t -> Drive t a) -> t -> Tree a
unfold fn x = case fn x of
    Branch a b -> Node (unfold fn a) (unfold fn b)
    Unit a     -> Leaf a
    Stop       -> Nil

main = print $ unfold go 5 where
    go 0 = Stop
    go 1 = Unit 1
    go n = Branch (n - 1) (n - 2)

While this seems to work, I'm not sure this is how it is supposed to be. So, that is the question: what is the correct way to do it?

Um ... this looks like about the only sane thing you can do. — dfeuer, Feb 21 '15 at 23:11
Really? I was pretty sure I was doing something stupid here, but if it is correct I might just delete the question. — MaiaVictor, Feb 21 '15 at 23:12
The question is non trivial, and it might be useful to others. — chi, Feb 22 '15 at 09:54

score 8 · Accepted Answer · answered Feb 22 '15 at 00:16

If you think of a datatype as the fixpoint of a functor then you can see that your definition is the sensible generalisation of the list case.

module Unfold where

Here we start by definition the fixpoint of a functor f: it's a layer of f followed by some more fixpoint:

newtype Fix f = InFix { outFix :: f (Fix f) }

To make things slightly clearer, here are the definitions of the functors corresponding to lists and trees. They have basically the same shape as the datatypes except that we have replace the recursive calls by an extra parameter. In other words, they describe what one layer of list / tree looks like and are generic over the possible substructures r.

data ListF a r = LNil | LCons a r
data TreeF a r = TNil | TLeaf a | TBranch r r

Lists and trees are then respectively the fixpoints of ListF and TreeF:

type List a = Fix (ListF a)
type Tree a = Fix (TreeF a)

Anyways, hopping you now have a better intuition about this fixpoint business, we can see that there is a generic way of defining an unfold function for these.

Given an original seed as well as a function taking a seed and building one layer of f where the recursive structure are new seeds, we can build a whole structure:

unfoldFix :: Functor f => (s -> f s) -> s -> Fix f
unfoldFix node = go
  where go = InFix . fmap go . node

This definition specialises to the usual unfold on list or your definition for trees. In other words: your definition was indeed the right one.

Thanks for explaining the reason behind the algorithm, it is very enlightening to see how my "Drive" and "Tree" datatypes are linked and how concise a generic unfold can be. I wonder why this is not how it is defined on Prelude? — MaiaVictor, Feb 22 '15 at 01:07
Defining all of your recursive datatypes with Fix makes pattern matching quite ugly, and also has a (perhaps non-negligible) performance cost. — user2407038, Feb 22 '15 at 01:10
@user2407038 To that aim, it would be very nice if we had a `Coercible [a] (Fix (ListF a))` instance, allowing us to move between plain recursive types and their fixed point sibling. This would have zero runtime cost, and allow pattern match on both types. (I am assuming that they have identical runtime representation since `Fix` is a newtype -- I might be wrong on this, though). — chi, Feb 22 '15 at 09:51
@user2407038 pattern synonyms to the rescue! https://ghc.haskell.org/trac/ghc/wiki/PatternSynonyms — gallais, Feb 22 '15 at 12:10

What is the correct definition of `unfold` for an untagged tree?

1 Answers1

Linked