Existential type classes vs. Data Constructors vs. Coproducts

Question

I find myself running up against the same pattern in my designs where I start with a type with a few data constructors, eventually want to be able to type against those data constructors and thus split them into their own types, just to then have to increase the verbosity of other parts of the program by needing to use Either or another tagged-union for situations where I still need to represent multiple of these types (namely collections).

I am hoping someone can point me to a better way of accomplishing what I'm trying to do. Let me start with a simple example. I am modeling a testing system, where you can have nested test suites which eventually end in tests. So, something like this:

data Node =
    Test { source :: String } |
    Suite { title :: String, children :: [Node] }

So, pretty simple so far, essentially a fancy Tree/Leaf declaration. However, I quickly realize that I want to be able to make functions that take Tests specifically. As such, I'll now split it up as so:

data Test = Test { source :: String }
data Suite = Suite { title :: String, children :: [Either Test Suite] }

Alternatively I might roll a "custom" Either (especially if the example is more complicated and has more than 2 options), say something like:

data Node =
   FromTest Test |
   FromSuite Suite

So already it's pretty unfortunate that, just to have a Suite that can contain a mix of Suites and Tests, I end up with a weird wrapper type (whether an actual Either or a custom one). If I used existential type classes, I could make both Test and Suite instances of a "Node_" class and then have Suite own a list of said Nodes. Coproducts would allow something similar, where I'd essentially use the same Either strategy without the verbosity of the tags.
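For reference, a minimal sketch of the existential type class option mentioned above. The `Node_` class name comes from the paragraph; the `describe` method, the `AnyNode` wrapper and `describeAny` are hypothetical names chosen for illustration. Note that in GHC this route still needs a wrapper type (and the ExistentialQuantification extension), so it does not remove the tag entirely:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (intercalate)

-- Hypothetical class; `describe` is an illustrative method, not from the question.
class Node_ a where
  describe :: a -> String

-- Existential wrapper: any type with a Node_ instance can be stored inside.
data AnyNode = forall a. Node_ a => AnyNode a

data Test  = Test  { source :: String }
data Suite = Suite { title :: String, children :: [AnyNode] }

instance Node_ Test where
  describe = source

instance Node_ Suite where
  describe s =
    title s ++ " [" ++ intercalate ", " (map describeAny (children s)) ++ "]"

-- Unwrap the existential and dispatch through the class.
describeAny :: AnyNode -> String
describeAny (AnyNode n) = describe n
```

The trade-off is that code consuming an `AnyNode` can only use the class methods; it cannot find out whether it holds a Test or a Suite.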

Allow me to expand now with a more complex example. The results of the tests can be either Skipped (the test was disabled), Success, Failure, or Omitted (the test or suite could not be run due to a previous failure). Again, I originally started with something like this:

data Result = Success | Omitted | Failure | Skipped
data ResultTree =
    Tree { children::[ResultTree], result::Result } |
    Leaf Result

But I quickly realized I wanted to be able to write functions that took specific results, and more importantly, have the type itself enforce the ownership properties: A successful suite must only own Success or Skipped children, Failure's children can be anything, Omitted can only own Omitted, etc. So now I end up with something like this:

data Success = Success { children :: [Either Success Skipped] }
data Failure = Failure { children :: [AnyResult] }
data Omitted = Omitted { children :: [Omitted] }
data Skipped = Skipped { children :: [Skipped] }
data AnyResult =
  FromSuccess Success |
  FromFailure Failure |
  FromOmitted Omitted |
  FromSkipped Skipped

Again, I now have these weird "wrapper" types like AnyResult, but I get compile-time enforcement of something that used to be enforced only at runtime. Is there a better strategy for this that doesn't involve turning on features like existential type classes?

@Jon Purdy did give a [great answer below](https://stackoverflow.com/a/52995532/1525842). For the curious, [this singleton's tutorial](https://blog.jle.im/entry/introduction-to-singletons-1.html) goes in great depth about this design pattern. — zeronone, Oct 30 '18 at 03:41

Stephane Rolland · Answer 1 · 2018-10-24T19:40:49.887

The first thing that came to my mind reading your sentence: "I quickly realized I wanted to be able to write functions that took specific results" is Refinement Types.

They let a function accept only a subset of a type's values as input, and turn violations of those constraints into compile-time errors.

There is this video from a talk at HaskellX 2018, that introduces LiquidHaskell, which allows the use of Refinement Types in Haskell:

https://skillsmatter.com/skillscasts/11068-keynote-looking-forward-to-niki-vazou-s-keynote-at-haskellx-2018

You have to annotate your Haskell function signatures, and have LiquidHaskell installed:

{-@ f :: Int -> {i:Int | i < 3} -> Int @-} would declare a function whose second parameter must be an Int with a value < 3, checked at compile time.

You could put similar constraints on your Result type.

Sure -- I guess I can do this (although I was more looking to see if I just didn't understand something and I should be modeling things differently -- ideally without language extensions). The more I use these data constructors the more it seems that they're somewhat of a code smell. That is to say, if it merits its own constructor with its own completely different internals, why does it not merit its own top level type? Solely to get around the lack of coproducts and the verbosity of chained Eithers? — Francisco Ryan Tolmasky I, Oct 25 '18 at 16:55

score 2 · Answer 2 · answered Oct 25 '18 at 18:05

I think what you may be looking for is GADTs with DataKinds. This lets you refine the types of each constructor in a data type to a particular set of possible values. For example:

data TestType = Test | Suite

data Node (t :: TestType) where
  TestNode :: { source :: String } -> Node 'Test
  SuiteNode :: { title :: String, children :: [SomeNode] } -> Node 'Suite

data SomeNode where
  SomeNode :: Node t -> SomeNode

Then when a function operates only on tests, it can take a Node 'Test; on suites, a Node 'Suite; and on either, a polymorphic Node a. When pattern-matching on a Node a, each case branch gets access to an equality constraint:

useNode :: Node a -> Foo
useNode node = case node of
  TestNode source ->          {- here it’s known that (a ~ 'Test) -}
  SuiteNode title children -> {- here, (a ~ 'Suite) -}

Indeed if you took a concrete Node 'Test, the SuiteNode branch would be disallowed by the compiler, since it can’t ever match.
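A minimal compilable sketch of the idea above, using positional fields instead of the record syntax for brevity. The helper names `testSource`, `describe` and `countTests` are hypothetical and only serve to illustrate the three kinds of signature:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data TestType = Test | Suite

data Node (t :: TestType) where
  TestNode  :: String -> Node 'Test
  SuiteNode :: String -> [SomeNode] -> Node 'Suite

data SomeNode where
  SomeNode :: Node t -> SomeNode

-- Accepts tests only; passing a suite is rejected by the type checker.
testSource :: Node 'Test -> String
testSource (TestNode src) = src

-- Works on any node; each branch learns what `a` is.
describe :: Node a -> String
describe node = case node of
  TestNode src         -> "test: " ++ src
  SuiteNode title kids ->
    "suite: " ++ title ++ " (" ++ show (length kids) ++ " children)"

-- Recurses through the existential wrapper to count leaf tests.
countTests :: Node a -> Int
countTests (TestNode _)       = 1
countTests (SuiteNode _ kids) = sum (map (\(SomeNode n) -> countTests n) kids)
```

Here `testSource` needs no SuiteNode case, because no value of type Node 'Test can be built with that constructor.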

SomeNode is an existential that wraps a Node of an unknown type; you can add extra class constraints to this if you want.

You can do a similar thing with Result:

data ResultType = Success | Omitted | Failure | Skipped

data Result (t :: ResultType) where
  SuccessResult
    :: [Either (Result 'Success) (Result 'Skipped)]
    -> Result 'Success
  FailureResult
    :: [SomeResult]
    -> Result 'Failure
  OmittedResult
    :: [Result 'Omitted]
    -> Result 'Omitted
  SkippedResult
    :: [Result 'Skipped]
    -> Result 'Skipped

data SomeResult where
  SomeResult :: Result t -> SomeResult

Of course I assume in your actual code there’s more information in these types; as it is, they don’t represent much. When you have a dynamic computation such as running a test that may produce different kinds of result, you can return it wrapped in SomeResult.
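As a sketch of returning a dynamically determined result wrapped in SomeResult: the `Outcome` type and the functions `leafResult` and `resultName` below are hypothetical, standing in for whatever actually runs a test.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data ResultType = Success | Omitted | Failure | Skipped

data Result (t :: ResultType) where
  SuccessResult :: [Either (Result 'Success) (Result 'Skipped)] -> Result 'Success
  FailureResult :: [SomeResult] -> Result 'Failure
  OmittedResult :: [Result 'Omitted] -> Result 'Omitted
  SkippedResult :: [Result 'Skipped] -> Result 'Skipped

data SomeResult where
  SomeResult :: Result t -> SomeResult

-- Hypothetical runtime outcome of running one leaf test.
data Outcome = Passed | Failed | Disabled

-- The index of the result depends on a runtime value, so the return type hides it.
leafResult :: Outcome -> SomeResult
leafResult Passed   = SomeResult (SuccessResult [])
leafResult Failed   = SomeResult (FailureResult [])
leafResult Disabled = SomeResult (SkippedResult [])

-- Pattern matching on the wrapped value recovers which kind it is.
resultName :: SomeResult -> String
resultName (SomeResult r) = case r of
  SuccessResult _ -> "success"
  FailureResult _ -> "failure"
  OmittedResult _ -> "omitted"
  SkippedResult _ -> "skipped"
```

The ownership rules are still checked statically: an expression such as `SuccessResult [Left (FailureResult [])]` does not type-check, since a Result 'Failure is not a Result 'Success.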

In order to work with dynamic results, you may need to prove to the compiler that two types are equal; for that, I direct you to Data.Type.Equality, which provides a type a :~: b which is inhabited by a single constructor Refl when the two types a and b are equal; you can pattern-match on this to inform the typechecker about type equalities, or use the various combinators to carry out more complicated proofs.
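A small sketch of how such an equality proof might be used, assuming the Result type from above. The functions `sameTag`, `keepIfSame` and `sameKindAs` are hypothetical helpers, not part of Data.Type.Equality; only `(:~:)` and `Refl` come from that module:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}

import Data.Maybe (mapMaybe)
import Data.Type.Equality ((:~:)(..))

data ResultType = Success | Omitted | Failure | Skipped

data Result (t :: ResultType) where
  SuccessResult :: [Either (Result 'Success) (Result 'Skipped)] -> Result 'Success
  FailureResult :: [SomeResult] -> Result 'Failure
  OmittedResult :: [Result 'Omitted] -> Result 'Omitted
  SkippedResult :: [Result 'Skipped] -> Result 'Skipped

data SomeResult where
  SomeResult :: Result t -> SomeResult

-- Produce a proof that two results carry the same index, if they do.
sameTag :: Result a -> Result b -> Maybe (a :~: b)
sameTag (SuccessResult _) (SuccessResult _) = Just Refl
sameTag (FailureResult _) (FailureResult _) = Just Refl
sameTag (OmittedResult _) (OmittedResult _) = Just Refl
sameTag (SkippedResult _) (SkippedResult _) = Just Refl
sameTag _ _ = Nothing

-- Matching on Refl tells the type checker that the hidden index equals `a`.
keepIfSame :: Result a -> SomeResult -> Maybe (Result a)
keepIfSame ref (SomeResult r) = case sameTag r ref of
  Just Refl -> Just r
  Nothing   -> Nothing

-- Filter a dynamic list down to the results of the same kind as `ref`.
sameKindAs :: Result a -> [SomeResult] -> [Result a]
sameKindAs ref = mapMaybe (keepIfSame ref)
```

For a single known kind, a direct pattern match on the constructor would do the same job; the proof-based version pays off when the target kind is itself only known through a type variable.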

Also useful in conjunction with GADTs (and ExistentialQuantification, less generally) is RankNTypes, which enables you to pass polymorphic functions as arguments to other functions; this is necessary if you want to consume an existential generically:

consumeResult :: SomeResult -> (forall t. Result t -> r) -> r
consumeResult (SomeResult res) k = k res

This is an example of continuation-passing style (CPS), where k is the continuation.
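A usage sketch for consumeResult, assuming the Result type from above. The functions `childCount` and `totalChildren` are hypothetical; the point is that the continuation must work for every index t:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, RankNTypes #-}

data ResultType = Success | Omitted | Failure | Skipped

data Result (t :: ResultType) where
  SuccessResult :: [Either (Result 'Success) (Result 'Skipped)] -> Result 'Success
  FailureResult :: [SomeResult] -> Result 'Failure
  OmittedResult :: [Result 'Omitted] -> Result 'Omitted
  SkippedResult :: [Result 'Skipped] -> Result 'Skipped

data SomeResult where
  SomeResult :: Result t -> SomeResult

consumeResult :: SomeResult -> (forall t. Result t -> r) -> r
consumeResult (SomeResult res) k = k res

-- Polymorphic in t, so it can be passed as the continuation.
childCount :: Result t -> Int
childCount r = case r of
  SuccessResult cs -> length cs
  FailureResult cs -> length cs
  OmittedResult cs -> length cs
  SkippedResult cs -> length cs

-- Sum the direct children over a list of dynamically typed results.
totalChildren :: [SomeResult] -> Int
totalChildren rs = sum [consumeResult r childCount | r <- rs]
```

A function of type Result 'Skipped -> Int would be rejected as the continuation, since consumeResult cannot promise which index the wrapped value has.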

As a final note, these extensions are widely used and largely uncontroversial; you needn’t be wary of opting in to (most) type system extensions when they let you express what you mean more directly.

I'm working on your thorough example so as to learn `DataKinds`, and I have some problem understanding how to bind a variable to a value of type e.g. `Result 'Skipped` . The data constructor `SkippedResult` seems to expect a list of `Result 'Skipped` so as to be able to construct a `Result 'Skipped`. As a consequence I have a chicken and egg problem when I'm trying to compile sth because I don't see how to provide a list of T, if for constructing a T, I have to provide a list of T. How would you write `var :: Result 'Skipped` `var = SkippedResult _` ? GHC does not help me with the hole :-) — Stephane Rolland, Nov 15 '18 at 23:27
@StephaneRolland: I was just following your examples; you have the same issue with the code from your question, `data Skipped = Skipped [Skipped]`: it’s not really useful unless you add more information. For example, say you want a skipped test to include a test name and a list of children. Then you can declare `SkippedResult :: Text -> [Result 'Skipped] -> Result 'Skipped`, and use it like `SkippedResult "test1" []` for a skipped test with no children, or `SkippedResult "group1" [SkippedResult "test1" [], SkippedResult "test2" []]` for a skipped test group with some children (also skipped). — Jon Purdy, Nov 16 '18 at 01:10
@StephaneRolland: Alternatively, you could add more constructors that happen to have the same tag, e.g.: `SkippedResult :: { testName :: Text } -> Result 'Skipped` and `SkippedResults :: { groupName :: Text, children :: [Result 'Skipped] } -> Result 'Skipped`, then you would use `SkippedResult "test"` for a leaf (single test) and `SkippedResults "group" […]` for a branch (test group). — Jon Purdy, Nov 16 '18 at 01:13
I'm not the OP who posted the question maybe that's one of the reason it made no sense for me ;-) — Stephane Rolland, Nov 16 '18 at 06:52
@StephaneRolland: Oh, I’m sorry! I should’ve paid more attention, haha — Jon Purdy, Nov 16 '18 at 08:56
