5

I've been having trouble the past few days figuring out whether something I'm trying to do is actually feasible in Haskell.

Here is some context: I am trying to code a little markup language (akin to ReST) where the syntax already enables custom extensions through directives. For users to implement new directives, they should be able to add new semantic constructs inside the document datatype. For exemple if one wants to add a directive for displaying math, they might want to have a MathBlock String constructor inside the ast.

Obviously data types are not extensible, and a solution where there is a generic constructor DirectiveBlock String containing the name of the directive (here, "math") is undesirable as we would like to have in our ast only well-formed constructs (so only directives with well-formed arguments).

Using type families, I prototyped something like:

{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}

-- Arguments for custom directives.
data family Args :: * -> *

data DocumentBlock
    = Paragraph String
    | forall a. Block (Args a)

Sure enough, if someone wishes to define a new directive for math display, they can do it as such:

data Math
-- The expected arguments for the math directive.
data instance Args Math = MathArgs String

doc :: [DocumentBlock]
doc =
    [ Paragraph "some text"
    , Block (MathArgs "x_{n+1} = x_{n} + 3")
    ]

So far so good, we can only construct documents where directive blocks receive the correct arguments.

The problem arises when one user wants to convert the internal representation of a document to some custom output, say, String. The user needs to provide a default output for all directives, since there will be many and some of them cannot be converted to the target. Furthermore, the user may wish to provide a more specific output for some directives:

class StringWriter a where
    write :: Args a -> String

-- User defined generic conversion for all directives.
instance StringWriter a where
   write _ = "Directive"

-- Custom way of showing the math directive.
instance StringWriter Math where
    write (MathArgs raw) = "Math(" ++ raw ++ ")"

-- Then to display a DocumentBlock
writeBlock :: DocumentBlock -> String
writeBlock (Paragraph t) = "Paragraph(" ++ t ++ ")"
writeBlock (Block args)  = write args

main :: IO ()
main = putStrLn $ writeBlock (Block (MathArgs "a + b"))

With this example, the output is Block and not Math(a+b), so the generic instance for StringWriter is always chosen. Even when playing with {-# OVERLAPPABLE #-}, nothing succeeds.

Is the kind of behavior I'm describing possible at all in Haskell?


When trying to include a generic Writer inside the Block definition, it also fails to compile.

-- ...

class Writer a o where
    write :: Args a -> o

data DocumentBlock
    = Paragraph String
    | forall a o. Writer a o => Block (Args a)

instance {-# OVERLAPPABLE #-} Writer a String where
   write _ = "Directive"

instance {-# OVERLAPS #-} Writer Math String where
    write (MathArgs raw) = "Math(" ++ raw ++ ")"

-- ...
flupe
  • 51
  • 3

1 Answers1

2

Your code does not compile, since Block something has type DocumentBlock, while write expects an Args a argument, and the two types are different. Did you mean writeBlock instead? I'll assume so.

What you might want to try is to add a constraint in your existential type, e.g.:

data DocumentBlock
    = Paragraph String
    | forall a. StringWriter a => Block (Args a)
             -- ^^^^^^^^^^^^^^ --

This has the following effect. Operationally, every time Block something is used, the instance is remembered (a pointer is implicitly stored along the Args a value). That will be a pointer to the catch-all instance, or to the specific one, whichever is the best fit.

When the constructor is then pattern-matched later on, the instance can then be used. Full working code:

{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}

-- Arguments for custom directives.
data family Args :: * -> *

data DocumentBlock
    = Paragraph String
    | forall a. StringWriter a => Block (Args a)

data Math
-- The expected arguments for the math directive.
data instance Args Math = MathArgs String

doc :: [DocumentBlock]
doc =
    [ Paragraph "some text"
    , Block (MathArgs "x_{n+1} = x_{n} + 3")
    ]

class StringWriter a where
    write :: Args a -> String

-- User defined generic conversion for all directives.
instance {-# OVERLAPPABLE #-} StringWriter a where
   write _ = "Directive"

-- Custom way of showing the math directive.
instance StringWriter Math where
    write (MathArgs raw) = "Math(" ++ raw ++ ")"

-- Then to display a DocumentBlock
writeBlock :: DocumentBlock -> String
writeBlock (Paragraph t) = "Paragraph(" ++ t ++ ")"
writeBlock (Block args)  = write args

main :: IO ()
main = putStrLn $ writeBlock (Block (MathArgs "a + b"))

This prints Math(a + b).

A final note: for this to work it is crucial that all the relevant instances are in scope when Block is used. Otherwise, GHC might choose the wrong instance, causing some unintended output. This is the main limitation, making overlapping instances a bit fragile in general. As long as there are no orphan instances, this should work.

Also note that, if using other existential types, a user can (intentionally or accidentally) cause GHC to pick the wrong instance anyway. For instance, if we use

data SomeArgs = forall a. SomeArgs (Args a)

toGenericInstance :: DocumentBlock -> DocumentBlock
toGenericInstance (Block a) = case SomeArgs a of
   SomeArgs a' -> Block a'  -- this will always pick the generic instance
toGenericInstance db = db

then, writeBlock (toGenericInstance (Block (MathArgs "a + b"))) will produce Directive instead.

chi
  • 101,733
  • 3
  • 114
  • 189
  • Thanks for the quick answer and sorry about the typo, you assumed right and I just fixed it. I did have a similar solution at some point, however it does not suit the scope of what I am trying to do: provide a generic document model and let users define conversion to some other target language. We have to assume that `StringWriter a` is not available when defining the `DocumentBlock`. When trying to define a more generic typeclass `Writer a o` that will be refined by users, it fails to compile, as directives will not implicetly have an instance for any `Writer a o`. (See next comment) – flupe Jan 04 '19 at 15:22
  • (I have no idea how SO works so see the original post for the code example) – flupe Jan 04 '19 at 15:28
  • @flupe As long as you also provide a generic catch-all instance `instance Writer a o` to be overlapped by users, the constraints should not be a problem, since they can always be satisfied. It is important, though, that the specific instances are in scope when they have to be used. – chi Jan 04 '19 at 15:35
  • @flupe Actually, that constraint is way too weak: it only requires a conversion to "some `o`", which will be useless later on since we can't rely on any `o`. You need a fixed return type for `write`, or something similar. – chi Jan 04 '19 at 15:39