Pointers to ADTs in Haskell

Question

I would like to implement term graphs in Haskell, so that I can implement a term rewriting engine that uses sharing. Something like

data TG f v = Var v | Op f [TG f v] | P (Ptr (TG f v))

And I would want something like the following to make sense:

let
    t' = Op 'f' [Var 'x', Var 'y']
    t = getPointer t'
in
    Op 'g' [P t,P t]

Then during rewriting, I only have to rewrite t once.

However, I noticed two things: (1) the module is called Foreign.Storable, so should it only be used for FFI purposes? And (2) there are no Storable instances for types like lists; why is this?

Follow the references to "observable sharing" in http://www.haskell.org/haskellwiki/Embedded_domain_specific_language#Discussion_of_common_problems for an overview of several approaches. — d8d0d65b3f7cf42, Dec 18 '13 at 09:09
and it's possible you may find [zippers/comonads](http://www.haskell.org/haskellwiki/Zipper) handy in that context. — not my job, Dec 18 '13 at 11:18
You don't want pointers. For updateable references you probably want an `STRef` or `IORef`. Or maybe you don't need updating at all. It's hard to know without knowing what you're trying to do. — augustss, Dec 18 '13 at 12:34
@d8d0d65b3f7cf42: Thank you for the links. It seems that the Gill paper is directly usable, while the Claessen paper requires a non-conservative extension to Haskell, as seen in the Lava language. Do you know if both of these have direct Haskell implementations? — Jonathan Gallagher, Dec 18 '13 at 15:42
@chunksOf50: Something like that could work. I would end up having to keep track of multiple substitutions in a map or some efficient data structure. Is this what you're thinking? There would be some efficiency loss, but it might be negligible in practice? — Jonathan Gallagher, Dec 18 '13 at 15:57
@Augustss: I am trying to implement a term rewriting system in Haskell. There are two main rewriting strategies: innermost and outermost. Innermost rewriting is more efficient for a rule like f(x) -> g(x,x): in f(t), t must be in normal form before the reduction is made, so I will not end up with reductions in t after the rewrite. With outermost rewriting, t in f(t) might not be in normal form, so after the rewrite to g(t,t), t may need to be rewritten twice. On the other hand, innermost rewriting can waste time developing terms that get discarded — Jonathan Gallagher, Dec 18 '13 at 16:13
The way I've done this sort of thing before is with a hashtable. Your data structure will then contain `Int`s pointing to the global hashtable. You can have a datatype `data Ptr a = Ptr Int` and 'dereference' pointers by looking up the value in the hashtable. If performance is not an issue you could probably get away with using a regular `IntMap`. — user2407038, Dec 18 '13 at 16:15
@Augustss(cont'd): like in a rule h(x) -> a(). Then in h(t), t could be a term which is expensive to develop, and in innermost rewriting we do indeed develop it. However, in outermost rewriting we do not; h(t) is immediately rewritten as a(). "Lazy" rewriting combines the best of both worlds. It is outermost rewriting, so needless terms aren't developed; however, to avoid the problem of copying, for example with f(x) -> g(x,x) at f(t), the rewrite is to g(s,s) where s is a pointer to t. Then, if t needs to be developed further, it only needs to be developed once. This is what I want. — Jonathan Gallagher, Dec 18 '13 at 16:22
@Augustss(cont'd): The use of terms that contain sharing (i.e. pointers to terms) is called term graph rewriting. The problem I have is that sharing and the use of pointers seem to be part of the representation of a term graph. I hope that helps clarify what I am doing. — Jonathan Gallagher, Dec 18 '13 at 16:25
@user2407038: That seems like it might be acceptable. Suppose I store 't' in the global table. So I have f(hash(t)) -> g(hash(t),hash(t)). Then t needs a rewrite; how do I update t in the table and all references to it? Changing t changes its hash, so I don't see exactly how to do this. — Jonathan Gallagher, Dec 18 '13 at 16:37
This depends on how you want to treat your values. If your values are 'immutable' then there is no issue, since changing a value `t` must create a new value in the table. If your values are 'mutable' then you need to replace all instances of the old hash with the new hash. If this is too costly (perhaps data is being edited very frequently), use a regular `IntMap`, in which case changing a value doesn't change its 'address'. Also consider a mutable array if you want good lookup/modify performance. Again, it all depends on what the most common operation is (read/move in graph/write/etc). — user2407038, Dec 18 '13 at 18:17
You might also like [How do you represent a graph in Haskell?](http://stackoverflow.com/q/9732084/791604). — Daniel Wagner, Dec 18 '13 at 21:11
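To make the `IORef` suggestion from the comments concrete, here is a minimal sketch that assumes the P case of the question's TG type is changed to hold an `IORef` instead of a `Ptr`. That change, and the names `render`, `ruleH` and `demo`, are hypothetical and not from the question. It mimics the "lazy" strategy described above: the outermost step f(t) -> g(s,s) stores t = h(x) once behind a reference s, and the later rewrite h(x) -> a() is applied once through s, so both occurrences see the result.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import Data.List (intercalate)

-- Hypothetical variant of the question's TG type: the P case holds a
-- mutable reference (as suggested in the comments) rather than a Ptr.
data TG f v = Var v | Op f [TG f v] | P (IORef (TG f v))

-- Print a term in the f(x,y) notation used in the comments,
-- following references. Assumes the structure is acyclic.
render :: TG Char Char -> IO String
render (Var v)   = pure [v]
render (Op f ts) = do
  args <- mapM render ts
  pure (f : "(" ++ intercalate "," args ++ ")")
render (P r)     = readIORef r >>= render

-- Hypothetical rule h(x) -> a(), applied to a single term.
ruleH :: TG Char v -> TG Char v
ruleH (Op 'h' [_]) = Op 'a' []
ruleH t            = t

-- Returns the rendered term before and after rewriting the shared subterm.
demo :: IO (String, String)
demo = do
  s <- newIORef (Op 'h' [Var 'x'])   -- t = h(x), stored once behind s
  let term = Op 'g' [P s, P s]       -- result of f(t) -> g(s,s): t is shared
  before <- render term
  modifyIORef' s ruleH               -- rewrite t once, through the reference
  after <- render term
  pure (before, after)
```

Running `demo` should give `("g(h(x),h(x))", "g(a(),a())")`. The trade-off is that all rewriting now happens in `IO` (or in `ST` with `STRef`), and sharing only exists where references were created explicitly.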

score 1 · Accepted Answer · answered Jan 07 '14 at 07:13

As pointed out in the comments, if you want to define a normal algebraic datatype in Haskell but gain access to its graph structure, you need to use some variant of observable sharing. Types like Ptr and ForeignPtr are meant for interfacing with external code or for low-level memory management, and aren't appropriate for this kind of situation.

All the available techniques for observable sharing require some slightly "unsafe" code, in the sense that the burden is on the user not to misuse it. The issue is that Haskell's semantics aren't intended to let you "see" whether two values are the same pointer or not. However, in practice the worst that can happen is that you miss some places where the user shared a single definition, so you end up with duplication in your internal data structure. Depending on the semantics of your own structure, this may just have a performance impact.

Observable sharing is usually based on lower-level primitives: pointer equality, i.e. checking whether two given Haskell values are stored at exactly the same location in memory, or the more versatile stable names, which identify a single Haskell value (even if the garbage collector moves it) and can be stored in a table and compared for equality later on.
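A minimal sketch of the stable-name approach, assuming GHC's `System.Mem.StableName` and a simplified TG type without the P case. The `reify` function and the `Node` type are hypothetical names for illustration. `reify` forces each node to weak head normal form, looks up its stable name in a table bucketed by `hashStableName`, and gives every distinct heap object one integer id, producing an explicit graph in an `IntMap`.

```haskell
import Control.Exception (evaluate)
import Data.IORef (newIORef, readIORef, writeIORef, modifyIORef')
import Data.IntMap.Strict (IntMap)
import qualified Data.IntMap.Strict as IntMap
import System.Mem.StableName (makeStableName, hashStableName)

-- Plain tree type written by the user (the question's TG without the P case).
data TG f v = Var v | Op f [TG f v]

-- Hypothetical explicit-graph node: children are replaced by integer ids.
data Node f v = NVar v | NOp f [Int] deriving (Eq, Show)

-- Hypothetical reifier: one id per distinct heap object, so a subterm that
-- is shared via a let appears once in the resulting map.
reify :: TG f v -> IO (Int, IntMap (Node f v))
reify root = do
  seen  <- newIORef IntMap.empty   -- hashStableName -> [(stable name, id)]
  graph <- newIORef IntMap.empty   -- id -> node with child ids
  next  <- newIORef 0              -- next unused id
  let visit t = do
        t' <- evaluate t           -- force to WHNF so a thunk and its value agree
        sn <- makeStableName t'
        let h = hashStableName sn
        buckets <- readIORef seen
        case IntMap.lookup h buckets >>= lookup sn of
          Just i  -> pure i        -- already visited: reuse its id
          Nothing -> do
            i <- readIORef next
            writeIORef next (i + 1)
            -- record the id before visiting children, so cycles terminate
            modifyIORef' seen (IntMap.insertWith (++) h [(sn, i)])
            node <- case t' of
              Var v   -> pure (NVar v)
              Op f ts -> NOp f <$> mapM visit ts
            modifyIORef' graph (IntMap.insert i node)
            pure i
  r <- visit root
  g <- readIORef graph
  pure (r, g)
```

For `let t = Op 'f' [Var 'x', Var 'y'] in Op 'g' [t, t]`, both children of g should get the same id, and the map should hold 4 nodes rather than the 7 of the fully expanded tree. This is the kind of "unsafe" behaviour described above: the result depends on what sharing GHC actually preserves (an optimization such as common subexpression elimination could, for instance, merge two separately written copies), so treat it as best effort.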

Higher level libraries like data-reify help to hide these details from you.

The nicest way to use observable sharing is to allow users to write normal values of the algebraic types, e.g. for your example simply:

let t = Op 'f' [Var 'x', Var 'y']
in Op 'g' [t, t]

and then have your library use whichever approach to observable sharing to translate that into some kind of explicit graph structure as soon as you receive the values from the user. For example you might translate into a different datatype with explicit pointers, or augment the TG type with them. The explicit pointers would just be some kind of lookup into your own map structure, e.g.

data InternalTG f v = ... | Pointer Int
type TGMap f v = IntMap (InternalTG f v)

If you use something like data-reify, InternalTG f v would be the DeRef type for TG f v.

You can then do your rewriting on the resulting graph structure.
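Rewriting on such an explicit graph could look like the following sketch. `Node`, `Graph`, `Tree`, `ruleF`, `ruleH` and `unfold` are hypothetical names; children are `Int` keys into an `IntMap`, in the spirit of `Pointer Int` and `TGMap` above. It replays the example from the comments: f(t) is rewritten with f(x) -> g(x,x) so that both arguments point at the same key, and then t = h(x) is rewritten once with h(x) -> a().

```haskell
import Data.IntMap.Strict (IntMap)
import qualified Data.IntMap.Strict as IntMap

-- Hypothetical explicit graph node: children are keys into the map,
-- playing the role of the Pointer Int case above.
data Node f v = NVar v | NOp f [Int] deriving (Eq, Show)
type Graph f v = IntMap (Node f v)

-- Ordinary tree, used only to read a graph back out.
data Tree f v = TVar v | TOp f [Tree f v] deriving (Eq, Show)

-- f(t) with t = h(x); the node for t (key 1) is stored exactly once.
start :: Graph Char Char
start = IntMap.fromList [(0, NOp 'f' [1]), (1, NOp 'h' [2]), (2, NVar 'x')]

-- Rule f(x) -> g(x,x): the new node points at x twice instead of copying it.
ruleF :: Int -> Graph Char v -> Graph Char v
ruleF i g = case IntMap.lookup i g of
  Just (NOp 'f' [x]) -> IntMap.insert i (NOp 'g' [x, x]) g
  _                  -> g

-- Rule h(x) -> a(): overwriting node i updates every parent that points at it.
ruleH :: Int -> Graph Char v -> Graph Char v
ruleH i g = case IntMap.lookup i g of
  Just (NOp 'h' [_]) -> IntMap.insert i (NOp 'a' []) g
  _                  -> g

-- Unfold a node back into a tree (duplicates shared nodes; assumes no cycles).
unfold :: Graph f v -> Int -> Tree f v
unfold g i = case g IntMap.! i of
  NVar v   -> TVar v
  NOp f cs -> TOp f (map (unfold g) cs)
```

After `ruleH 1 (ruleF 0 start)`, node 0 is `NOp 'g' [1, 1]` and unfolding it gives g(a(),a()). Node 2 is left unreachable; a real engine would need some form of garbage collection over the map, which this sketch omits.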

As an alternative to using observable sharing at all, if you are willing for your users to use a monad to construct their values and explicitly choose when to use sharing (as suggested by the inclusion of getPointer above), then you can simply use a state monad to build up the graph explicitly:

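-- your code
import Control.Monad.State (State, get, put, runState)
import Data.IntMap (IntMap)
import qualified Data.IntMap as IntMap

data TG f v = Var v | Op f [TG f v] | P (Ptr (TG f v))

newtype Ptr tg = Ptr Int -- a "phantom type" just to give some type safety

-- Table of shared subterms, plus the next free pointer value
data TGState f v = TGState { tgMap :: IntMap (TG f v), tgNextSymbol :: Int }

initialTGState :: TGState f v
initialTGState = TGState { tgMap = IntMap.empty, tgNextSymbol = 0 }

type TGMonad f v a = State (TGState f v) a

-- Store a term in the table and return a fresh pointer to it
getPointer :: TG f v -> TGMonad f v (Ptr (TG f v))
getPointer tg = do
    tgState <- get
    let sym = tgNextSymbol tgState
    put $
        TGState { tgMap = IntMap.insert sym tg (tgMap tgState),
                  tgNextSymbol = sym + 1 }
    return (Ptr sym)

-- Run a construction from the empty table, returning the result and the table
runTGMonad :: TGMonad f v a -> (a, IntMap (TG f v))
runTGMonad m =
    let (v, tgState) = runState m initialTGState
    in (v, tgMap tgState)

-- user code

example :: TGMonad Char Char (TG Char Char)
example = do
    let t' = Op 'f' [Var 'x', Var 'y']
    t <- getPointer t'
    return $ Op 'g' [P t, P t]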

Once you have the graph by whatever route, there are all sorts of techniques for manipulating it, but these are probably beyond the scope of your original question.
