Haskell code littered with TVar operations and functions taking many arguments: code smell?

Question

I'm writing a MUD server in Haskell (MUD = Multi User Dungeon: basically, a multi-user text adventure/role-playing game). The game world data/state is represented in about 15 different IntMaps. My monad transformer stack looks like this: ReaderT MudData IO, where the MudData type is a record type containing the IntMaps, each in its own TVar (I'm using STM for concurrency):

data MudData = MudData { _armorTblTVar    :: TVar (IntMap Armor)
                       , _clothingTblTVar :: TVar (IntMap Clothing)
                       , _coinsTblTVar    :: TVar (IntMap Coins)

...and so on. (I'm using lenses, thus the underscores.)

Some functions need certain IntMaps, while other functions need others. Thus, having each IntMap in its own TVar provides granularity.

However, a pattern has emerged in my code. In the functions that handle player commands, I need to read (and sometimes later write) to my TVars within the STM monad. Thus these functions end up having an STM helper defined in their where blocks. These STM helpers often have quite a few readTVar operations in them, as most commands need to access a handful of the IntMaps. Furthermore, a function for a given command may call out to a number of pure helper functions that also need some or all of the IntMaps. These pure helper functions thus sometimes end up taking a lot of arguments (sometimes over 10).

So, my code has become "littered" with lots of readTVar expressions and functions that take a large number of arguments. Here are my questions: is this a code smell? Am I missing some abstraction that would make my code more elegant? Is there a more ideal way to structure my data/code?

Thanks!

Are you passing a lot of `IntMap`s instead of a `MudData`? Would the latter be an option? Hard to tell without seeing some code -- can you share a small snippet showing a case where the helper needs so many arguments? — chi, Mar 07 '15 at 18:37
Yep, that's definitely code smell. Nope, I don't know the right way to fix it. — dfeuer, Mar 07 '15 at 18:44
I'm not really familiar with TVar but can't you factorize it ? ie remove all the TVar from MUDData, and use (TVar MUDData) where needed ? — mb14, Mar 07 '15 at 19:04
@mb14, that's presumably troublesome because it increases contention. — dfeuer, Mar 07 '15 at 19:26
@chi The reason he can't pass a `MudData` is the helper functions are pure and the `MudData` has lots of `TVar`s for granularity. The pure helper functions may only be used on subsets of the `MudData`, so adding another data structure (or a type parameter to `MudData`) won't help - reading all the `TVar`s to create that one single structure would eliminate the granularity. — Cirdec, Mar 07 '15 at 19:28
@mb14 That is an option, but then I would be locking on the entire world data instead of just what pieces are required on a case-by-case basis. — the-konapie, Mar 07 '15 at 19:28
I see, however, adding "granularity" to locks usually quickly endup with deadlocks. — mb14, Mar 07 '15 at 19:41
@mb14 `TVar`s ensure deadlock freedom, since they can only be accessed in STM transactions. — chi, Mar 07 '15 at 19:44
@Cirdec Why do these helpers have to be pure? One option could be to pass `TVar IntMap`s (or `MudData` carrying all the `TVar`s) and rewrite the helpers inside the `STM` monad so that they can actually read the data. Maybe with lenses and some applicative syntax one can reduce the `readTVar` boilerplate. (Here I am doing wild guesses, I admit.) — chi, Mar 07 '15 at 19:54
@chi I misunderstood what you were suggesting. I though you were suggesting the same thing as @mb14. With [a little abstraction](http://stackoverflow.com/a/28919284/414413), the helper functions can both be pure, operating on a `MudData` without `TVar`s like @mb14 suggested and be passed the whole `MudData`, `TVar`s and all, like you suggest. — Cirdec, Mar 07 '15 at 20:04
Not directly related, but for writing a MUD you might be interested in using [functional reactive programming](https://wiki.haskell.org/Functional_Reactive_Programming) techniques or libraries such as [Netwire](https://wiki.haskell.org/Netwire). — Petr, Mar 08 '15 at 20:53

Cirdec · Answer 1 · 2015-03-07T21:23:30.617

The solution to this problem is in changing the pure helper functions. We don't really want them to be pure, we want to leak out a single side-effect - whether or not they read specific pieces of data.

Let's say we have a pure function that uses only clothing and coins:

moreVanityThanWealth :: IntMap Clothing -> IntMap Coins -> Bool
moreVanityThanWealth clothing coins = ...

It's usually nice to know that a function only cares about e.g. clothing and coins, but in your case this knowledge is irrelevant and is just creating headaches. We are going to deliberately forget this detail. If we followed mb14's suggestion, we would pass an entire pure MudData' like the following to the helper functions.

data MudData' = MudData' { _armorTbl    :: IntMap Armor
                         , _clothingTbl :: IntMap Clothing
                         , _coinsTbl    :: IntMap Coins

moreVanityThanWealth :: MudData' -> Bool
moreVanityThanWealth md =
    let clothing = _clothingTbl md
        coins    = _coinsTbl    md
    in  ...

MudData and MudData' are almost identical to each other. One of them wraps its fields in TVars and the other one doesn't. We can modify MudData so that it takes an extra type parameter (of kind * -> *) for what to wrap the fields in. MudData will have the slightly unusual kind (* -> *) -> *, which is closely related to lenses but doesn't have much library support. I call this pattern a Model.

data MudData f = MudData { _armorTbl    :: f (IntMap Armor)
                         , _clothingTbl :: f (IntMap Clothing)
                         , _coinsTbl    :: f (IntMap Coins)

We can recover the original MudData with MudData TVar. We can recreate the pure version by wrapping the fields in Identity, newtype Identity a = Identity {runIdentity :: a}. In terms of MudData Identity, our function would be written as

moreVanityThanWealth :: MudData Identity -> Bool
moreVanityThanWealth md =
    let clothing = runIdentity . _clothingTbl $ md
        coins    = runIdentity . _coinsTbl    $ md
    in  ...

We've successfully forgotten which parts of the MudData we've used, but now we don't have the lock granularity we want. We need to recover, as a side effect, exactly what we just forgot. If we wrote the STM version of the helper it would look like

moreVanityThanWealth :: MudData TVar -> STM Bool
moreVanityThanWealth md =
    do
        clothing <- readTVar . _clothingTbl $ md
        coins    <- readTVar . _coinsTbl    $ md
        return ...

This STM version for MudData TVar is almost exactly the same as the pure version we just wrote for MudData Identity. They only differ by the type of the reference (TVar vs. Identity), what function we use to get the values out of the references (readTVar vs runIdentity), and how the result is returned (in STM or as a plain value). It would be nice if the same function could be used to provide both. We are going to extract what is common between the two functions. To do so, we'll introduce a type class MonadReadRef r m for the Monads we can read some type of reference from. r is the type of the reference, readRef is the function to get the values out of the references, and m is how the result is returned. The following MonadReadRef is closely related to the MonadRef class from ref-fd.

{-# LANGUAGE FunctionalDependencies #-}

class Monad m => MonadReadRef r m | m -> r where
    readRef :: r a -> m a

As long as code is parameterized over all MonadReadRef r ms, it is pure. We can see this by running it with the following instance of MonadReadRef for ordinary values held in an Identity. The id in readRef = id is the same as return . runIdentity.

instance MonadReadRef Identity Identity where
    readRef = id

We'll rewrite moreVanityThanWealth in terms of MonadReadRef.

moreVanityThanWealth :: MonadReadRef r m => MudData r -> m Bool
moreVanityThanWealth md =
    do
        clothing <- readRef . _clothingTbl $ md
        coins    <- readRef . _coinsTbl    $ md
        return ...

When we add a MonadReadRef instance for TVars in STM, we can use these "pure" computations in STM but leak the side-effect of which TVars were read.

instance MonadReadRef TVar STM where
    readRef = readTVar

Wow! Thank you for taking the time to write a detailed answer. I get the general idea, but I'm going to have to study how the abstraction works and play with some code to fully grok this. But this is neat; I would never have thought of this. — the-konapie, Mar 07 '15 at 20:19

score 16 · Accepted Answer · edited Apr 12 '17 at 07:31

Yes, this obviously makes your code complex and clutters the important code with a lot of boilerplate details. And functions with more than 4 arguments are a sign of problems.

I'd ask the question: Do you really gain anything by having separate TVars? Isn't it a case of premature optimization? Before taking such a design decision as splitting your data structure among multiple separate TVars, I'd definitely do some measurements (see criterion). You can create a sample test that models the expected number of concurrent threads and frequency of data updates and check what are you really gaining or losing by having multiple TVars vs a single one vs an IORef.

Keep in mind:

If there are multiple threads competing for common locks in a STM transaction, the transactions can get restarted several times before they manage to successfully complete. So under some circumstances, having multiple locks can actually make things worse.
If there is ultimately just one data structure that you need to synchronize, you might consider using a single IORef instead. It's atomic operations are very fast, which could compensate for having a single central lock.
In Haskell it's surprisingly difficult for a pure function to block an atomic STM or a IORef transaction for a long time. The reason is laziness: You only need to create thunks within such a transaction, not to evaluate them. This is true in particular for a single atomic IORef. The thunks are evaluated outside such transactions (by a thread that inspects them, or you can decide to force them at some point, if you need more control; this can be desired in your case, as if your system evolves without anybody observing it, you can easily accumulate unevaluated thunks).

If it turns out that having multiple TVars is indeed crucial, then I'd probably write all the code in a custom monad (as described by @Cirdec while I was writing my answer), whose implementation would be hidden from the main code, and which would provide functions for reading (and perhaps also writing) parts of the state. It'd then be run as a single STM transaction, reading and writing only what's needed, and you could have a pure version of the monad for testing.

Thank you for your thoughtful reply. I think you really hit the nail on the head: I didn't base my decision to use multiple TVars on any benchmarking. In truth, my game will probably never have enough simultaneous players to create serious contention over a single lock. Still, I wanted to do things "the best way" and write a server that is scalable. But the "best way" I envisioned ended up biting me. I may have just learned another lesson the hard way. — the-konapie, Mar 07 '15 at 21:36

Haskell code littered with TVar operations and functions taking many arguments: code smell?

2 Answers2