I self-answered this question after spending a lot of time reading chapter 5 as well as [Hughes 95] and [Wadler 98], for the following reasons:
- The chapter simultaneously deals with many different points (e.g., JSON, pretty printing, hex formats, Haskell modules, the need for signatures, etc.).
- The chapter moves unexpectdly between very high level and low level questions, e.g., generic pretty printing, and escaping JSON strings; somewhat strangely, the discussion of escapting begins after the transition from JSON-specific printing, to generic pretty-printing.
- IIUC, [Wadler 98] presents a very elegant framework and solution, but its specific use here can be simplified to about 20 lines of very straightforward code (see full version here).
Purpose Of A Pretty Printing Library And Doc
Many documents and data structures are (multiway) tree-like:
Therefore, it makes sense to factor out tree pretty-printing, from the actual source of the tree-like data. This factored out library will just contain methods to construct some abstract Doc
from tree-like data, and pretty print this Doc
. The point, therefore, is to service several types of sources at once.
To simplify things, let's focus on a particularly simple source:
data Tree = Tree String [Tree]
deriving (Eq, Show)
which can be constructed like this, for example:
tree =
Tree "a" [
Tree "b" [
Tree "c" []],
Tree "d" [
Tree "e" [],
Tree "f" [],
Tree "g" [],
Tree "h" []
],
Tree "i" []
]
Prettiness Criteria
Again, for a specific simple example, the criteria for "prettiness" is folding nested elements as much as possible, as long as the result does not exceed some specified length. So, for example, for the above tree
, if we are given length 30, the prettiest output is defined to be
a[[c] d[e, f, g, h] i]
if we are given 20
a[
b[c]
d[e, f, g, h]
i
]
and if we are given 8
a[
b[c]
d[
e,
f,
g,
h
]
i
]
An Implementation Of Doc
The following is a simplification of [Walder 98].
Any tree can be expressed by a combination of two types:
a text node, containing a string
a nest node, containing the indentation level, an opening string, child nodes, and a closing text node
Additionally, any node can be folded or not.
In order to represent this, we can use the following:
data Doc =
Text String Int
| Nest Int String [Doc] String Int
deriving (Eq, Show)
We can easily find the length a Doc
will have, if folded, using this:
getDocFoldedLength :: Doc -> Int
getDocFoldedLength (Text s) = length s
getDocFoldedLength (Nest _ _ _ _ l) = l
To create a Nest
, we use the following:
nest :: Int -> String -> [Doc] -> String -> Doc
nest indent open chs close =
Nest indent open chs close (length open + length chs - 1 + sum (map getDocFoldedLength chs) + length close)
Note that the folded-version length is calculated once, and then "cached".
Getting the folded-version length of a Doc
in O(1) is easy:
getDocFoldedLength :: Doc -> Int
getDocFoldedLength (Text s) = length s
getDocFoldedLength (Nest _ _ _ _ l) = l
If we will decide to actually fold a Doc
, we will also need the folded version of its content:
getDocFoldedString :: Doc -> String
getDocFoldedString (Nest _ open cs close _) = open ++ intercalate " " (map getDocFoldedString cs) ++ close
getDocFoldedString (Text s) = s
Constructing a Doc
from a tree can be done like this:
showTree :: Tree -> Doc
showTree (Tree s ts) = if null chs then Text s else nest (1 + length s) (s ++ "[") chs "]" where
chs = intercalateDocs "," $ map showTree ts
where intercalateDocs
is a utility function, intercalating commas between non-Nest
Docs
:
intercalateDocs :: String -> [Doc] -> [Doc]
intercalateDocs _ l | length l < 2 = l
intercalateDocs delim (hd:tl) = case hd of
(Text s) -> (Text (s ++ delim)):intercalateDocs delim tl
otherwise -> hd:intercalateDocs delim tl
E.g., for tree
above showTree tree
gives
Nest 2 "a[" [Nest 2 "b[" [Text "c"] "]" 4,Nest 2 "d[" [Text "e,",Text "f,",Text "g,",Text "h"] "]" 13,Text "i"] "]" 23
Now for the heart of the matter, a pretty
function, deciding which nested elements to fold. Since each getDocElement
gives us the length of the folded-version of a Doc
, we can efficiently decide whether to fold or not:
pretty :: Int -> Doc -> String
pretty w doc = pretty' 0 w doc where
pretty' i _ (Text s) = replicate i ' ' ++ s
pretty' i w (Nest j open cs close l) | i + j + l <= w =
replicate i ' ' ++ open ++ intercalate " " (map getDocFoldedString cs) ++ close
pretty' i w (Nest j open cs close l) =
replicate i ' ' ++ open ++ "\n" ++ intercalate "\n" (map (pretty' (i + j) w) cs) ++ "\n" ++ replicate i ' ' ++ close
The function pretty' i w doc
transforms doc
into a pretty form, assuming the current indentation is i
, and the width is w
. Specifically,
it transforms any Text
to its string
it folds any Nest
if it fits; if not, it calls itself recursively on the children.
(See full version here.)
Differences From The Paper And Chapter
The papers use solutions that are more elegant and Haskell-Specific. The Doc
's algebraic data type includes also a "horizontal concatenation", that generates a sequence of documents depending on whether it (and the descendants) will be folded or not. A careful search doesn't generate all possible documents (whose number is exponential), but rather discards generating large numbers of layouts that cannot possibly be part of the optimal solution. The solution here achieves the same complexity by caching the folded length within each node, which is simpler.
The chapter uses a slightly different API for compatibility with existing Haskell Pretty-Printing libraries. It organizes the code into modules. It also deals with practical JSON-specific problems such as escaping (which is unrelated to pretty printing).