I really like the repmin problem:

Write down repmin :: Tree Int -> Tree Int, which replaces all the numbers in the tree by their minimum in a single pass.

If I were writing something like this in Python, I would pass the minimum around by reference (let's say one-element lists instead of numbers are good enough):

def repmin(tree, wrapped_min_link=None):
    x, subforest = tree

    if wrapped_min_link is None:
        wrapped_min_link = [x]
    else:
        # mutate the shared cell in place, so that every node
        # (not just this subtree) sees the final minimum
        wrapped_min_link[0] = min(wrapped_min_link[0], x)

    n = len(subforest)

    subforest_min = [None] * n
    for i in range(n):
        if subforest[i]:
            subforest_min[i] = repmin(subforest[i], wrapped_min_link)

    return (wrapped_min_link, subforest_min)

It seems to me like a fitting way to wrap one's head around the knot-tying solution in Haskell (I wrote this one for rose trees from Data.Tree):

copyRose :: Tree Int -> Int -> (Tree Int, Int)
copyRose (Node x []) m = (Node m [], x)
copyRose (Node x fo) m =
  let
    unzipIdMinimum =
      foldr (\ ~(a, b) ~(as, bmin) -> (a:as, b `min` bmin)) ([], maxBound :: Int)

    (fo', y) = unzipIdMinimum . map (flip copyRose m) $ fo
  in (Node m fo', x `min` y)

repmin :: Tree Int -> Tree Int
repmin = (loop . uncurry) copyRose

Yet I reckon the two solutions work quite differently. Here is my understanding of the latter:

Let us rewrite loop for (->) a bit:

loop f b = let cd = f (b, snd cd) in fst cd

I reckon this to be a workalike of loop for (->), since snd gives the same degree of laziness as a lazy pattern match within a let.
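For reference, Control.Arrow defines the instance with a lazy pattern match; a minimal sketch (the names loop' and example are mine) showing that the snd version ties the same knot:

```haskell
import Control.Arrow (loop)

-- The library instance reads (up to renaming):
--   loop f b = let (c, d) = f (b, d) in c
-- A let-bound pattern is matched lazily, so projecting with
-- fst/snd is equivalent:
loop' :: ((b, d) -> (c, d)) -> b -> c
loop' f b = let cd = f (b, snd cd) in fst cd

-- A tiny knot: the second component is fed back as the input d.
-- Here d = b * 2 = 10, and the result c is that very d.
example :: Int
example = loop' (\(b, d) -> (d, b * 2)) 5
```

Both loop and loop' return 10 here; if f demanded d before producing its result pair, the knot would diverge.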

So, as repmin traverses the tree, it is:

  • building up the minimum of the tree, to be returned as the second element of the pair;
  • leaving snd (copyRose tree m) behind in every node.

Thus, when the traversal comes to an end, the programme knows the value of snd (copyRose tree m) (that is, the minimum of the tree) and can supply it wherever a node of the resulting tree is demanded.
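Packaged as a runnable sketch (the sample tree and main are my additions), the knot-tying version above gives:

```haskell
import Control.Arrow (loop)
import Data.Tree (Tree (..), flatten)

copyRose :: Tree Int -> Int -> (Tree Int, Int)
copyRose (Node x []) m = (Node m [], x)
copyRose (Node x fo) m =
  let
    unzipIdMinimum =
      foldr (\ ~(a, b) ~(as, bmin) -> (a : as, b `min` bmin)) ([], maxBound :: Int)

    (fo', y) = unzipIdMinimum . map (flip copyRose m) $ fo
  in (Node m fo', x `min` y)

repmin :: Tree Int -> Tree Int
repmin = (loop . uncurry) copyRose

main :: IO ()
main = print . flatten $ repmin (Node 3 [Node 1 [Node 5 []], Node 4 []])
-- prints [1,1,1,1]: every label is replaced by the overall minimum
```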

Do I understand repmin in Haskell correctly?

Zhiltsoff Igor
  • I dunno, but I don't really like your `unzipIdMinimum` function because it doesn't calculate `min` as it goes. I think there's likely a better way. – dfeuer Sep 07 '20 at 05:27
  • @dfeuer would adding something like `let bmin' = min b bmin in (join seq) bmin'` make things better? – Zhiltsoff Igor Sep 07 '20 at 14:04
  • No, you'd need to reshape things. – dfeuer Sep 07 '20 at 14:20
  • @dfeuer what do you mean "reshape"? Do we need to reshape just `unzipIdMinimum` or `copyTree` in whole? – Zhiltsoff Igor Sep 07 '20 at 14:31
  • @dfeuer perhaps ``unzipIdMinimum = foldr (\ (a, b) (as, !bmin) -> (a:as, b `min` bmin)) ([], maxBound :: Int) ; (fo', !y) = unzipIdMinimum . ....`` *would* calculate the minimum, as it goes (right-to-left, bottom-up). it'll still unfurl the copy of the whole tree structure as the tuples at once, first, most probably, and then re-traverse it from the top when asked for the result tree, rebuilding the nodes top-down. (contd.) – Will Ness Sep 08 '20 at 08:36
  • (2/2) an implementation would have to be *extremely "smart"* to not do the tuples and -- thus -- create the result tree directly from the top, as it traverses the input tree. but conceptually that *is* what the code describes. of course with direct recursion as in your answer that burden is lifted from an implementation. – Will Ness Sep 08 '20 at 08:38
  • @WillNess why do we need a bang-pattern for `y` while `bmin` is being forced each iteration? It would probably have just one `min` application (to the result and the `maxBound :: Int`, that is). Or am I missing something? – Zhiltsoff Igor Sep 08 '20 at 18:17
  • @ZhiltsoffIgor for one last `min` application to also be forced, was my intention. and it's not the last one either, there's one more ``x `min` y`` there. so I was just being thorough. In practice two thunks is nothing of course. – Will Ness Sep 08 '20 at 18:55

1 Answer


This is more an extended comment than an answer, but I don't really think of your implementation as single-pass. It looks like it traverses the tree once, producing a new, lazily-generated, tree and the global minimum, but it actually produces a lazily generated tree and an enormous tree of thunks that will eventually calculate the minimum. To avoid this, you can get closer to the Python code by generating the tree eagerly, keeping track of the minimum as you go.

You'll note that I've generalized the type from Int to an arbitrary Ord type. You'll also note that I've used two different type variables for the type of elements in the given tree and the type of the minimum passed in to build the new tree; this lets the type system tell me if I mix them up.

{-# LANGUAGE BangPatterns #-}

repmin :: Ord a => Tree a -> Tree a
repmin = (loop . uncurry) copyRose

copyRose :: Ord a => Tree a -> b -> (Tree b, a)
copyRose (Node x ts) final_min
  | (ts', m) <- copyForest x ts final_min
  = (Node final_min ts', m)

copyForest :: Ord a => a -> [Tree a] -> b -> ([Tree b], a)
copyForest !m [] _final_min = ([], m)
copyForest !m (t : ts) final_min
  | (t', m') <- copyTree m t final_min
  , (ts', m'') <- copyForest m' ts final_min
  = (t' : ts', m'')

copyTree :: Ord a => a -> Tree a -> b -> (Tree b, a)
copyTree !m (Node x ts) final_min
  | (ts', m') <- copyForest (min m x) ts final_min
  = (Node final_min ts', m')
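For comparison, the eager version wired up the same way produces the same tree (the sample tree and main are mine):

```haskell
{-# LANGUAGE BangPatterns #-}

import Control.Arrow (loop)
import Data.Tree (Tree (..), flatten)

repmin :: Ord a => Tree a -> Tree a
repmin = (loop . uncurry) copyRose

copyRose :: Ord a => Tree a -> b -> (Tree b, a)
copyRose (Node x ts) final_min
  | (ts', m) <- copyForest x ts final_min
  = (Node final_min ts', m)

copyForest :: Ord a => a -> [Tree a] -> b -> ([Tree b], a)
copyForest !m [] _final_min = ([], m)
copyForest !m (t : ts) final_min
  | (t', m') <- copyTree m t final_min
  , (ts', m'') <- copyForest m' ts final_min
  = (t' : ts', m'')

copyTree :: Ord a => a -> Tree a -> b -> (Tree b, a)
copyTree !m (Node x ts) final_min
  | (ts', m') <- copyForest (min m x) ts final_min
  = (Node final_min ts', m')

main :: IO ()
main = print . flatten $ repmin (Node 3 [Node 1 [Node 5 []], Node 4 []])
-- prints [1,1,1,1], as before; the minimum is now accumulated strictly
```

The knot still terminates because the minimum is accumulated only from the input labels; final_min is merely stored in the nodes, never forced during the traversal.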

Exercise: rewrite this in monadic style using ReaderT to pass the global minimum and State to keep track of the minimum so far.
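One possible shape for that exercise, as a sketch rather than a definitive solution (the name copyTreeM and the knot-tying in repmin are my additions; it assumes mtl):

```haskell
import Control.Monad.Reader (ReaderT (..), ask)
import Control.Monad.State (State, modify', runState)
import Data.Tree (Tree (..))

-- ReaderT carries the (lazily supplied) global minimum;
-- State tracks the minimum seen so far.
copyTreeM :: Ord a => Tree a -> ReaderT b (State (Maybe a)) (Tree b)
copyTreeM (Node x ts) = do
  modify' (\s -> Just $! maybe x (min x) s)  -- strict running minimum
  m <- ask                                   -- final minimum, stored lazily
  Node m <$> traverse copyTreeM ts

repmin :: Ord a => Tree a -> Tree a
repmin t = t'
  where
    -- The lazy pattern binding ties the knot: the state thread never
    -- demands the reader environment, so feeding the final minimum
    -- back in as the environment does not diverge.
    (t', Just m) = runState (runReaderT (copyTreeM t) m) Nothing
```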

dfeuer
  • Thank you for your answer. I was trying to match my solution with the solution in [Levent Erkok's "Value Recursion in Monadic Computations"](https://leventerkok.github.io/papers/erkok-thesis.pdf) (page 122, page 130 of the pdf - 9.1 "The `repmin` problem"). Does it have the same issue? – Zhiltsoff Igor Sep 07 '20 at 18:13
  • @ZhiltsoffIgor, yes, it does. I just realized calculating the minimum the way I did isn't necessary; you actually could do something like your approach making everything strict. But my way deals with the empty list challenge more gracefully. – dfeuer Sep 07 '20 at 18:24
  • Besides, what does it mean "a tree of thunks"? From my perspective, the recursive call in each node gathers the thunks from all of its children and proceeds. – Zhiltsoff Igor Sep 07 '20 at 18:25
  • would `(join seq) (min b bmin)` I wrote about above do for making it strict? I suppose we do not need `let` as the argument for `join f` is being shared. – Zhiltsoff Igor Sep 07 '20 at 18:37
  • @ZhiltsoffIgor, `join seq = id`, so no. – dfeuer Sep 07 '20 at 18:50
  • Can you, please, explain why `join seq` does not force its argument? I am not quite handy with things like sharing - sorry. My intuition was that `join seq` creates an entity, which is being shared around when it is forced. Does it not share the way I think, or does sharing help us at all? In the first case, would `let` help us? – Zhiltsoff Igor Sep 07 '20 at 19:00
  • `join seq = \x -> seq x x = \x -> x = id`. Absolutely not useful. – dfeuer Sep 07 '20 at 21:48
  • If you write `foo = join seq` and compile with `-O`, you will find that it compiles to the GHC core equivalent of `foo = \x -> x`. So there's no need to trust me on this. – dfeuer Sep 08 '20 at 00:23
  • @ZhiltsoffIgor, what I meant by a tree of thunks is a tree in memory consisting of thunks. Forcing the root node will traverse the tree, forcing all the rest, as it calculates a single result. – dfeuer Sep 08 '20 at 02:40
  • @dfeuer I am not arguing that my solution is correct, I am genuinely wondering why it is wrong :). I will ask it as a separate question, if you do not mind. Besides, would a bang-pattern `let !bmin' = min b bmin` do? – Zhiltsoff Igor Sep 08 '20 at 08:32
  • @ZhiltsoffIgor `seq x y` means, "when `y` is forced, force `x` also". thus `seq x x` says nothing. that's my understanding of it anyway. – Will Ness Sep 08 '20 at 09:26
  • @WillNess yes, I have heard of this intuition on `seq`, yet I never understood it too well. So, basically, when we try to run `y` to **WHNF** from `seq x y`, the compiler switches to running `x` and just then proceeds with `y`, is this what it means? That is it does not run `x` until we try to run `y`. – Zhiltsoff Igor Sep 08 '20 at 09:38
  • @ZhiltsoffIgor all I got from here and there is that it's a strictness annotation that says, when `y` is needed make a note that `x` is needed as well. it's a "demand contagion" or something. (not really sure either) I don't know the specifics. – Will Ness Sep 08 '20 at 10:38
  • @WillNess, ``x `seq` y`` is defined like this: If `x` is bottom, then the result is bottom. Otherwise, the result is `y`. In practice, it means that `x` will be forced before returning the value of `y`. – dfeuer Sep 08 '20 at 17:02
  • @dfeuer yes, but then one sees it stated emphatically all the time that there's no guarantee about the order of evaluation... which *is* the only reason I'd be using `seq` in the first place... I don't care how it's defined, I want `x` calculated before `y` to enforce strictness and avoid thunk buildup. and officially, I can't rely on `seq` for that, can I? that's the meaning of "in practice", isn't it? I hope at least bang patterns give me better guarantees. – Will Ness Sep 08 '20 at 18:54
  • re your code, I have a feeling we might need ``= let {p=t' : ts'} in p `seq` (p, m'')`` in `copyForest` to force that `:` on the way forward. maybe even add bangs to the `t'` and `ts'` in the pattern guards. – Will Ness Sep 08 '20 at 19:26
  • @WillNess, bang patterns are exactly the same. If you really, really want to control the exact order of evaluation, the basic tools are `lazy`, which is used to define `pseq`, and `seq#`, which is used to define `evaluate`. But these things gum up the optimization machinery. You're generally unlikely to actually need them unless concurrency or unsafe I/O are in play. – dfeuer Sep 09 '20 at 00:04
  • Oh yes, and you may also need those specials if you care *which* of several potential bottom values you get. – dfeuer Sep 09 '20 at 04:22