35

So, I've spent a lot of time reading and re-reading the ending of chapter 9 in The Little Schemer, where the applicative Y combinator is developed for the length function. I think my confusion boils down to a single statement that contrasts two versions of length (before the combinator is factored out):

A:
  ((lambda (mk-length)
     (mk-length mk-length))
   (lambda (mk-length)
     (lambda (l)
       (cond
         ((null? l) 0 )
         (else (add1
                ((mk-length mk-length)
                 (cdr l))))))))

B:
((lambda (mk-length)
      (mk-length mk-length))
    (lambda (mk-length)
      ((lambda (length)
         (lambda (l)
           (cond
             ((null? l) 0)
             (else (add1 (length (cdr l)))))))
       (mk-length mk-length))))

Page 170 (4th ed.) states that A

returns a function when we applied it to an argument

while B

does not return a function

thereby producing an infinite regress of self-applications. I'm stumped by this. If B is plagued by this problem, I don't see how A avoids it.

planarian
  • 1,877
  • 16
  • 17

2 Answers2

39

Great question. For the benefit of those without a functioning DrRacket installation (myself included) I'll try to answer it.

First, let's use some sane (short) variable names, easily trackable by a human eye/mind:

((lambda (h)     ; A.   
     (h h))            ; apply h to h
 (lambda (g)
     (lambda (lst)
       (if (null? lst) 0
         (add1 
               ((g g) (cdr lst)))))))

The first lambda term is what's known as little omega, or U combinator. When applied to something, it causes that term's self-application. Thus the above is equivalent to

(let ((h (lambda (g)
           (lambda (lst)
             (if (null? lst) 0
               (add1 ((g g) (cdr lst))))))))
  (h h))

When h is applied to h, new binding is formed:

(let ((h (lambda (g)
           (lambda (lst)
             (if (null? lst) 0
               (add1 ((g g) (cdr lst))))))))
  (let ((g h))
    (lambda (lst)
            (if (null? lst) 0
              (add1 ((g g) (cdr lst)))))))

Now there's nothing to apply anymore, so the inner lambda form is returned — along with the hidden linkages to the environment frames (i.e. those let bindings) up above it.

This pairing of a lambda expression with its defining environment is known as closure. To the outside world it is just another function of one parameter, lst. No more reduction steps left to perform there at the moment.

Now, when that closure — our list-length function — will be called, execution will eventually get to the point of (g g) self-application, and the same reduction steps as outlined above will again be performed. But not earlier.


Now, the authors of that book want to arrive at the Y combinator, so they apply some code transformations to the first expression, to somehow arrange for that self-application (g g) to be performed automatically — so we may write the recursive function application in the normal manner, (f x), instead of having to write it as ((g g) x) for all recursive calls:

((lambda (h)     ; B.
     (h h))            ; apply h to h
 (lambda (g)
   ((lambda (f)           ; 'f' to become bound to '(g g)',
      (lambda (lst) 
        (if (null? lst) 0
          (add1 (f (cdr lst))))))  ; here: (f x) instead of ((g g) x)!
    (g g))))                       ; (this is not quite right)

Now after few reduction steps we arrive at

(let ((h (lambda (g)
           ((lambda (f)    
              (lambda (lst) 
                (if (null? lst) 0
                  (add1 (f (cdr lst))))))
            (g g)))))
  (let ((g h))
    ((lambda (f)
       (lambda (lst)
            (if (null? lst) 0
              (add1 (f (cdr lst))))))
     (g g))))

which is equivalent to

(let ((h (lambda (g)
           ((lambda (f)    
              (lambda (lst) 
                (if (null? lst) 0
                  (add1 (f (cdr lst))))))
            (g g)))))
  (let ((g h))
    (let ((f (g g)))           ; problem! (under applicative-order evaluation)
       (lambda (lst)
            (if (null? lst) 0
              (add1 (f (cdr lst))))))))

And here comes trouble: the self-application of (g g) is performed too early, before that inner lambda can be even returned, as a closure, to the run-time system. We only want it be reduced when the execution got to that point inside the lambda expression, after the closure was called. To have it reduced before the closure is even created is ridiculous. A subtle error. :)

Of course, since g is bound to h, (g g) is reduced to (h h) and we're back again where we started, applying h to h. Looping.


Of course the authors are aware of this. They want us to understand it too.

So the culprit is simple — it is the applicative order of evaluation: evaluating the argument before the binding is formed of the function's formal parameter and its argument's value.

That code transformation wasn't quite right, then. It would've worked under normal order where arguments aren't evaluated in advance.

This is remedied easily enough by "eta-expansion", which delays the application until the actual call point: (lambda (x) ((g g) x)) actually says: "will call ((g g) x) when called upon with an argument of x".

And this is actually what that code transformation should have been in the first place:

((lambda (h)     ; C.
     (h h))            ; apply h to h
 (lambda (g)
   ((lambda (f)           ; 'f' to become bound to '(lambda (x) ((g g) x))',
      (lambda (lst) 
        (if (null? lst) 0
          (add1 (f (cdr lst))))))  ; here: (f x) instead of ((g g) x)
    (lambda (x) ((g g) x)))))

Now that next reduction step can be performed:

(let ((h (lambda (g)
           ((lambda (f)    
              (lambda (lst) 
                (if (null? lst) 0
                  (add1 (f (cdr lst))))))
            (lambda (x) ((g g) x))))))
  (let ((g h))
    (let ((f (lambda (x) ((g g) x))))       ; here it's OK
      (lambda (lst)
            (if (null? lst) 0
              (add1 (f (cdr lst))))))))

and the closure (lambda (lst) ...) is formed and returned without a problem, and when (f (cdr lst)) is called (inside the closure) it is reduced to ((g g) (cdr lst)) just as we wanted it to be.


Lastly, we notice that (lambda (f) (lambda (lst ...)) expression in C. doesn't depend on any of the h and g. So we can take it out, make it an argument, and be left with ... the Y combinator:

( ( (lambda (rec)            ; D.
      ( (lambda (h) (h h))  
        (lambda (g)
          (rec  (lambda (x) ((g g) x))))))   ; applicative-order Y combinator
    (lambda (f)
        (lambda (lst) 
          (if (null? lst) 0
            (add1 (f (cdr lst)))))) )
  (list 1 2 3) )                            ; ==> 3

So now, calling Y on a function is equivalent to making a recursive definition out of it:

( y (lambda (f) (lambda (x) .... (f x) .... )) ) 
===  define f = (lambda (x) .... (f x) .... )

... but using letrec (or named let) is better — more efficient, defining the closure in self-referential environment frame. The whole Y thing is a theoretical exercise for the systems where that is not possible — i.e. where it is not possible to name things, to create bindings with names "pointing" to things, referring to things.

Incidentally, the ability to point to things is what distinguishes the higher primates from the rest of the animal kingdom ⁄ living creatures, or so I hear. :)

Will Ness
  • 62,652
  • 8
  • 86
  • 167
  • 1
    thanks!! :) I think about adding the final step of deriving the Y itself ... it's the next logical thing to do. I remember myself mystified by the whole Y thing/mystery. Needlessly so. Too often it is presented ex-machina. There are all kinds of metaphorical descriptions, but not the actual derivation. I like to see the justification, and then the derivation. In *small steps*. :) – Will Ness Aug 08 '12 at 16:51
  • Thanks for this explanation. I was stuck on this very part, and I had a feeling that the first `lambda` expression mentioned above was also equivalent to the `let` form, but I wasn't entirely sure until reading this - let alone that it was called the "omega combinator". This information would have been helpful. I think I will still have to spend some time tracing the output of the Y-combinator, but it feels much less muddy than (in my opinion) the authors' rather lackluster explanation of this concept. – dtg Sep 17 '12 at 08:31
  • 1
    welcome. :) the *"little"* explanation as is usual in the *"Little"* books ... :) They leave it to readers to form explanations for themselves; I personally don't know if that's such a great idea; or else there wouldn't be universities and textbooks today... we're supposed to stand on the giant's *shoulders*, not in their *footsteps*... also, it's [the *little* omega](http://en.wikipedia.org/wiki/Lambda_calculus#Standard_terms): " ω := λx.x x ; Ω := ω ω " and http://en.wikipedia.org/wiki/Omega calls the `Ω` an "omega combinator" so that might have not been exactly right. – Will Ness Sep 17 '12 at 09:05
  • Okay. Thanks for the clarification. One more question - is the second `lambda` in (A) also an example of a *little* omega-combinator? I.e. does `(lambda (g) (rest-of-function))` resemble the first `(lambda (h) (h h))`. I am trying to understand how calling `(g g)` results in recursion on the second `lambda` expression... – dtg Sep 17 '12 at 09:56
  • little-omega is just a name commonly used to refer to `(λx.x x)` term. The second `lambda` in `A` is certainly something else - it is the definition of the recursive procedure written in a certain way to be "jump-started" by self-application. Since this `(lambda(g) ...)` is applied to itself, the variable binding is formed where `g` refers to the whole lambda term `(lambda(g) ...)`. The later call `(g g)` is thus the same self-application, with the same results. – Will Ness Sep 17 '12 at 10:06
  • @Dylan Turns out, little-omega is a.k.a. [U-combinator](http://stackoverflow.com/q/16258308/849891). – Will Ness May 02 '13 at 08:55
  • @WillNess I thought the U combinator was [`(λx.λy.y (x x y))`](http://en.wikipedia.org/wiki/Lambda_calculus#Standard_terms). – Aadit M Shah Mar 30 '19 at 05:15
  • @AaditMShah I've noticed that myself yesterday, on the Wikipedia LC page. It used to be different, or this is what I remember anyway. Also, there's [U combinator group](http://www.ucombinator.org/) in Utah university CS department - which I hit by googling some years back - that uses U that way, so I too been using it that way since. So now I don't know what's what. Need to dig through primary sources maybe. – Will Ness Mar 30 '19 at 06:43
  • @AaditMShah I've reverted that edit on WP LC page. It gave no sources, but the `\x.xx` form appears on the front page of a respectable research institution. – Will Ness Apr 02 '19 at 11:15
24

To see what happens, use the stepper in DrRacket. The stepper allows you to see all intermediary steps (and to go back and forth).

Paste the following into DrRacket:

(((lambda (mk-length)
    (mk-length mk-length))
  (lambda (mk-length)
    (lambda (l)
      (cond
        ((null? l) 0 )
        (else (add1
               ((mk-length mk-length)
                (cdr l))))))))
 '(a b c))

Then choose the teaching language "Intermediate Student with lambda". Then click the stepper button (the green triangle followed by a bar).

This is what the first step looks like:

enter image description here

Then make an example for the second function and see what goes wrong.

soegaard
  • 28,660
  • 4
  • 50
  • 97