Haskell and tail recursion don't get along as well as other functional languages and tail recursion. Let's do some manual reduction on some very simple code to see what's going on with tail recursion. Here's a tail-recursive implementation of map (1+)
.
go [] r = r
go (x:xs) r = go xs (r ++ [1+x])
Also we must keep in mind the definition of (++)
:
[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
Now let's reduce go [1,2,3,4,5] []
. Keep in mind that [x,y,z]
is notation for x:(y:(z:[]))
, or for short x:y:z:[]
.
go [1,2,3,4,5] []
go [2,3,4,5] ([] ++ [2]) -- 2 here is actually the thunk 1+1, but
-- for compactness I am reducing earlier
go [3,4,5] (([] ++ [2]) ++ [3])
go [4,5] ((([] ++ [2]) ++ [3]) ++ [4])
go [5] (((([] ++ [2]) ++ [3]) ++ [4]) ++ [5])
go [] ((((([] ++ [2]) ++ [3]) ++ [4]) ++ [5]) ++ [6])
(((([] ++ [2]) ++ [3]) ++ [4]) ++ [5]) ++ [6]
((([2] ++ [3]) ++ [4]) ++ [5]) ++ [6]
(((2:([] ++ [3]) ++ [4]) ++ [5]) ++ [6]
((2:(([] ++ [3]) ++ [4]) ++ [5]) ++ [6]
(2:((([] ++ [3]) ++ [4]) ++ [5]) ++ [6]
2:(((([] ++ [3]) ++ [4]) ++ [5]) ++ [6]) -- first observable output
2:((([3] ++ [4]) ++ [5]) ++ [6])
2:((3:([] ++ [4]) ++ [5]) ++ [6])
2:(3:(([] ++ [4]) ++ [5]) ++ [6])
2:3:((([] ++ [4]) ++ [5]) ++ [6]) -- second observable output
2:3:(([4] ++ [5]) ++ [6])
2:3:(4:([] ++ [5]) ++ [6])
2:3:4:(([] ++ [5]) ++ [6]) -- third observable output
2:3:4:([5] ++ [6])
2:3:4:5:([] ++ [6]) -- fourth observable output
2:3:4:5:6:[] -- final output
See how each item in the output needs to work its way outward from a deeply nested series of parentheses? This causes it to take quadratic time in the size of the input to get all the output. You'll also see a behavior that the first few items are yielded slowly, and it gets faster and faster as you reach the end of the list. This reduction explains that.
The main performance problem here is appending the new element to the end of the list, which takes time proportional to the size of the list you are appending to. A better way is to cons on the front, which is a constant-time operation. This will cause the output to come out reversed, so you need to reverse the result.
go [] r = reverse r
go (x:xs) r = go xs ((1+x):r)
reverse xs = rev xs [] -- roughly from the report prelude
rev [] r = r
rev (x:xs) r = rev xs (x:r)
And, let's reduce:
go [1,2,3,4,5] []
go [2,3,4,5] [2]
go [3,4,5] [3,2]
go [4,5] [4,3,2]
go [5] [5,4,3,2]
go [] [6,5,4,3,2]
reverse [6,5,4,3,2]
rev [6,5,4,3,2] []
rev [5,4,3,2] [6]
rev [4,3,2] [5,6]
rev [3,2] [4,5,6]
rev [2] [3,4,5,6]
rev [] [2,3,4,5,6]
[2,3,4,5,6] -- first and all observable output!
So this is clearly less work than the first version. This is the style is used in strict languages like Scheme and ML, because it has good memory performance. However, it has some disadvantages:
- All the input must be consumed before any output can be produced. Indeed the entire computation is performed before anything results are produced.
- Because of this, it does not ever yield any output when given an infinite list.
- It involves
reverse
, which is takes an extra O(n)
time and has nothing to do with what we are doing (what does reversing have to do with adding one to every element and preserving the order?).
In a lazy language like Haskell, we can do better. Strangely, and beautifully, the way we do is by writing it even more naively.
go [] = []
go (x:xs) = (1+x):go xs
and reduce:
go [1,2,3,4,5]
2:(go [2,3,4,5]) -- first observable output
2:3:(go [3,4,5]) -- second observable output
2:3:4:(go [4,5]) -- third observable output
2:3:4:5:(go [6]) -- fourth observable output
2:3:4:5:6:(go []) -- fifth observable output
2:3:4:5:6:[] -- final output
It takes even less work, and it starts yielding output before even looking at the remainder of the list, so it has good performance in a stream computation and works on infinite inputs. And the implementation is about as simple and obvious as you could hope for.
I hope this gives you some intuition for how tail recursion works in Haskell. For your example, I suggest removing the tail recursion and rewriting in a naive style similar to our final go
, using the intuition I hope I suggested from this post to yield "as large a prefix of the input as possible, as soon as possible" (notice that returning x:xs
immediately yields x
, even if there is some more work to be done to compute xs
-- that's laziness in (non-)action).