Saturday, 23 August 2008

Tail Call Optimization doesn't exist in Haskell


It's well known that since Haskell programs are evaluated lazily, the
considerations for writing recursive code are different to those of a strict
language.

> cat (zero,plus) [] = zero
> cat (zero,plus) (x:xs) = x `plus` cat (zero,plus) xs

> cat' (zero,plus) [] acc = acc
> cat' (zero,plus) (x:xs) acc = cat' (zero,plus) xs (acc `plus` x)

> loads = [1..1000000]

*Main> cat (0,(+)) loads
*** Exception: stack overflow
*Main> cat' (0,(+)) loads 0
*** Exception: stack overflow


Since normal numbers in haskell are strict, I'm going to use lazy numbers here
(The hope is that any results should apply to lazy structures in general, and
avoid having to take strictness into account).

> data N = Z | S N deriving Show

> zero = Z
> plus (S x) y = S (plus x y)
> plus Z y = y

> lots = take 1000000 . iterate S $ Z

*Main> cat (zero,plus) lots
S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S^C(S (S (S (S Interrupted.
*Main> cat' (zero,plus) lots Z
*** Exception: stack overflow


Another example is an infinite loop:

*Main> let f x = f x in f []
^CInterrupted.
*Main> let f x = f (():x) in f []
^CInterrupted.

Both continue happily, the second takes up huge amounts of memory but it does
not stack overflow.

Haskell evaluation is like graph reduction, a program is a graph which you tug
on the end until you get a weak head normal form, in these cases you never get
one, but the program/graph is stored on the heap, in the second case that data
gets bigger and bigger (it might even get swapped to the hard disk after a
while).

That's why they didn't crash, and it had nothing to do with tail calls or tail
call optimization, cat (zero,plus) lots didn't do any tail call and that
didn't crash. The reason any stack overflow occurs is that the program which
reduces the graph (on the heap) has a stack, that program in my case GHC is
written in C (the RTS) and when it delves deep into an expression such as
1 + (2 + (3 + (4 + ...))) that is what builds up stack.



So why does something like:

> spam = putStr "SPAM " >> spam

never crash?

Since (>>) is in tail position (spam is not a tail call), again, tail calls have
nothing to do with this. m >> n in this case means do m throw away the result
then do n, it must get thrown to the garbage collector (unlike the f (():x)
example where the value can't be collected).



That's all fine, but we still haven't been able to sum [1..1000000] together.
So to recap, building up a huge expression on the heap is fine (as in the
cat (zero,plus) lots example) but walking deep into an expression without
getting any constructors is dangerous.

> kat' (zero,plus) [] acc = acc
> kat' (zero,plus) (x:xs) acc = acc'plus'x `seq` kat' (zero,plus) xs acc'plus'x
> where acc'plus'x = acc `plus` x

*Main> cat' (0,(+)) loads 0
*** Exception: stack overflow
*Main> kat' (0,(+)) loads 0
500000500000


> -- And a huge thanks to everyone that I talked about this with
> -- for bending my mind :)

7 comments:

wren said...

Haskell very much does have tail call elimination, any claims to the contrary are demonstrably false. Your stack overflow issues have nothing to do with tail call elimination. You've said this yourself, though you seem not to have followed the reasoning to conclusion.

This is just a canonical issue with accumulator functions in lazy languages. All of your tail calls to cat' as you construct that accumulator are eliminated perfectly fine. The problem is that you end up with a million element thunk at the end. When you pull on that thunk there are no tail calls to eliminate.

The expression for your accumulator is: (((((((...(0 + 1) + 2) + 3)... + 1000000). (N.B. it is not (0 + (1 +...)))!) The ultimate call to seq acc will tail call the topmost (+), reusing the stack frame used by seq. But from there, since (+) is strict in its arguments it must evaluate the left hand expression before it can return. This is repeated a million times as you descend. But none of those are tail calls and so the problem has nothing whatsoever to do with GHC's eliminating tail calls.

You repeat the same problem with your Peano integers since your definition of plus requires evaluating the left-hand argument to WHNF and the left-hand argument is the one with a million-1 depth call stack. Try having your definition count down on the right operand instead. Now, cat' works perfectly fine! Of course cat will stack overflow, because now it has the same problem as cat' did with the old definition of plus; you've just exchanged which definition works in synch with the pattern of thunks.

The well known solution to this canonical interaction between accumulators and laziness is to strictly evaluate the accumulator at each step. What this does is amortize the call stack for evaluating the accumulator across all the recursive calls. The strictness call can't be eliminated because its result is needed for the recursive call, but the recursive call can be eliminated because the result of the callee is the same as the result of the caller. Thus you have a call stack depth of at most two (or rather, one plus whatever depth is needed to bring the accumulator back to WHNF).

Your infinite loops chirp away happily precisely because GHC does tail call elimination. If it did not then those functions would cause a stack overflow. Try writing those exact functions in Java and watch them explode. (Alas C is no longer a good example since the GHC folks cracked GCC open and added in TCO for everyone.) "Tail call elimination" means only that the current stack frame can be reused if the last action of the current function is to call another function. Because GHC does indeed do tail call elimination the frame for the first call to f can be reused when making the second call to f, and that frame can be reused when making the third call to f, etc. In a language without TCO each one of those calls would require an additional stack frame until you overflow.

Muad`Dib said...

wren: Almost everything you said is accurate and correct, It also happens mostly you just repeated what I wrote. Anyway, I've written the exact (let f x = f x in f 3) function in java, along with a lazy evaluator (one which doesn't detect or optimize tail calls), and it doesn't stack overflow, (I've posted it on this site incase you like to see it).

wren said...
This comment has been removed by the author.
wren said...

[0] wren@xenobia:~/test $ cat NoTCO.java ; javac NoTCO.java ; java NoTCOclass NoTCO {
// The int is a lie
static int f(int x) { int y = f(x); return y; }

public static void main(String[] args) {
System.out.println( NoTCO.f(3));
}
}
Exception in thread "main" java.lang.StackOverflowError
at NoTCO.f(NoTCO.java:3)
at NoTCO.f(NoTCO.java:3)
[...repeat 1022 more times]
[0] wren@xenobia:~/test $

dibblego said...

wren nailed it. It seems you don't understand why, so here is a simple proof.

Haskell has Data.List.foldl' (and foldl' runs a tail call in constant stack) -QED

Have a nice day.

Muad`Dib said...

dibblego: You are jumping to a false conclusion, I hope my follow up post should explain my message clearer.

Anonymous said...

This is a nice one, thanks. Although i did have to read up using a tail call optimization example before i read your article, it was still very informative!