It's well known that since Haskell programs are evaluated lazily, the considerations for writing recursive code are different from those in a strict language.

> cat (zero,plus) [] = zero

> cat (zero,plus) (x:xs) = x `plus` cat (zero,plus) xs

> cat' (zero,plus) [] acc = acc

> cat' (zero,plus) (x:xs) acc = cat' (zero,plus) xs (acc `plus` x)

> loads = [1..1000000]

*Main> cat (0,(+)) loads

*** Exception: stack overflow

*Main> cat' (0,(+)) loads 0

*** Exception: stack overflow

Since the usual number types in Haskell are strict, I'm going to use lazy (Peano-style) numbers here. (The hope is that any results will apply to lazy structures in general, avoiding the need to take strictness into account.)

> data N = Z | S N deriving Show

> zero = Z

> plus (S x) y = S (plus x y)

> plus Z y = y

> lots = take 1000000 . iterate S $ Z

*Main> cat (zero,plus) lots

S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S^C(S (S (S (S Interrupted.

*Main> cat' (zero,plus) lots Z

*** Exception: stack overflow

Another example is an infinite loop:

*Main> let f x = f x in f []

^CInterrupted.

*Main> let f x = f (():x) in f []

^CInterrupted.

Both continue happily; the second takes up huge amounts of memory, but it does not stack overflow.

Haskell evaluation is like graph reduction: a program is a graph, and you tug on one end until you get a weak head normal form. In these cases you never get one, but the program/graph is stored on the heap; in the second case that data gets bigger and bigger (it might even get swapped to the hard disk after a while).

That's why they didn't crash, and it had nothing to do with tail calls or tail call optimization: cat (zero,plus) lots didn't make any tail call, and it didn't crash either. The reason any stack overflow occurs is that the program which reduces the graph (on the heap) has a stack of its own. That program, in my case GHC's runtime system (the RTS, written in C), builds up stack when it delves deep into an expression such as 1 + (2 + (3 + (4 + ...))).
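To make the WHNF point concrete: forcing a value only to weak head normal form peels a single constructor, which is cheap no matter how deep the structure is; it is evaluating all the way through a constructor-free chain like 1 + (2 + (3 + ...)) that consumes stack. A small sketch (the names P and deep are mine, not from the post):

```haskell
data P = Zero | Succ P

-- A value a million constructors deep.
deep :: Int -> P
deep 0 = Zero
deep n = Succ (deep (n - 1))

main :: IO ()
main =
  -- seq only needs WHNF: deep 1000000 reduces to Succ <thunk> in a
  -- single step, so no deep stack is used here.
  deep 1000000 `seq` putStrLn "reached WHNF"
```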

So why does something like:

> spam = putStr "SPAM " >> spam

never crash?

Since (>>) is in tail position (spam itself is not a tail call), once again tail calls have nothing to do with this. Here m >> n means: do m, throw away the result, then do n. The discarded result gets thrown to the garbage collector (unlike in the f (():x) example, where the value can't be collected).
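A finite version of the same shape (spamN is my name, not from the post) shows the pattern: each (>>) discards the result on its left, so nothing accumulates and the loop needs only constant space:

```haskell
-- Like spam, but terminating: the result of each putStr is
-- discarded by (>>) and can be collected immediately.
spamN :: Int -> IO ()
spamN 0 = putStrLn ""
spamN n = putStr "SPAM " >> spamN (n - 1)

main :: IO ()
main = spamN 3
```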

That's all fine, but we still haven't managed to sum [1..1000000].

So to recap: building up a huge expression on the heap is fine (as in the cat (zero,plus) lots example), but walking deep into an expression without getting any constructors is dangerous.
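The "constructors are fine" half of that recap can be seen on an infinite structure (ones is my example, not the post's): consuming it is safe because every step of evaluation yields a (:) constructor:

```haskell
-- An infinite list: evaluation always produces a (:) cell
-- immediately, so consumers that stop early are safe.
ones :: [Int]
ones = 1 : ones

main :: IO ()
main = print (take 5 ones)
```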

> kat' (zero,plus) [] acc = acc

> kat' (zero,plus) (x:xs) acc = acc'plus'x `seq` kat' (zero,plus) xs acc'plus'x

>   where acc'plus'x = acc `plus` x

*Main> cat' (0,(+)) loads 0

*** Exception: stack overflow

*Main> kat' (0,(+)) loads 0

500000500000
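For what it's worth, the standard library already packages the kat' pattern: Data.List.foldl' forces the accumulator at every step, so summing the same list works in constant stack:

```haskell
import Data.List (foldl')

-- foldl' is essentially kat' specialised to lists: it seqs the
-- accumulator before each recursive step, just as
-- acc'plus'x `seq` ... does above.
main :: IO ()
main = print (foldl' (+) 0 [1..1000000 :: Integer])
```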

> -- And a huge thanks to everyone that I talked about this with

> -- for bending my mind :)

## Saturday, 23 August 2008

### Tail Call Optimization doesn't exist in Haskell

## 7 comments:

Haskell very much does have tail call elimination, any claims to the contrary are demonstrably false. Your stack overflow issues have nothing to do with tail call elimination. You've said this yourself, though you seem not to have followed the reasoning to conclusion.

This is just a canonical issue with accumulator functions in lazy languages. All of your tail calls to cat' as you construct that accumulator are eliminated perfectly fine. The problem is that you end up with a million-element thunk at the end. When you pull on that thunk there are no tail calls to eliminate. The expression for your accumulator is:

(...(((0 + 1) + 2) + 3)... + 1000000). (N.B. it is not (0 + (1 + ...))!) The ultimate call to seq acc will tail call the topmost (+), reusing the stack frame used by seq. But from there, since (+) is strict in its arguments, it must evaluate the left-hand expression before it can return. This is repeated a million times as you descend. But none of those are tail calls, and so the problem has nothing whatsoever to do with GHC's eliminating tail calls.

You repeat the same problem with your Peano integers, since your definition of plus requires evaluating the left-hand argument to WHNF, and the left-hand argument is the one with a million-1 depth call stack. Try having your definition count down on the right operand instead. Now cat' works perfectly fine! Of course cat will stack overflow, because now it has the same problem as cat' did with the old definition of plus; you've just exchanged which definition works in sync with the pattern of thunks.

The well-known solution to this canonical interaction between accumulators and laziness is to strictly evaluate the accumulator at each step. What this does is amortize the call stack for evaluating the accumulator across all the recursive calls. The strictness call can't be eliminated because its result is needed for the recursive call, but the recursive call can be eliminated because the result of the callee is the same as the result of the caller. Thus you have a call stack depth of at most two (or rather, one plus whatever depth is needed to bring the accumulator back to WHNF).
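A sketch of the right-recursing addition suggested here (plusR and toInt are my names, not the commenter's):

```haskell
data N = Z | S N

-- Right-recursing addition: pattern match on the second operand,
-- so the first operand is passed along without being forced.
plusR :: N -> N -> N
plusR x (S y) = S (plusR x y)
plusR x Z     = x

-- Convert back to Int for display.
toInt :: N -> Int
toInt Z     = 0
toInt (S n) = 1 + toInt n

main :: IO ()
main = print (toInt (plusR (S (S Z)) (S (S (S Z)))))
```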

Your infinite loops chirp away happily precisely because GHC does tail call elimination. If it did not, then those functions would cause a stack overflow. Try writing those exact functions in Java and watch them explode. (Alas, C is no longer a good example since the GHC folks cracked GCC open and added in TCO for everyone.) "Tail call elimination" means only that the current stack frame can be reused if the last action of the current function is to call another function. Because GHC does indeed do tail call elimination, the frame for the first call to f can be reused when making the second call to f, and that frame can be reused when making the third call to f, etc. In a language without TCO each one of those calls would require an additional stack frame until you overflow.

wren: Almost everything you said is accurate and correct; it also happens that mostly you just repeated what I wrote. Anyway, I've written the exact (let f x = f x in f 3) function in Java, along with a lazy evaluator (one which doesn't detect or optimize tail calls), and it doesn't stack overflow. (I've posted it on this site in case you'd like to see it.)

[0] wren@xenobia:~/test $ cat NoTCO.java ; javac NoTCO.java ; java NoTCO

class NoTCO {
  // The int is a lie
  static int f(int x) { int y = f(x); return y; }
  public static void main(String[] args) {
    System.out.println(NoTCO.f(3));
  }
}

Exception in thread "main" java.lang.StackOverflowError
  at NoTCO.f(NoTCO.java:3)
  at NoTCO.f(NoTCO.java:3)
  [...repeat 1022 more times]

[0] wren@xenobia:~/test $

wren nailed it. It seems you don't understand why, so here is a simple proof.

Haskell has Data.List.foldl' (and foldl' runs a tail call in constant stack). QED.

Have a nice day.

dibblego: You are jumping to a false conclusion; I hope my follow-up post explains my message more clearly.

This is a nice one, thanks. Although I did have to read up on a tail call optimization example before I read your article, it was still very informative!
