## Sunday, 26 April 2009

### Strongly Specified Parser Combinators

Following the approach Wouter Swierstra used for Hoare State Monad, we define a Parser monad with pre and post conditions that express soundness (but not completeness) of the parser.

The computational aspect of it is the same old parser monad, type Parser s a = [s] -> [(s, [a])], which takes a list of tokens [s] to a (possibly empty) list (here is where backtracking/nondeterminism comes into it) of parses paired with the rest of the text. It's a monad and also a monad plus.

On the specification end, we can put a precondition on the input string (for example, you might say the input length is <= some value, for ensuring termination of a recursive parser) and also a post condition comparing the input with the parsed value and the remaining output.

`  Definition Pre := list s -> Prop.  Definition Post (t : Set) := list s -> t -> list s -> Prop.    Program Definition Parser (pre : Pre) (t : Set) (post : Post t) : Set :=      forall i : { t : list s | pre t }, list ({ (x, r) : t * list s | post i x r }).`

in Haskell we might define:

`m >>= f = \i -> concatMap (uncurry f) (m i)`

And in Coq, using Program we have roughly the same thing. Except that one has to apply a 'noncomputational_map' to fudge the proofs paired up with the list elements.

`  Program Definition Bind (a b : Set) P1 P2 Q1 Q2    (m : Parser P1 a Q1)    (f : (forall x : a, Parser (P2 x) b (Q2 x))) :    Parser (fun i => P1 i /\ forall x o, Q1 i x o -> P2 x o)           b           (fun i x' o' => exists x o, Q1 i x o /\ Q2 x o x' o') :=    fun i => @flat_map ({ (x, o) : a * list s | Q1 i x o }) _      (fun xo => match xo with (x,o) => noncomputational_map _ _ _ _ (f x o) end)      (m i).`

Seeing as noncomputational map doesn't do anything, we prove a theorem expressing that as justification for extracting it out as the identity function (rather it being equivalent to map id traversing the whole list).

`  Theorem noncomputational_map_identity :    forall l,      map (@proj1_sig _ _) l = map (@proj1_sig _ _) (noncomputational_map l).    Extract Inlined Constant noncomputational_map => "id".`

Another nice parser combinator is the fixed point of a parser:

`  Program Definition Fix t {P Q} :    (forall i : { t : list s | P t },      Parser (fun i' => length i' < length i /\ P i') t Q ->      list ({ (x, o) : t * list s | Q i x o })) ->    Parser P t Q :=    fun Rec =>      well_founded_induction        (well_founded_ltof ({ i : list s | P i }) (fun i => length i))        (fun i => list ({ (x, o) : t * list s | Q i x o }))        (fun x Rec' => Rec x (fun i => Rec' i _)).`

It packages up well founded recursion on the size of the input string, so that any non-left recursive parsers should be easily defined.

Enough of the heavy machinary! A simple example of putting thing into work now:

` <par> ::= <epsilon> | '(' <par> ')' <par>`

To parse this we define a type of tokens and abstract syntax of parsing derivations - then relate them with a function:

`  Inductive token := open | close.  Inductive par := epsilon : par | wrappend : par -> par -> par.  Fixpoint print (p : par) : list token :=    match p with      | epsilon => nil      | wrappend m n => (open::nil) ++ print m ++ (close::nil) ++ print n    end.`

Now Program lets us define the par parser as the fixed point of the sum of epsilon and wrapped recursions:

`  Program Definition par_parser : Parser token (fun _ => True) par (fun x p y => x = print p ++ y /\ length y <= length x) :=    Fix token par (fun i parRec =>      Plus _ _ (fun i' => i = i') _        (fudge_pre_and_post_conditions _ _ _          (Return epsilon))        (fudge_pre_and_post_conditions _ _ _          (Symbol token eq_token_dec open >>= fun _ =>           parRec >>= fun m =>           Symbol token eq_token_dec close >>= fun _ =>           parRec >>= fun n =>           Return (wrappend m n)))      i).`

Here are the actual scripts http://github.com/odge/parseq/tree/master