Muad`Dib: Strongly Specified Parser Combinators

Following the approach Wouter Swierstra used for the Hoare State Monad, we define a Parser monad with pre- and postconditions that express soundness (but not completeness) of the parser.

Computationally it is the same old parser monad, type Parser s a = [s] -> [(a, [s])], which takes a list of tokens [s] to a (possibly empty) list of parses, each paired with the rest of the input. The list of results is where backtracking/nondeterminism comes in. It's a monad and also a MonadPlus.

On the specification side, we can put a precondition on the input string (for example, that its length is at most some bound, to ensure termination of a recursive parser) and a postcondition relating the input to the parsed value and the remaining, unconsumed input.


  Definition Pre := list s -> Prop.
  Definition Post (t : Set) := list s -> t -> list s -> Prop.
  
  Program Definition Parser (pre : Pre) (t : Set) (post : Post t) : Set :=
      forall i : { t : list s | pre t }, list ({ (x, r) : t * list s | post i x r }).

In Haskell we might define bind as:

m >>= f = \i -> concatMap (uncurry f) (m i)
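For reference, here is a minimal sketch of how that one-line bind could sit inside a complete plain parser monad. The newtype wrapper, the field name runParser and the symbol combinator are assumptions made for this illustration; they are not taken from the Coq development.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (MonadPlus)

-- Assumed wrapper around the function type [s] -> [(a, [s])].
newtype Parser s a = Parser { runParser :: [s] -> [(a, [s])] }

instance Functor (Parser s) where
  fmap g (Parser p) = Parser $ \i -> [ (g x, o) | (x, o) <- p i ]

instance Applicative (Parser s) where
  -- Succeed without consuming any input.
  pure x = Parser $ \i -> [(x, i)]
  Parser pf <*> Parser px =
    Parser $ \i -> [ (g x, o') | (g, o) <- pf i, (x, o') <- px o ]

instance Monad (Parser s) where
  -- Run m, then run (f x) on the rest of the input of every parse of m.
  Parser m >>= f =
    Parser $ \i -> concatMap (\(x, o) -> runParser (f x) o) (m i)

instance Alternative (Parser s) where
  -- No parses at all.
  empty = Parser (const [])
  -- Nondeterministic choice: keep the parses of both alternatives.
  Parser p <|> Parser q = Parser $ \i -> p i ++ q i

-- mzero and mplus default to empty and (<|>).
instance MonadPlus (Parser s)

-- Hypothetical helper: accept exactly the token t.
symbol :: Eq s => s -> Parser s s
symbol t = Parser $ \i -> case i of
  (x : xs) | x == t -> [(x, xs)]
  _                 -> []
```

With these definitions, runParser (symbol 'a' >> symbol 'b') "abc" gives [('b', "c")], and runParser (symbol 'a' <|> pure 'z') "abc" gives both alternatives, [('a', "bc"), ('z', "abc")].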

In Coq, using Program, Bind is roughly the same thing, except that one has to apply a 'noncomputational_map' to fudge the proofs paired up with the list elements.


  Program Definition Bind (a b : Set) P1 P2 Q1 Q2
    (m : Parser P1 a Q1)
    (f : (forall x : a, Parser (P2 x) b (Q2 x))) :
    Parser (fun i => P1 i /\ forall x o, Q1 i x o -> P2 x o)
           b
           (fun i x' o' => exists x o, Q1 i x o /\ Q2 x o x' o') :=
    fun i => @flat_map ({ (x, o) : a * list s | Q1 i x o }) _
      (fun xo => match xo with (x,o) => noncomputational_map _ _ _ _ (f x o) end)
      (m i).

Since noncomputational_map doesn't do anything computationally, we prove a theorem expressing that, as justification for extracting it as the identity function (rather than as the equivalent of map id, traversing the whole list).


  Theorem noncomputational_map_identity :
    forall l,
      map (@proj1_sig _ _) l = map (@proj1_sig _ _) (noncomputational_map l).
  
  Extract Inlined Constant noncomputational_map => "id".

Another nice parser combinator is the fixed point of a parser:


  Program Definition Fix t {P Q} :
    (forall i : { t : list s | P t },
      Parser (fun i' => length i' < length i /\ P i') t Q ->
      list ({ (x, o) : t * list s | Q i x o })) ->
    Parser P t Q :=
    fun Rec =>
      well_founded_induction
        (well_founded_ltof ({ i : list s | P i }) (fun i => length i))
        (fun i => list ({ (x, o) : t * list s | Q i x o }))
        (fun x Rec' => Rec x (fun i => Rec' i _)).

It packages up well-founded recursion on the length of the input string, so that any non-left-recursive parser should be easy to define.
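To get a feel for what the Fix signature asks of its argument, here is a rough runtime analogue in Haskell. It is only a sketch: fixP, countAs and loop are hypothetical names, and the "strictly shorter input" precondition is checked dynamically (returning no parses when it fails) instead of being discharged by proof as in the Coq version.

```haskell
type Parser s a = [s] -> [(a, [s])]

-- The body receives the current input i together with a recursive parser
-- that only accepts inputs strictly shorter than i.
fixP :: ([s] -> Parser s a -> [(a, [s])]) -> Parser s a
fixP body i = body i shorter
  where
    shorter i'
      | length i' < length i = fixP body i'  -- precondition holds: recurse
      | otherwise            = []            -- precondition fails: no parses

-- Example: count the leading 'a' characters of the input.
countAs :: Parser Char Int
countAs = fixP $ \i shorter -> case i of
  ('a' : rest) -> [ (n + 1, o) | (n, o) <- shorter rest ]
  _            -> [(0, i)]

-- A left-recursive body calls itself on the same input, so the guard
-- rejects the call and the parser yields nothing instead of looping.
loop :: Parser Char ()
loop = fixP $ \i shorter -> shorter i
```

Here countAs "aab" evaluates to [(2, "b")] and loop "aa" evaluates to []. In the Coq version the left-recursive definition would not typecheck at all, because the obligation length i' < length i could not be proved.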

Enough of the heavy machinery! Now a simple example that puts it to work:

 <par> ::= <epsilon> | '(' <par> ')' <par>

To parse this we define a type of tokens and an abstract syntax of parse derivations, then relate them with a print function:


  Inductive token := open | close.
  Inductive par := epsilon : par | wrappend : par -> par -> par.

  Fixpoint print (p : par) : list token :=
    match p with
      | epsilon => nil
      | wrappend m n => (open::nil) ++ print m ++ (close::nil) ++ print n
    end.

Now Program lets us define the par parser as the fixed point of the sum of epsilon and wrapped recursions:


  Program Definition par_parser : Parser token (fun _ => True) par (fun x p y => x = print p ++ y /\ length y <= length x) :=
    Fix token par (fun i parRec =>
      Plus _ _ (fun i' => i = i') _
        (fudge_pre_and_post_conditions _ _ _
          (Return epsilon))
        (fudge_pre_and_post_conditions _ _ _
          (Symbol token eq_token_dec open >>= fun _ =>
           parRec >>= fun m =>
           Symbol token eq_token_dec close >>= fun _ =>
           parRec >>= fun n =>
           Return (wrappend m n)))
      i).
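As an unverified counterpart, the same grammar can be sketched in plain Haskell, with the postcondition checked at runtime instead of being carried in the type. The names Token, Par, printPar, parParser and sound are assumptions made for this illustration, and the list comprehension stands in for the chain of binds.

```haskell
data Token = Open | Close deriving (Eq, Show)
data Par = Epsilon | Wrappend Par Par deriving (Eq, Show)

-- Mirrors the Coq print function.
printPar :: Par -> [Token]
printPar Epsilon        = []
printPar (Wrappend m n) = [Open] ++ printPar m ++ [Close] ++ printPar n

type Parser s a = [s] -> [(a, [s])]

-- Accept exactly the token t.
symbol :: Eq s => s -> Parser s s
symbol t (x : xs) | x == t = [(x, xs)]
symbol _ _ = []

-- <par> ::= <epsilon> | '(' <par> ')' <par>
parParser :: Parser Token Par
parParser i = epsilonCase ++ wrappedCase
  where
    epsilonCase = [(Epsilon, i)]
    wrappedCase =
      [ (Wrappend m n, o4)
      | (_, o1) <- symbol Open i
      , (m, o2) <- parParser o1
      , (_, o3) <- symbol Close o2
      , (n, o4) <- parParser o3
      ]

-- Runtime version of the postcondition
-- x = print p ++ y /\ length y <= length x, checked for every parse of i.
sound :: [Token] -> Bool
sound i = all ok (parParser i)
  where
    ok (p, o) = i == printPar p ++ o && length o <= length i
```

For the input [Open, Close] this yields two parses, (Epsilon, [Open, Close]) and (Wrappend Epsilon Epsilon, []), and sound holds for both. The difference from the Coq version is that sound can only test individual inputs, whereas the type of par_parser guarantees the property for all of them.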

Here are the actual scripts: http://github.com/odge/parseq/tree/master

Sunday, 26 April 2009
