Microproject
Begin with the code from the class example featuring a recursive descent parser in OCaml. That program implements a tokenizer and parser for the following grammar:
1 2 |
E -: A '+' E | A A -: 0|1|2|3|4|5|6|7|8|9 |
The parser, when provided with a list of typed tokens, builds an Abstract Syntax Tree (AST) featuring Sum and Num nodes. The Sum nodes are pairs of nodes which are to be added, and Num nodes have integer values. It also implements evaluation for ASTs generated by the parser. Modify the program to implement the following, modified grammar:
1 2 3 |
E -: T '+' E | T T -: A '*' T | A A -: [0-9]+ |
Note that the A value is provided in terms of a regular expression, allowing for 1 or more of 0-9. Be sure the tokenizer, parser, and evaluation components all work. You will need to modify the token and exp types, the tokenizer, the parser, and the evaluation.
When you submit your microproject, please include both your code and a screenshot of your code producing reasonable output.
Main Project
Write an OCaml program that performs pattern matching on strings, where patterns are expressed using only the concatenation, alternation (“|”), and optional (“?”) operators of regular expressions (no loops/”*”, no escape characters), and the tokens are letters and digits, plus period (“.”) to mean any letter. Each run of the program should accept a pattern, and then any number of strings, reporting only whether they match. Your program should represent expressions as trees and evaluate on the inputs, without using any regular expressions or OCaml’s regular expression libraries except for matching the individual alphanumeric characters, if you’d like. For example:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
pattern? ((h|j)ell. worl?d)|(42) string? hello world match string? jello word match string? jelly word match string? 42 match string? 24 no match string? hello world42 no match |
1 2 3 4 5 6 7 8 9 10 11 |
pattern? I (like|love|hate)( (cat|dog))? people string? I like cat people match string? I love dog people match string? I hate people match string? I likelovehate people no match string? I people no match |
Bootstrap
An ambiguous grammar for patterns is:
1 2 3 |
S -: E$ E -: C | EE | E'|'E | E'?' | '(' E ')' C -: '0' | '1' | ... | '9'| 'a' | 'b' | ... | 'z' | '.' |
To reflect that option(‘?’) has highest precedence, then concatenation, then alternation(‘|’), the following unambiguous grammar can be created:
1 2 3 4 5 6 |
S -: E$ E -: T '|' E | T T -: F T | F F -: A '?' | A A -: C | '(' E ')' C -: Alphanumeric characters plus '.' |
As with the microproject, you will build an AST. This means that instead of building a node for E in your tree, which represents alternation (or perhaps the absence of alternation), you will build an Alternation node only if necessary. I recommend using the following types to get started:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
(* Scanner Types *) type token = Tok_Char of char | Tok_OR | Tok_Q | Tok_LPAREN | Tok_RPAREN | Tok_END (* AST Types *) type re = C of char | Concat of re * re | Optional of re | Alternation of re * re |
You must implement a recursive descent parser yourself to build the AST from the input string. Remember not to use any regular expression processing other than for the individual characters (either built in or in external libraries)!
Hints
This project is broken up roughly into three parts – the scanner, the parser, and the matcher. The parser will require you to write the most code, but the type checker should help immensely. The most difficult part of the project is probably the matcher, which requires more thinking!
Some sample parser output to help you (but remember, precedence is handled by the grammar, so don’t think too hard about that part!):
1 2 3 4 5 6 7 8 9 10 |
parse "ab";; - : re = Concat (C 'a', C 'b') parse "a|b";; - : re = Alternation (C 'a', C 'b') parse "ab?";; - : re = Concat (C 'a', Optional (C 'b')) parse "(ab)?";; - : re = Optional (Concat (C 'a', C 'b')) parse "ab|cd?";; - : re = Alternation (Concat (C 'a', C 'b'), Concat (C 'c', Optional (C 'd'))) |
Suggested Development Environment
VS Code + OCaml Platform Extension