Hi There!

I'm Dan Schlegel, an Associate Professor in the Computer Science Department at SUNY Oswego

Approaches to Designing Case Classes for a Grammar

There are two (related) main approaches to completing this task. We will explore the issue using a small grammar for concatenations of a’s and b’s, below.

Remember that case classes will be used to build a parse tree using this grammar. Instances of the case classes will act as nodes, with their parameters acting as the edges between nodes. The observation you should immediately make is that the non-terminals in the above grammar also act as nodes when a parse tree is built. Therefore, it seems natural to use the non-terminals for the case classes. 

The second observation you should make as you are creating your case class hierarchy is that E seems to take either one or two parameters – two subtrees (C, E) or only one subtree (C). The remainder of this document focuses on two approaches to solving this. In both of the following examples I use parser combinators, though you could write a recursive descent parser using the same ideas. 

Using the Optional Type

Scala allows the use of the Optional type in situations where a value may only optionally be present. Remember that the Optional type allows for improved static analysis over the use of null. The case class definition here is straight forward – E is defined as always having a “left” branch (l), which is  Tree, and an Option[Tree] right branch (r). 

Notice that the combinator builds E as either E(l, Some(r)) in the case where r has a value (the recursive case), or as E(l, None) where it does not. Running this code results in the following output parse tree: 

Note that in pattern matching on these case classes, you’ll need to have cases for Some(…) and None. 

Skipping Unused Levels

A second approach is perhaps a bit less natural, and will require you to prove to yourself that it works in all cases. This is the approach I used in the Sum example in class. In this grammar, the core observation is that

is no less useful than

in maintaining all of the features we are interested in in our grammar. While this grammar does not exhibit it, the primary issue we are concerned with is preservation of operator precedence. If you can convince yourself of this, you might also convince yourself that this is true for more nested cases: 

is just as good as

in preserving the correctness of the parse for the properties we care about. This allows us to eschew the use of the Optional type, and to simplify our parser somewhat.  

The primary change to the parser is that the e function in our combinator no longer transforms the output from the call to the c rule (previously, the result was encased in an E instance. Instead, it just uses the result from that rule. This produces the following output:

Both of these approaches are reasonable for use in the main project – pattern matching over either one will be of appropriate difficulty. 

Other Ideas

There are other approaches which you might take to designing your case classes. The simplest is to give your case classes better names to make the output more readable. Perhaps E could be called Concat in this case, and C could be called Char. This is best paired with the case of skipping unused levels. 

A second idea would be to modify the grammar to always force the appropriate number of arguments: 

This results in code somewhat like the following: 

This is not as elegant as either of the previous solutions, but does produce reasonable output: