Approaches to Designing Case Classes for a Grammar

There are two (related) main approaches to completing this task. We will explore the issue using a small grammar for concatenations of a’s and b’s, below.

E := C E | C
C := 'a' | 'b'

Remember that case classes will be used to build a parse tree using this grammar. Instances of the case classes will act as nodes, with their parameters acting as the edges between nodes. The observation you should immediately make is that the non-terminals in the above grammar also act as nodes when a parse tree is built. Therefore, it seems natural to use the non-terminals for the case classes.

The second observation you should make as you are creating your case class hierarchy is that E seems to take either one or two parameters – two subtrees (C, E) or only one subtree (C). The remainder of this document focuses on two approaches to solving this. In both of the following examples I use parser combinators, though you could write a recursive descent parser using the same ideas.

Using the Optional Type

Scala allows the use of the Optional type in situations where a value may only optionally be present. Remember that the Optional type allows for improved static analysis over the use of null. The case class definition here is straight forward – E is defined as always having a “left” branch (l), which is Tree, and an Option[Tree] right branch (r).

abstract class Tree
case class E(l: Tree, r: Option[Tree]) extends Tree
case class C(n: Character) extends Tree

object MicroV1 extends CombinatorsV1{
  def main(args: Array[String]){
    val exp2c:Tree = parseAll(e, "abab").get
    println(exp2c)
  }
}

class CombinatorsV1 extends JavaTokenParsers{
  def e: Parser[Tree] = c ~ e ^^ { case l ~ r => E(l, Some(r)) } | c ^^ { case l => E(l, None) }
  def c: Parser[Tree] = "[A-Za-z]".r ^^ { str => C(str.charAt(0)) }
}

Notice that the combinator builds E as either E(l, Some(r)) in the case where r has a value (the recursive case), or as E(l, None) where it does not. Running this code results in the following output parse tree:

E(C(a),Some(E(C(b),Some(E(C(a),Some(E(C(b),None)))))))

Note that in pattern matching on these case classes, you’ll need to have cases for Some(…) and None.

Skipping Unused Levels

A second approach is perhaps a bit less natural, and will require you to prove to yourself that it works in all cases. This is the approach I used in the Sum example in class. In this grammar, the core observation is that

C(a)

is no less useful than

E(C(a), None)

in maintaining all of the features we are interested in in our grammar. While this grammar does not exhibit it, the primary issue we are concerned with is preservation of operator precedence. If you can convince yourself of this, you might also convince yourself that this is true for more nested cases:

E(C(a), E(C(b), C(a)))

is just as good as

E(C(a), E(C(b), E(C(a), None)))

in preserving the correctness of the parse for the properties we care about. This allows us to eschew the use of the Optional type, and to simplify our parser somewhat.

abstract class Tree
case class E(l: Tree, r: Tree) extends Tree
case class C(n: Character) extends Tree

object MicroV2 extends CombinatorsV2{
  def main(args: Array[String]){
    val exp2c:Tree = parseAll(e, "abab").get
    println(exp2c)
  }
}

class CombinatorsV2 extends JavaTokenParsers{
  def e: Parser[Tree] = c ~ e ^^ { case l ~ r => E(l, r) } | c
  def c: Parser[Tree] = "[A-Za-z]".r ^^ { str => C(str.charAt(0)) }
}

The primary change to the parser is that the e function in our combinator no longer transforms the output from the call to the c rule (previously, the result was encased in an E instance. Instead, it just uses the result from that rule. This produces the following output:

E(C(a),E(C(b),E(C(a),C(b))))

Both of these approaches are reasonable for use in the main project – pattern matching over either one will be of appropriate difficulty.

Other Ideas

There are other approaches which you might take to designing your case classes. The simplest is to give your case classes better names to make the output more readable. Perhaps E could be called Concat in this case, and C could be called Char. This is best paired with the case of skipping unused levels.

A second idea would be to modify the grammar to always force the appropriate number of arguments:

E := C E | C NIL
C := 'a' | 'b'
NIL := A dummy case class

This results in code somewhat like the following:

abstract class Tree
case class E(c: Tree, e: Tree) extends Tree
case class C(n: Character) extends Tree
case object NIL extends Tree

object MicroV3 extends CombinatorsV1{
  def main(args: Array[String]){
    val exp2c:Tree = parseAll(e, "abab").get
    println(exp2c)
  }
}

class CombinatorsV1 extends JavaTokenParsers{
  def e: Parser[Tree] = c ~ e ^^ { case l ~ r => E(l, r) } | c ^^ { case l => E(l, NIL) }
  def c: Parser[Tree] = "[A-Za-z]".r ^^ { str => C(str.charAt(0)) }
}

This is not as elegant as either of the previous solutions, but does produce reasonable output:

E(C(a),E(C(b),E(C(a),E(C(b),NIL))))

Hi There!

Approaches to Designing Case Classes for a Grammar

Using the Optional Type

Skipping Unused Levels

Other Ideas