As of May 2017, the Java port of the SLEIGH compiler has been refactored somewhat, in preparation
for the addition of the new "with" block. Some background as to why:

+----------------+
| The problem(s) |
+----------------+

Before refactoring, there was quite a bit of hacking to house the semantics and display parsers in
separate files from the overall parser. There were a number of reasons for doing this that I can
recall:
	1) It's good software engineering practice to separate large components into smaller related
	   components (Encapsulation).
	2) There exist use cases (e.g., PcodeCompiler) where the semantics parser is invoked apart from
	   the rest of the compiler.
	3) Lexical analysis is sensitive to the elements being parsed, and ANTLRv3 does not support
	   modal lexing natively.

The solution before refactoring applied a technique which simply doesn't settle well in the world
of computational theory, languages, and automata. The original programmer wrote a lexing rule in
the main SLEIGH grammar that read something like this:

SEMANTICS: '{' { /* usurp the input with a new lexer and parser for semantics, parse the input
                    until "EOF", and then restore the input to the main lexer/parser */ } ;

and then, in the sub-grammar:

RBRACE:	'}' { /* actually, emit EOF */ } ;

While this is clever and all, and theoretical complaints usually take a back seat to things that
work in practice, this introduced a few problems:
	1) The injected code for the SEMANTICS rule is actually pretty complicated.
	2) A lot of extra bookkeeping was necessary to print proper locations for errors.
	3) The '{' and '}', though related, were not parsed by the same grammar.
	4) [And this is the biggie:] SEMANTICS should really be a parser rule, not a lexer rule.

Because a lexer rule starting with '{' has been hacked to do context-free parsing things, adding a
with block that also uses braces is impossible. Anywhere '{' appears, the lexer immediately usurps
the input and tries parsing SEMANTICS according to a whole new parser/lexer. Sure, I could have
just used a different symbol, but that's just hacking around a hack. This needed to be fixed, and I
decided to pay the debt.

+---------+
| The fix |
+---------+

ANTLRv3 has a built-in mechanism for separating grammars into logical units, via "import". Granted,
there are restrictions on how it can be used, and there are some bookkeeping errors on the ANTLR
developers' part, but those can be worked around in ways that still preserve the soundness of the
grammars themselves. This will satisfy reasons 1 and 2 at the top. Reason 3 is a little more
difficult. Unfortunately, ANTLR doesn't support modal lexing until ANTLRv4; nonetheless, it's
relatively straightforward to add mode swapping mechanisms in ANTLRv3. Furthermore, native modal
lexing only permits the lexer to switch its own modes. For our use case, we'd need the parser to
control the mode of lexing. Again, with some custom classes, this is not too difficult, except that
we must be careful about the parser looking ahead. Here's what I've done, not necessarily in
chronological order:

First, I've factored out many of the @member actions, instead opting to implement them in an
abstract class. ANTLR provides an option to specify a class other than Lexer or Parser to extend
when generating Java source. There are two benefits: 1) I can keep most Java source in actual Java
source files instead of the grammar source, and 2) the code is no longer duplicated among many
parsers and lexers.

Second, I've split each sub-grammar into separate lexer and parser. This is required because ANTLR
does not allow a "combined lexer/parser" to be imported by another lexer, parser, or combination.
Aside from avoiding situations that are difficult to define, ANTLR's restriction also enforces the
practice of separating components out. Note that BooleanExpression.g is unaffected by all of this,
since it's basically a stand-alone grammar.

There are now 3 lexers each with its own source:
	BaseLexer.g: The "normal" lexer for .slaspec files
	DisplayLexer.g imports SleighLexer.g: The lexer for the display portion, e.g.:
		':ADD op1,op2 is' 
	SemanticLexer.g imports SleighLexer.g: The lexer for the semantics portion, e.g.:
		'{ export *[register]:4 reg; }'

See their source for additional documentation. Each of these lexers stands in for a "mode" of the
actual root lexer, SleighLexer. Despite its name, it is not generated by ANTLR. It is a POJO that
tracks a stack of modes. When the next token is requested, the mode at the top of the stack
determines which child lexer will be invoked. So long as the mode is changed in places that aren't
sensitive to lookahead, the parser can effectively control the lexer mode. It is also a smart idea
to use an unbuffered token stream, because the buffered one may be tempted to look ahead where not
necessary. Thus, I've switched uses of CommonTokenStream to UnbufferedTokenStream. Proper channel
filtering is provided by LexerMultiplexer, which SleighLexer extends.
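
To make the mode-stack idea concrete, here is a minimal, self-contained Java sketch. It is not the
actual SleighLexer or LexerMultiplexer source: the names ModeStackLexer, ChildLexer, and Cursor,
and the toy tokenizing rules, are all hypothetical, and the real child lexers are generated by
ANTLR. It only illustrates that each token request is dispatched to the child at the top of the
stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical shared input cursor that every child lexer advances.
class Cursor {
    final String text;
    int pos = 0;
    Cursor(String text) { this.text = text; }
    boolean atEnd() { return pos >= text.length(); }
}

// Hypothetical stand-in for an ANTLR-generated child lexer.
interface ChildLexer {
    /** Consumes and returns one token, or null at end of input. */
    String next(Cursor c);
}

class ModeStackLexer {
    static final int BASE = 0;
    static final int DISPLAY = 1;

    private final ChildLexer[] children = {
        // BASE (toy rule): whitespace is skipped between tokens.
        c -> {
            while (!c.atEnd() && Character.isWhitespace(c.text.charAt(c.pos))) c.pos++;
            return c.atEnd() ? null : "BASE[" + wordOrSymbol(c) + "]";
        },
        // DISPLAY (toy rule): whitespace is significant, so each space is a token.
        c -> c.atEnd() ? null : "DISPLAY[" + wordOrSymbol(c) + "]"
    };

    private final Deque<Integer> modes = new ArrayDeque<>();
    private final Cursor cursor;

    ModeStackLexer(String text) {
        cursor = new Cursor(text);
        modes.push(BASE);
    }

    // Reads a run of letters/digits, or else exactly one other character.
    private static String wordOrSymbol(Cursor c) {
        int start = c.pos;
        if (Character.isLetterOrDigit(c.text.charAt(c.pos))) {
            while (!c.atEnd() && Character.isLetterOrDigit(c.text.charAt(c.pos))) c.pos++;
        } else {
            c.pos++;
        }
        return c.text.substring(start, c.pos);
    }

    void pushMode(int mode) { modes.push(mode); }
    void popMode() { modes.pop(); }

    /** The mode on top of the stack decides which child lexer produces the token. */
    String nextToken() { return children[modes.peek()].next(cursor); }

    public static void main(String[] args) {
        ModeStackLexer lexer = new ModeStackLexer(": a b");
        System.out.println(lexer.nextToken()); // BASE[:]
        lexer.pushMode(DISPLAY);
        System.out.println(lexer.nextToken()); // DISPLAY[ ]
        System.out.println(lexer.nextToken()); // DISPLAY[a]
        lexer.popMode();
        System.out.println(lexer.nextToken()); // BASE[b]
    }
}
```

Note that the same space character is a token in one mode and skipped in the other, which is why
the mode must already be correct at the moment the token is requested.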

Similarly, there are now 3 parsers each with its own source:
	DisplayParser.g: The parser for the display portion
	SemanticParser.g: The parser for the semantics portion
	SleighParser imports DisplayParser, SemanticParser: The root parser for .slaspec files

See their source for additional documentation. The ANTLR-provided import mechanism is sufficient to
glue these parser grammars together. Each is given a pointer to the SleighLexer that feeds the root
parser so that parser rules can control the lexing mode. Rules with such changes MUST NOT change
the mode at a position past which the parser must look ahead when choosing an alternative, e.g.:

display
	:	':' { lexer.pushMode(DISPLAY); } pieces 'is' { lexer.popMode(); }
	|	':' 'is'
	;

pieces
	: piece+
	;

This would cause a problem because both alternatives start with the prefix ':'. Thus, the parser
will look ahead one token. For the first alternative, it's potentially looking into pieces, which
lies beyond the mode switch. Note, however, that the injected code is not executed until the
alternative is actually chosen, so that look-ahead token may be lexed in the wrong mode. If it
sees 'is', it chooses the second alternative without a problem. But, if it sees something else,
it will choose the first alternative. Unfortunately, it does not know to drop its lookahead token,
nor would it be sound to, since the decision at this point could be flawed anyway. My
recommendation is to keep the mode transitions in rules that have a single alternative. This alone
is not sufficient since lookahead can be a complicated thing, but this usually helps. For the above
example, the solution is to allow pieces to produce epsilon, and remove the second alternative
altogether:

display
	:	':' { lexer.pushMode(DISPLAY); } pieces 'is' {lexer.popMode(); }
	;

pieces
	: piece*
	;

This way, we can ensure the parser need only see the ':' to decide that it will parse according to
the display rule. It will properly change the lexing mode before asking the lexer for any token
after the colon. While it did require rethinking the grammar just slightly, the overall result is
actually pretty elegant compared to the previous. Additionally, I was able to remove all of the
extra location bookkeeping for the error reporting.
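
The lookahead hazard can be reproduced outside ANTLR. The following Java sketch is a toy model
under stated assumptions: ToyLexer, its one-token buffer, and the two helper methods are
hypothetical and merely imitate what a parser does when it peeks before running a mode-switching
action. It contrasts the two-alternative rule (peek, then switch) with the single-alternative rule
(switch, then request).

```java
import java.util.ArrayDeque;
import java.util.Deque;

class LookaheadHazard {
    // Toy two-mode lexer over single characters: BASE skips spaces, DISPLAY keeps them.
    static class ToyLexer {
        private final String text;
        private int pos = 0;
        private final Deque<String> modes = new ArrayDeque<>();
        private String lookahead; // one-token buffer, filled when the parser peeks

        ToyLexer(String text) {
            this.text = text;
            modes.push("BASE");
        }

        void pushMode(String mode) { modes.push(mode); }
        void popMode() { modes.pop(); }

        // Lexes one token using whatever mode is on top of the stack right now.
        private String lex() {
            if ("BASE".equals(modes.peek())) {
                while (pos < text.length() && text.charAt(pos) == ' ') pos++;
            }
            if (pos >= text.length()) return "EOF";
            return modes.peek() + "[" + text.charAt(pos++) + "]";
        }

        String peek() {
            if (lookahead == null) lookahead = lex();
            return lookahead;
        }

        String next() {
            String token = peek();
            lookahead = null;
            return token;
        }
    }

    // Imitates the two-alternative rule: the parser peeks past ':' before committing.
    static String twoAlternatives(String src) {
        ToyLexer lexer = new ToyLexer(src);
        lexer.next();              // consume ':'
        lexer.peek();              // lookahead to pick an alternative, still in BASE mode
        lexer.pushMode("DISPLAY"); // the action runs only after the choice is made
        return lexer.next();       // stale token, lexed before the mode switch
    }

    // Imitates the single-alternative rule: nothing is requested before the action runs.
    static String singleAlternative(String src) {
        ToyLexer lexer = new ToyLexer(src);
        lexer.next();              // consume ':'
        lexer.pushMode("DISPLAY");
        return lexer.next();       // lexed in DISPLAY mode, as intended
    }

    public static void main(String[] args) {
        System.out.println(twoAlternatives(": x"));   // BASE[x]
        System.out.println(singleAlternative(": x")); // DISPLAY[ ]
    }
}
```

In the first case the space after the colon has already been discarded by the time the mode
changes; in the second it survives as a display token.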

There's still a small issue with the build process, though. Because these lexers and grammars must
all cooperate during sleigh compilation, they must share a common token vocabulary. In other words,
the DisplayLexer must know about SemanticLexer's tokens even though it never refers to them itself,
simply so that they don't have overlapping numbers. To resolve this, a dummy lexer, called
SleighLexer.g, is written to import the other lexers. Because of an ANTLR restriction, the dummy
lexer must specify at least one lexing rule. Building this lexer causes ANTLR to output a token
vocabulary containing all possible tokens from all the lexers. Those lexers then take this
vocabulary as their own. To avoid any confusion, the Gradle script deletes the Java source for the
dummy lexer, leaving just its .tokens file.
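
For intuition about what the shared vocabulary buys, here is a small Java sketch. It is not what
ANTLR or the Gradle script actually runs: the class SharedVocabulary and the token names are
hypothetical, and the starting number 4 is an assumption based on the ANTLR v3 runtime reserving
the lower token types. It assigns one number per distinct token name across all lexers and prints
the result as NAME=number lines, the layout used by a .tokens file.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SharedVocabulary {
    // Assumption: user-defined token types start at 4, as in the ANTLR v3 runtime.
    static final int FIRST_TOKEN_TYPE = 4;

    /** Gives every distinct token name one number, no matter which lexer declares it. */
    static Map<String, Integer> merge(List<List<String>> lexers) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (List<String> names : lexers) {
            for (String name : names) {
                if (!vocab.containsKey(name)) {
                    vocab.put(name, FIRST_TOKEN_TYPE + vocab.size());
                }
            }
        }
        return vocab;
    }

    /** Renders the vocabulary as NAME=number lines. */
    static String toTokensFile(Map<String, Integer> vocab) {
        StringBuilder sb = new StringBuilder();
        vocab.forEach((name, type) -> sb.append(name).append('=').append(type).append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> display = List.of("COLON", "DISP_TEXT");         // hypothetical names
        List<String> semantic = List.of("COLON", "LBRACE", "RBRACE"); // hypothetical names
        // Prints COLON=4, DISP_TEXT=5, LBRACE=6, RBRACE=7, one per line.
        System.out.print(toTokensFile(merge(List.of(display, semantic))));
    }
}
```

Because both lexers read the same table, a token shared by two modes keeps one number, and tokens
unique to one mode can never collide with tokens from another.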

Additionally, there's a bit of a glitch with the way ANTLR handles "@header" actions with imported
grammars. A root grammar specifying its own "@header" action cannot import a grammar with another
"@header" action. Furthermore, the "@header" action is not actually inherited by the importing
grammar. I imagine this is an oversight on the ANTLR developers' part. The "@header" is usually
needed to specify the package of the output source. ANTLRv4 mitigates this problem by providing a
'-package' command-line option, so that most "@header"s are not necessary, but ANTLRv3 doesn't have
this. Thus, I programmed the build script to insert the package header after calling ANTLR.
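
The post-processing step amounts to prepending a package declaration to each generated file. The
actual work is done by the build script; the Java sketch below only illustrates the idea. The
class name PackageHeaderInserter, the method withPackage, and the package name in main are all
hypothetical.

```java
import java.util.regex.Pattern;

class PackageHeaderInserter {
    // Matches an existing package declaration at the start of any line.
    private static final Pattern PACKAGE_DECL =
        Pattern.compile("(?m)^\\s*package\\s+[\\w.]+\\s*;");

    /** Returns the source with a package declaration prepended, unless it already has one. */
    static String withPackage(String generatedSource, String packageName) {
        if (PACKAGE_DECL.matcher(generatedSource).find()) {
            return generatedSource; // leave already-packaged sources untouched
        }
        return "package " + packageName + ";\n\n" + generatedSource;
    }

    public static void main(String[] args) {
        String generated = "// generated source\npublic class DisplayParser {}\n";
        // "example.sleigh.grammar" is a placeholder, not the real package name.
        System.out.print(withPackage(generated, "example.sleigh.grammar"));
    }
}
```

Checking for an existing declaration keeps the step idempotent, so rerunning the build over
already-processed output does not stack up duplicate headers.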
