I want tree grammar C to morph its input tree to fit tree grammar Pascal, which has a different vocabulary. How do I handle the token type vocabulary issues?

Terence Parr

First, a requirement. You want the token names to have the same token types in both grammars so that AST node SEMI matches SEMI in both grammars. If the token type values differed, SEMI in the C grammar and SEMI in the Pascal grammar would effectively be different tokens.

This vocabulary issue is actually the normal case: language A with vocab A translates to language B with vocab B, and so on until the final transformation phase. The easiest way to handle this (until ANTLR provides vocabulary unioning operations) is to have the first grammar in the transformation define all the tokens, which are then defined for any further phase that imports the first grammar's export vocabulary.

The normal situation looks like this:

class A extends Lexer;
class B extends Parser;

class C extends TreeParser; // pass one over tree

class D extends TreeParser; // pass two over tree

You want A and B to share a vocab and have C import B's vocab so it knows what the proper AST node token types are. Then, you want to augment the vocab so that it has nodes with the extra token types referenced in D. How do you augment C's vocab? You could have D import C's vocab and add its own tokens, which then generates DTokenTypes.txt or whatever. BUT, this cannot be fed back to C: C would need D's output and D needs C's, a "chicken or the egg" problem.
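As a rough sketch of the import chain described above, the two tree parsers might be wired together with ANTLR 2's importVocab and exportVocab grammar options. This assumes the parser B exports a vocabulary named B, that each grammar lives in its own file, and that the files are run through ANTLR in order so each exported vocabulary exists before it is imported. Rules are omitted, so these are skeletons rather than complete grammars.

```
// C.g (assumed file name)
class C extends TreeParser; // pass one over tree
options {
    importVocab = B;  // assumption: B's parser exports a vocab named B
    exportVocab = C;  // written out for the next phase to import
}
// ... tree rules omitted ...

// D.g (assumed file name)
class D extends TreeParser; // pass two over tree
options {
    importVocab = C;  // D sees every token type that C exports
}
// ... tree rules omitted ...
```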

Since ANTLR cannot yet do vocab unions, the real solution is to simply define the token types for the enhanced C vocab in the C grammar itself:

class C extends TreeParser; // pass one over tree

tokens { // define token types for output vocab of C
    ...
}

Then, D will import C's vocab as usual, but D won't add any tokens of its own, because C's vocab is already the union of the old vocab plus whatever D needs.
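To make this concrete, here is a minimal sketch of C with a tokens section. The token names PROC_DECL and VAR_SECTION are hypothetical, chosen only to illustrate Pascal-style nodes that a later phase might expect; the vocabulary name B is likewise an assumption, and rules are omitted.

```
class C extends TreeParser; // pass one over tree
options {
    importVocab = B;  // assumption: the input tree uses B's token types
    exportVocab = C;  // B's tokens plus those defined below
}
tokens {
    PROC_DECL;    // hypothetical node type needed by D
    VAR_SECTION;  // hypothetical node type needed by D
}
// ... tree rules omitted ...
```

Because the new token types are allocated in C, their values are fixed in C's exported vocab, and D gets the same values simply by importing that vocab.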

Another way to handle this is to define some or all of the tokens in a supergrammar, which all phases either inherit from or import the vocab of via importVocab.