A lexer is a TokenStream source that merely spits out a
stream of Token objects to the parser (or another stream
consumer). As such, a lexer implements method nextToken() to
satisfy interface TokenStream. The parser repeatedly calls
yourlexer.nextToken() to get tokens.
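The pull relationship described above can be sketched in plain Java. This is a minimal illustration, not ANTLR's own code: DemoToken, DemoTokenStream, ListTokenStream, and PullLoop are hypothetical stand-ins, and the EOF type value is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a token: a type code plus the matched text.
final class DemoToken {
    static final int EOF_TYPE = 1; // assumed value, for illustration only
    final int type;
    final String text;

    DemoToken(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

// Hypothetical stand-in for the stream interface: one method, nextToken().
interface DemoTokenStream {
    DemoToken nextToken();
}

// A trivial token source backed by a fixed list; yields EOF once exhausted.
final class ListTokenStream implements DemoTokenStream {
    private final Iterator<DemoToken> it;

    ListTokenStream(List<DemoToken> tokens) {
        this.it = tokens.iterator();
    }

    @Override
    public DemoToken nextToken() {
        return it.hasNext() ? it.next() : new DemoToken(DemoToken.EOF_TYPE, "<EOF>");
    }
}

final class PullLoop {
    // Plays the role of the parser: call nextToken() repeatedly until EOF.
    static List<String> drain(DemoTokenStream source) {
        List<String> seen = new ArrayList<>();
        for (DemoToken t = source.nextToken();
             t.type != DemoToken.EOF_TYPE;
             t = source.nextToken()) {
            seen.add(t.text);
        }
        return seen;
    }
}
```

Because the consumer depends only on nextToken(), any object implementing the interface (a lexer, a filter, or another stream) could be substituted as the source.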
Which rule definitions result in Token objects being sent to
the parser? The answer you would expect, or the one you are used to, is:
"You get a Token object for every lexical rule in your lexer
grammar." This is indeed the default for ANTLR's lexer grammars.
What if you want to break up the definition of a complicated rule into
multiple rules? Surely you don't want every rule to result in a
complete Token object in this case. Some rules are only
around to help other rules construct tokens. To distinguish these
"helper" rules from rules that result in tokens, use the
protected modifier. ANTLR overloads this Java access-visibility
keyword because a rule that is not visible cannot be "seen" by the
parser.
Another, more practical, way to look at this is to note that only
non-protected rules get called by nextToken() and, hence,
only non-protected rules can generate tokens that get shoved down the
TokenStream pipe to the parser.
I now recognize this approach as a mistake. I have a number of other
proposals to fix it, none of which seems to satisfy everyone.
class L extends Lexer;

/** This rule is "visible" to the parser
 *  and a Token object is sent to the
 *  parser when an INT is matched.
 */
INT : (DIGIT)+ ;

/** This rule does not result in a token
 *  object that is passed to the parser.
 *  It merely recognizes a portion of INT.
 */
protected
DIGIT : '0'..'9' ;
By definition, all lexical rules return Token objects (although ANTLR
optimizes away many of these object creations), but only the
Token objects of non-protected rules are passed out of the
lexer to its consumer.
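To make the division of labor concrete, here is a hand-written Java analogue of grammar L. It is a sketch under stated assumptions, not the code ANTLR generates: SketchToken, SketchLexer, matchINT, and matchDIGIT are hypothetical names, the token type values are made up, the blank-skipping is an added convenience that grammar L does not contain, and the helper returns no Token at all, which models the optimized case mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type for this sketch; type values are assumptions.
final class SketchToken {
    static final int EOF_TYPE = 1;
    static final int INT = 4;
    final int type;
    final String text;

    SketchToken(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

final class SketchLexer {
    private final String input;
    private int pos = 0;

    SketchLexer(String input) {
        this.input = input;
    }

    // Entry point for the consumer. It dispatches only to the
    // non-protected rule INT; DIGIT is never called from here.
    public SketchToken nextToken() {
        // Skip blanks between tokens (not part of grammar L).
        while (pos < input.length() && input.charAt(pos) == ' ') {
            pos++;
        }
        if (pos >= input.length()) {
            return new SketchToken(SketchToken.EOF_TYPE, "<EOF>");
        }
        return matchINT();
    }

    // INT : (DIGIT)+ ;  builds the Token that leaves the lexer.
    private SketchToken matchINT() {
        int start = pos;
        matchDIGIT(); // the (...)+ loop needs at least one DIGIT
        while (pos < input.length() && isDigit(input.charAt(pos))) {
            matchDIGIT();
        }
        return new SketchToken(SketchToken.INT, input.substring(start, pos));
    }

    // protected DIGIT : '0'..'9' ;  consumes one character, emits nothing.
    private void matchDIGIT() {
        if (pos >= input.length() || !isDigit(input.charAt(pos))) {
            throw new IllegalStateException("expected digit at offset " + pos);
        }
        pos++;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Convenience for the example: collect the text of every token up to EOF.
    static List<String> tokenTexts(String input) {
        SketchLexer lexer = new SketchLexer(input);
        List<String> texts = new ArrayList<>();
        for (SketchToken t = lexer.nextToken();
             t.type != SketchToken.EOF_TYPE;
             t = lexer.nextToken()) {
            texts.add(t.text);
        }
        return texts;
    }
}
```

Calling SketchLexer.tokenTexts("12 345") yields two INT tokens, "12" and "345"; no token is ever produced for an individual digit, because the helper is reachable only through matchINT.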