jGuru
Question What is a "protected" lexer rule?
Topics Tools:ANTLR:Grammars:Rules, Tools:ANTLR:Recognition:Lexical analysis
Author Terence Parr
Created Sep 3, 1999


Answer
A lexer is a TokenStream source that merely spits out a stream of Token objects to the parser (or to another stream consumer). As such, a lexer implements the method nextToken() to satisfy the TokenStream interface. The parser repeatedly calls yourlexer.nextToken() to get tokens.
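
As an illustration of that contract, here is a minimal Java sketch of the producer/consumer relationship. It is an assumption-based stand-in, not ANTLR's runtime: SimpleToken, SimpleTokenStream, WordLexer and TokenConsumer are hypothetical names, and the toy lexer simply emits one token per whitespace-separated word. The point is that the consumer knows nothing about the lexer except that it can call nextToken() until it sees EOF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token object: a type code plus the matched text.
final class SimpleToken {
    static final int EOF = -1;
    final int type;
    final String text;

    SimpleToken(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

// Hypothetical stand-in for a TokenStream (not ANTLR's actual interface).
interface SimpleTokenStream {
    SimpleToken nextToken();
}

// A toy lexer: one WORD token per whitespace-separated run, then EOF forever.
final class WordLexer implements SimpleTokenStream {
    static final int WORD = 4;
    private final String input;
    private int pos = 0;

    WordLexer(String input) {
        this.input = input;
    }

    @Override
    public SimpleToken nextToken() {
        // Skip whitespace between words.
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return new SimpleToken(SimpleToken.EOF, "<EOF>");
        }
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return new SimpleToken(WORD, input.substring(start, pos));
    }
}

// The consumer side: like a parser, it only ever calls nextToken() until EOF.
final class TokenConsumer {
    static List<String> drain(SimpleTokenStream stream) {
        List<String> texts = new ArrayList<>();
        for (SimpleToken t = stream.nextToken(); t.type != SimpleToken.EOF; t = stream.nextToken()) {
            texts.add(t.text);
        }
        return texts;
    }
}
```

Draining a WordLexer over "  int x  " yields the texts "int" and "x". Because TokenConsumer depends only on the interface, any other token source (for example a filter wrapped around the lexer) could be slotted in without changing the consumer.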

What token definitions result in token objects that get sent to the parser? The answer you'd expect or the one you're used to is, "You get a Token object for every lexical rule in your lexer grammar." This is indeed the default case for ANTLR's lexer grammars.

What if you want to break up the definition of a complicated rule into multiple rules? Surely you don't want every rule to result in a complete Token object in this case. Some rules exist only to help other rules construct tokens. To distinguish these "helper" rules from rules that result in tokens, use the protected modifier. The Java access-visibility term is borrowed (overloaded) here because a rule that is not visible cannot be "seen" by the parser.

Another, more practical, way to look at this is to note that only non-protected rules get called by nextToken() and, hence, only non-protected rules can generate tokens that get shoved down the TokenStream pipe to the parser.

I now recognize this approach as a mistake. I have a number of other proposals to fix this, none of which seems to satisfy everyone.

class L extends Lexer;

/** This rule is "visible" to the parser
 *  and a Token object is sent to the
 *  parser when an INT is matched.
 */
INT : (DIGIT)+ ;


/** This rule does not result in a token
 *  object that is passed to the parser.
 *  It merely recognizes a portion of INT.
 */
protected
DIGIT : '0'..'9' ;

By definition, all lexical rules return Token objects (ANTLR optimizes away many of these object creations, however), but only the Token objects of non-protected rules get pulled out of the lexer itself.
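
To make the protected/non-protected distinction concrete, here is a hand-written Java sketch of roughly what a lexer for the INT/DIGIT grammar above has to do. It is an illustration under stated assumptions, not ANTLR's generated code: IntLexer, Tok, mINT, mDIGIT and the createToken flag are hypothetical names. nextToken() dispatches only to INT; DIGIT is reachable only as a helper called from mINT, and helper calls pass createToken=false, which is one simple way to "optimize away" the token objects a helper rule would otherwise build.

```java
// Hand-written sketch of a lexer for:
//   INT : (DIGIT)+ ;   protected DIGIT : '0'..'9' ;
// All names (IntLexer, Tok, mINT, mDIGIT, createToken) are hypothetical,
// not ANTLR's generated code or runtime API.
final class IntLexer {
    static final char EOF_CHAR = '\uFFFF';
    static final int EOF = -1;
    static final int INT = 4;
    static final int DIGIT = 5;   // DIGIT has a type code, but nextToken() never returns one

    // Minimal token object: type code plus matched text.
    static final class Tok {
        final int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    private final String input;
    private int pos = 0;
    int tokensBuilt = 0;          // tokens allocated by rule methods (EOF not counted)

    IntLexer(String input) {
        this.input = input;
    }

    private char la() {
        return pos < input.length() ? input.charAt(pos) : EOF_CHAR;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Entry point for the parser. Only non-protected rules are dispatched
    // from here, so DIGIT can never reach the parser as a token of its own.
    Tok nextToken() {
        if (pos >= input.length()) {
            return new Tok(EOF, "<EOF>");
        }
        char c = la();
        if (isDigit(c)) {
            return mINT(true);
        }
        throw new IllegalStateException("unexpected char '" + c + "' at " + pos);
    }

    // Rule INT : (DIGIT)+ ;  builds a token only when the caller asks for one.
    Tok mINT(boolean createToken) {
        int start = pos;
        do {
            mDIGIT(false);        // helper reference: no token object requested
        } while (isDigit(la()));
        return makeToken(createToken, INT, start);
    }

    // protected rule DIGIT : '0'..'9' ;  only ever called from other rules.
    Tok mDIGIT(boolean createToken) {
        int start = pos;
        if (!isDigit(la())) {
            throw new IllegalStateException("expected digit at " + pos);
        }
        pos++;
        return makeToken(createToken, DIGIT, start);
    }

    // Allocates a token only if requested; otherwise no object is created.
    private Tok makeToken(boolean createToken, int type, int start) {
        if (!createToken) {
            return null;
        }
        tokensBuilt++;
        return new Tok(type, input.substring(start, pos));
    }
}
```

For input "2024", the first nextToken() call returns a single INT token with text "2024" and the second returns EOF; although DIGIT matched four times, only one token object was allocated. Input that starts with a non-digit, such as "x", makes nextToken() throw, because no non-protected rule can begin there.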

Comments and alternative answers

mistake?
Adrian Sandor, Jun 27, 2003
"I now recognize this approach as a mistake"
why?

Re: mistake?
Terence Parr, Aug 4, 2003
Because overloading the term "protected" has confused the hell out of people ;)

Re[2]: mistake?
Sean Farrell, Aug 18, 2006
If you come from an object-oriented background, the term protected is right on point: the rules are only "seen" by the lexer. The only drawback is that there are no private rules. I am not yet acquainted enough with ANTLR to give any meaningful comment on inheritance of grammars, but applying the terms private, protected, and public, following the same notation as visibility in object-oriented languages, may be an interesting point to investigate.

Is this an encapsulation notation problem?
Tim Davis, Feb 6, 2004

The fact that I'm not confused by the "protected" keyword issue suggests that I'm either confused or ignorant. Since I don't know Java, I'll presume I'm ignorant.

The explanation of the "protected" keyword seems plain to me. If the word is the issue, then I suggest "invisible".

It seems more like an issue of encapsulation to me. You want the lexer's rule names to be either visible or invisible outside of the capsule (enclosure). You have no explicit encapsulation syntax, so you have used the implicit notion of capitalization to identify visible/invisible. I would propose an explicit interface notation to solve the problem. The notation would express the rule names on the surface of the capsule. When the capsule is instantiated, the rule names become visible in the higher-level enclosure. The interface could distinguish between an input stream of characters and one of tokens, and thus differentiate "lexers" from "parsers".

I presume there is a need for (parser (parser (...))).

Making the interface notation separate from the capsule notation would also be beneficial (à la VHDL's entity and architecture).

--
Tim Davis

