Why does my lexer not recognize a token that is a substring of another token which does not have the same token

Posted By:   Rohan_Alahakoon
Posted On:   Monday, November 1, 2004 06:46 AM

I want to differentiate "test" and "te" in the input character stream using a grammar file.

Re: Why does my lexer not recognize a token that is a substring of another token which does not have the same token

Posted By:   Anonymous  
Posted On:   Friday, December 3, 2004 01:55 AM

The following will nicely do this:

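class MyLexer extends Lexer;

options {
    charVocabulary = '\3'..'\377'; // accept any 8-bit character
    k = 3;                         // three characters of lookahead
}

// "te" and "test" share a prefix; the third lookahead character tells them apart
SYMBOL
    : "te"
    | "test"
    ;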

Note the k=3: it is necessary so that the lexer can look one character past the end of "te" and check whether the next character is an 's' (the input continues as "test") or something else (in this example, only a 't' that starts the next token).


This will correctly tokenize


tetetetetesttete


into


te/te/te/te/test/te/te
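
The lookahead decision described above can be mimicked by hand. The following is only a minimal Python sketch of that decision, not the code ANTLR generates; the function name tokenize and the error handling are assumptions made for illustration.

```python
def tokenize(text):
    """Split text into "te" and "test" tokens using three characters of lookahead."""
    tokens = []
    pos = 0
    while pos < len(text):
        lookahead = text[pos:pos + 3]  # peek at up to three characters (k=3)
        if lookahead == "tes":
            # The third character is 's', so commit to the longer alternative.
            if text[pos:pos + 4] != "test":
                raise ValueError(f"expected 'test' at position {pos}")
            tokens.append("test")
            pos += 4
        elif lookahead[:2] == "te":
            # Any other third character (or end of input) means the short token.
            tokens.append("te")
            pos += 2
        else:
            raise ValueError(f"unexpected input at position {pos}")
    return tokens


print("/".join(tokenize("tetetetetesttete")))  # prints te/te/te/te/test/te/te
```

With only two characters of lookahead, both alternatives would look identical ("te"), which is why the third character is needed to choose between them.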


However, with more complex grammars it may not be so easy.


A fundamentally different approach is to match a general identifier first and decide afterwards whether it is a keyword, using a keyword lookup table. See the ANTLR documentation for how to do this; you would have something like


IDENTIFIER
: ('a'..'z')+
;

and then look up the matched string in a table to decide whether it is a keyword.
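
The lookup-table idea can be sketched outside ANTLR as well. This is a rough Python illustration under stated assumptions: the table KEYWORDS, the token type names TE, TEST and IDENTIFIER, the function name classify, and the skipping of whitespace are all hypothetical and not taken from the grammar above.

```python
import re

# Hypothetical keyword table: matched text -> token type name.
KEYWORDS = {"te": "TE", "test": "TEST"}

# Same shape as the IDENTIFIER rule: one or more lowercase letters.
IDENT = re.compile(r"[a-z]+")


def classify(text):
    """Match identifiers greedily, then look each one up in the keyword table."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # assumption: whitespace separates tokens and is skipped
            continue
        match = IDENT.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        word = match.group()
        # Words missing from the table remain ordinary identifiers.
        tokens.append((KEYWORDS.get(word, "IDENTIFIER"), word))
        pos = match.end()
    return tokens


print(classify("te test tetest"))
```

Note the trade-off: because the identifier is matched greedily, an unbroken run such as tetest comes out as a single IDENTIFIER, so this approach only suits input where keywords are delimited by something that is not a letter.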


Regards


Harald M.
