Lexer code crashes for lookahead of 7

Posted By:   Don_McClean
Posted On:   Tuesday, January 29, 2002 01:44 PM



I am using ANTLR to parse a large, verbose data structure language that was defined as an external standard. It is line oriented. Many of the 400 unique keywords are very similar; for example, the following keywords all start with START_ (hence I have left-factored START_ into its own protected rule).



			
protected
START_ : "START_";
START_BARCODE : (START_ "BARCODE")! DATA_VALUE_NL ;
START_DATA_MANIPULATION : (START_ "DATA_MANIPULATION")! DATA_VALUE_NL ;
.
.
START_TITLE : (START_ "TITLE")! DATA_VALUE_NL ;
START_VENDOR_INFO : (START_ "VENDOR_INFO")! DATA_VALUE_NL ;


I am having a problem even with the factoring at a lookahead of 5, as the code generated in 'nextToken' does not distinguish between two such rules, even though I left-factored:

			

else if ((LA(1)=='S') && (LA(2)=='T') && (LA(3)=='A') && (LA(4)=='R') && (LA(5)=='T')) {
mSTART_DEFECT_DEFINITION(true);
theRetToken=_returnToken;
}
else if ((LA(1)=='S') && (LA(2)=='T') && (LA(3)=='A') && (LA(4)=='R') && (LA(5)=='T')) {
mSTART_DEFECT_MEASUREMENTS(true);
theRetToken=_returnToken;
}

If I set the lookahead to 7, the Java compiler or JRE apparently cannot handle the generated code, as I get a runtime error message:

			
java.lang.VerifyError: (class: com/ti/rts/parser/SemiFileLexer, method: nextToken signature: ()Lantlr/Token;) Illegal instruction found at offset 409

at com.ti.rts.parser.RunParser.main(RunParser.java:23)


Has anyone seen anything like this?
Or does anyone have any suggestions?

Should I perhaps create my own lexer? Since this is a line-oriented language and it is simple to recognize the keyword at the beginning of each line, I could pass the token to the parser.


I would appreciate any suggestions.


Re: Lexer code crashes for lookahead of 7

Posted By:   Terence_Parr  
Posted On:   Thursday, January 31, 2002 06:29 PM

I would suggest making a KEYWORD lexer rule that matches any word and then letting ANTLR test the word against the literals (i.e., those strings referenced in the parser). This is how a typical grammar decides between IF and FOR.
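
To illustrate the idea behind this suggestion, here is a minimal sketch in plain Java rather than ANTLR-generated code. It scans one generic word and then looks it up in a table, instead of comparing lookahead characters keyword by keyword. The class name KeywordLookup, the token type numbers, and the three sample keywords are assumptions made for illustration only; a real ANTLR lexer would build its literals table from the grammar and assign its own token types.

```java
import java.util.HashMap;
import java.util.Map;

class KeywordLookup {
    // Hypothetical token type numbers; ANTLR would generate its own values.
    static final int IDENT = 1;
    static final int START_BARCODE = 2;
    static final int START_TITLE = 3;
    static final int START_VENDOR_INFO = 4;

    // Stand-in for the literals table: keyword text mapped to token type.
    private static final Map<String, Integer> LITERALS = new HashMap<>();
    static {
        LITERALS.put("START_BARCODE", START_BARCODE);
        LITERALS.put("START_TITLE", START_TITLE);
        LITERALS.put("START_VENDOR_INFO", START_VENDOR_INFO);
    }

    // Returns the index just past the word (letters, digits, '_') starting at pos.
    static int scanWord(String line, int pos) {
        int end = pos;
        while (end < line.length()) {
            char c = line.charAt(end);
            if (!(Character.isLetterOrDigit(c) || c == '_')) {
                break;
            }
            end++;
        }
        return end;
    }

    // One table lookup replaces the long chain of LA(n) comparisons.
    static int tokenTypeFor(String word) {
        Integer type = LITERALS.get(word);
        return type != null ? type : IDENT;
    }

    // Classifies the keyword at the beginning of a line.
    static int firstTokenType(String line) {
        int end = scanWord(line, 0);
        return tokenTypeFor(line.substring(0, end));
    }

    public static void main(String[] args) {
        System.out.println(firstTokenType("START_TITLE Some title text"));
        System.out.println(firstTokenType("START_UNKNOWN something else"));
    }
}
```

Because the word is matched by a single rule, the size of the decision code no longer grows with the number of keywords or with the length of their shared prefix; adding a keyword only adds a table entry.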