Returning from a sub-parser with no end token.

Posted By:   Remi_Koutcherawy
Posted On:   Wednesday, December 26, 2001 02:51 AM


I use sub-lexers and sub-parsers to decode multiple formats in a single file.

This is a great ANTLR feature, as I can parse radically different portions of the input without writing a complicated multi-format parser.



Nevertheless, in the ANTLR examples, there is always an end token matched by the sub-parser.

Therefore the sub-lexer doesn't have to look ahead: the chars that follow are of no concern to it.
There is normally a single token that signifies the termination of the sub-parser.



However, what if the end token is optional?

The sub-lexer then looks ahead, fails on the first char outside its vocabulary, and consumes input until it resyncs!



Here is a short example.

How can I tell the sub-parser "B_Parser" to return on an invalid char?



Main parser and lexer:

			
// Main parser ------
class Main_Parser extends Parser;
{
    static antlr.TokenStreamSelector selector = new antlr.TokenStreamSelector();

    public static void main(String[] args) {
        try {
            // This string simulates the real file
            java.io.StringReader input = new java.io.StringReader(
                "AAA aa BBB b bb BBB AAA a a BBB b b BBB\n" // with end BBB
                + "AAA aa BBB b bb AAA a BBB b b"); // without end BBB

            // Both lexers share the same input state
            Main_Lexer main = new Main_Lexer(input);
            B_Lexer b_lexer = new B_Lexer(main.getInputState());
            selector.addInputStream(main, "main");
            selector.addInputStream(b_lexer, "BBB");
            selector.select("main");

            Main_Parser parser = new Main_Parser(selector);
            parser.parse();
        }
        catch(Exception e) {
            System.err.println("exception: "+e);
            e.printStackTrace(System.err);
        }
        // Wait for a key press before exiting
        try {System.in.read();} catch (Exception ex) {}
    }
}

// Only one rule in the Parser for simplicity ------
parse:
    ( BBB // Matching BBB switches to B_Parser with B_Lexer
        {
            selector.push("BBB");
            B_Parser b_parser = new B_Parser(getInputState());
            b_parser.parse();
            selector.pop();
            setInputState(b_parser.getInputState());
            System.out.println("BBB parsing complete");
        }
    | ( AAA (A)+ ) // Matching AAA is done within this parser for simplicity
        {System.out.println("AAA parsing complete");}
    )+
    ;

// Main lexer ------
class Main_Lexer extends Lexer;
options {
    filter = WS;
    k = 2;
}
AAA: "AAA"
    ;
A: 'a'
    ;
BBB: "BBB"
    ;
protected
WS: ( ' ' | '\t' | ( '\n' | '\r' | "\r\n" ) {newline();} )
    { _ttype = Token.SKIP; }
    ;


Sub-parser and sub-lexer:

			
// B parser ------
class B_Parser extends Parser;

parse: (B)+ (BBB)? // This is fine when a terminating BBB token is present.
                   // When no token signals the end of the sub-parser,
                   // this fails because the lexer consumes all invalid chars.
    ;

// B lexer ------
// How can I tell this lexer to send EOF if the char is not in its vocabulary?
// I wish I could push back the char which raises a NoViableAltForCharException
// and return Token.EOF_TYPE.

class B_Lexer extends Lexer;
options {
    filter = WS;
    k = 2;
}
BBB: "BBB"
    ;
B: 'b'
    ;
protected
WS: ( ' ' | '\t' | ( '\n' | '\r' | "\r\n" ) {newline();} )
    { _ttype = Token.SKIP; }
    ;
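
To make the behaviour I am after concrete, here is a minimal plain-Java sketch that does not use ANTLR at all. Everything in it (SubLexerSketch, SharedInput, nextBToken, parseB, the skipUnknown flag) is a hypothetical name invented for illustration; it only simulates the two behaviours described above and is not how ANTLR implements its lexers. With skipUnknown set to true the sub-lexer discards chars it does not know until it finds a token again (the problem); with false it leaves the unknown char unconsumed and reports EOF (the wish).

```java
import java.util.ArrayList;
import java.util.List;

public class SubLexerSketch {
    static final String EOF = "<EOF>";

    /** Shared cursor standing in for the input state both lexers read from. */
    static final class SharedInput {
        final String text;
        int pos = 0;
        SharedInput(String text) { this.text = text; }
    }

    /**
     * Hypothetical B sub-lexer whose vocabulary is 'b', "BBB" and whitespace.
     * skipUnknown = true  : discard unknown chars and keep scanning (the problem).
     * skipUnknown = false : leave the unknown char unconsumed and report EOF (the wish).
     */
    static String nextBToken(SharedInput in, boolean skipUnknown) {
        while (in.pos < in.text.length()) {
            char c = in.text.charAt(in.pos);
            if (Character.isWhitespace(c)) { in.pos++; continue; }
            if (in.text.startsWith("BBB", in.pos)) { in.pos += 3; return "BBB"; }
            if (c == 'b') { in.pos++; return "B"; }
            if (!skipUnknown) return EOF; // the char stays available for the main lexer
            in.pos++;                     // resync by discarding the char
        }
        return EOF;
    }

    /** Collects B tokens up to an optional closing BBB; it does not enforce (B)+. */
    static List<String> parseB(SharedInput in, boolean skipUnknown) {
        List<String> tokens = new ArrayList<>();
        String t;
        while (!(t = nextBToken(in, skipUnknown)).equals(EOF)) {
            tokens.add(t);
            if (t.equals("BBB")) break;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // What remains of the second test line once the main lexer has matched "AAA aa BBB"
        String rest = " b bb AAA a BBB b b";

        SharedInput greedy = new SharedInput(rest);
        System.out.println(parseB(greedy, true) + " left: \"" + greedy.text.substring(greedy.pos) + "\"");

        SharedInput polite = new SharedInput(rest);
        System.out.println(parseB(polite, false) + " left: \"" + polite.text.substring(polite.pos) + "\"");
    }
}
```

In the first call the sketch swallows "AAA a" and then takes the next opening BBB for its own end token, so the main parser never sees that AAA section. In the second call it stops in front of "AAA", which is where I would like B_Parser to return.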