Posted By:
Adam_McClure
Posted On:
Friday, August 30, 2002 01:26 PM
I'm going to answer my own question here in hopes others will find it illustrative.
The main issue was that there was no disambiguating character. The '"' and '\'' characters each had two different syntactic meanings, so there was no single token the lexer could recognize as the signal to switch lexers in a multiplexing scheme.
So I had to take a different route....
The solution was to refactor the grammar so that each quote character is its own lexer rule, with an optional subrule that recognizes an escaped (doubled) quote, "" or '', and returns the appropriate token type.
DQUOTE : '"' ('"' {$setType(ESC_DQUOTE);})? ;
QUOTE : '\'' ('\'' {$setType(ESC_QUOTE);})? ;
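For readers who want to see what these two rules do to the input without running ANTLR, here is a rough hand-written sketch in Python. It is not ANTLR-generated code. The token names come from the rules above, while OTHER and tokenize() are made up for illustration, and every non-quote character is lumped into a single-character OTHER token, which a real lexer would not do.

```python
# Hand-written sketch of the DQUOTE/QUOTE lexer rules (not ANTLR output).
# OTHER and tokenize() are hypothetical names used only for this example.
DQUOTE, ESC_DQUOTE = "DQUOTE", "ESC_DQUOTE"
QUOTE, ESC_QUOTE = "QUOTE", "ESC_QUOTE"
OTHER = "OTHER"


def tokenize(text):
    """Return a list of (type, text) pairs for the given input."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in ('"', "'"):
            plain, escaped = (DQUOTE, ESC_DQUOTE) if ch == '"' else (QUOTE, ESC_QUOTE)
            # Optional subrule: a second identical quote changes the token type,
            # which is the role $setType plays in the grammar.
            if i + 1 < len(text) and text[i + 1] == ch:
                tokens.append((escaped, ch * 2))
                i += 2
            else:
                tokens.append((plain, ch))
                i += 1
        else:
            # Assumption: anything that is not a quote is one OTHER token.
            tokens.append((OTHER, ch))
            i += 1
    return tokens


if __name__ == "__main__":
    for tok in tokenize('"a""b"'):
        print(tok)
```

One thing the sketch makes visible: because the second quote is consumed whenever it is present, an empty literal such as "" comes out as a single ESC_DQUOTE rather than two DQUOTE tokens. It may be worth checking whether the grammar behaves the same way for empty strings.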
Then I went up to the parser layer and defined a rule called 'string' like this:
string :
DQUOTE (ESC_DQUOTE | {LA(1)!=DQUOTE}? .)* DQUOTE
| QUOTE (ESC_QUOTE | {LA(1)!=QUOTE}? .)* QUOTE
;
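The 'string' rule can be read as a small loop over the token stream: take the opening quote, then accept either the matching escaped-quote token or any token that is not the closing quote, until the closing quote shows up. A minimal Python sketch of that loop follows. It is not what ANTLR generates. match_string() and ESCAPED_FOR are hypothetical names, tokens are assumed to be (type, text) pairs, and collapsing a doubled quote into one character is an extra step the grammar rule itself does not perform.

```python
# Hand-written sketch of the 'string' parser rule (not ANTLR output).
# match_string() and ESCAPED_FOR are hypothetical names for this example.
ESCAPED_FOR = {"DQUOTE": "ESC_DQUOTE", "QUOTE": "ESC_QUOTE"}


def match_string(tokens, pos=0):
    """Match one string literal starting at tokens[pos].

    Returns (value, next_pos); raises ValueError if no literal matches.
    """
    if pos >= len(tokens) or tokens[pos][0] not in ESCAPED_FOR:
        raise ValueError("expected an opening quote")
    open_type = tokens[pos][0]
    esc_type = ESCAPED_FOR[open_type]
    parts = []
    i = pos + 1
    while i < len(tokens):
        ttype, text = tokens[i]
        if ttype == open_type:
            # The predicate {LA(1)!=DQUOTE}? fails here, so the loop ends
            # and the closing quote is consumed.
            return "".join(parts), i + 1
        if ttype == esc_type:
            # Assumption: a doubled quote stands for one literal quote character.
            parts.append(text[0])
        else:
            # The predicate-guarded wildcard: any other token is string content.
            parts.append(text)
        i += 1
    raise ValueError("unterminated string")


if __name__ == "__main__":
    sample = [
        ("DQUOTE", '"'),
        ("OTHER", "a"),
        ("ESC_DQUOTE", '""'),
        ("QUOTE", "'"),
        ("DQUOTE", '"'),
    ]
    print(match_string(sample))
```

Note that a single quote inside a double-quoted string falls through to the wildcard branch, so it is treated as ordinary content, which is the behaviour the two alternatives in the rule are meant to give.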
Works like a charm. When I want to parse the string completely, I simply recognize DQUOTE or QUOTE and let the parser continue with subrules and follow up with a matching quote token (e.g. DQUOTE mySubrule DQUOTE). When I want a string literal I use the 'string' parser rule.
Voila!