Posted By: Lance_Walton
Posted On: Monday, March 1, 2004 01:29 AM
Hi.
I'm having trouble defining a grammar. What I want is to be able to parse Java type names that may also have wildcards ('*' and '..' both mean any characters but in different ways - the code around the parser will take care of the interpretation).
So for example, I want to be able to parse:
- MyClass
- com.foo.MyClass
- com..MyClass
- com.foo.My*
- com..*Class
- ..MyClass

etc.
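To pin down which strings are meant to be accepted, here is a rough Java sketch that checks the examples above against a regular expression. The class name `TypePatternShape` and the regex itself are assumptions read off the examples (names separated by '.' or '..', with an optional leading '..'); they are not taken from the grammar, and the intended language may well be broader.

```java
import java.util.regex.Pattern;

class TypePatternShape {
    // A name: a letter, '_', '$' or '*' first, then the same characters plus digits.
    private static final String NAME = "[A-Za-z_$*][A-Za-z0-9_$*]*";

    // Assumed shape: optional leading "..", then names separated by "." or "..".
    private static final Pattern TYPE_PATTERN =
        Pattern.compile("(?:\\.\\.)?" + NAME + "(?:\\.{1,2}" + NAME + ")*");

    static boolean matches(String s) {
        return TYPE_PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "MyClass", "com.foo.MyClass", "com..MyClass",
            "com.foo.My*", "com..*Class", "..MyClass"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

A check like this only serves as a reference for testing a lexer or parser against; it says nothing about how '*' and '..' are interpreted afterwards.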
My lexer and parser are:
class MyParser extends Parser;
options {
k = 2;
exportVocab=myVocab;
defaultErrorHandler = false;
buildAST = true;
}
typeExpression
:
TypePattern
EOF!
;
class MyLexerLexer extends Lexer;
options {
exportVocab=spooky;
k=4;
charVocabulary='\u0003'..'\uFFFF';
defaultErrorHandler=false;
}
WS:
( ' '
| '\t'
| '\f'
// handle newlines
| ( options {generateAmbigWarnings=false;}
: "\r\n" // Evil DOS
| '\r' // Macintosh
| '\n' // Unix (the right way)
)
{ newline(); }
)+
{ _ttype = Token.SKIP; }
;
TypePattern:
(TypeQualifierSegmentPattern)*
TYPE_NAME_INITIAL_CHARACTERS (TYPE_NAME_SUBSEQUENT_CHARACTERS)*
;
protected TypeQualifierSegmentPattern:
(
'.'
| TYPE_NAME_INITIAL_CHARACTERS (TYPE_NAME_SUBSEQUENT_CHARACTERS)*
) '.'
;
protected TYPE_NAME_INITIAL_CHARACTERS:
'a'..'z' | 'A'..'Z' | '_' | '$' | '*'
;
protected TYPE_NAME_SUBSEQUENT_CHARACTERS:
TYPE_NAME_INITIAL_CHARACTERS | '0'..'9'
;
This gives me a nondeterminism warning:
[antlr] ANTLR Parser Generator Version 2.7.2 1989-2003 jGuru.com
[antlr] /Users/lance/Documents/workspace/Test/src/MyParser.g:39: warning:lexical nondeterminism upon
[antlr] /Users/lance/Documents/workspace/Test/src/MyParser.g:39: k==1:'$','*','A'..'Z','_','a'..'z'
[antlr] /Users/lance/Documents/workspace/Test/src/MyParser.g:39: k==2:'$','*','0'..'9','A'..'Z','_','a'..'z'
[antlr] /Users/lance/Documents/workspace/Test/src/MyParser.g:39: k==3:'$','*','0'..'9','A'..'Z','_','a'..'z'
[antlr] /Users/lance/Documents/workspace/Test/src/MyParser.g:39: k==4:<end-of-token>,'$','*','0'..'9','A'..'Z','_','a'..'z'
[antlr] /Users/lance/Documents/workspace/Test/src/MyParser.g:39: between alt 1 and exit branch of block
Can anyone help me disambiguate this? Simply suppressing the warnings doesn't help: when I try to parse 'Bob' I get "Expecting '.' found 'EOF'" errors.
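One way to see where the warning comes from: in TypePattern, both the loop alternative (a qualifier segment) and the exit branch (the final name) can start with an arbitrarily long run of name characters, and only the character after that run ('.' or the end of the token) tells them apart. The hand-written Java sketch below counts how many characters must be examined before that decision can be made. The class and method names are hypothetical, and this is not what ANTLR generates.

```java
class SegmentLookahead {
    // Mirrors TYPE_NAME_INITIAL_CHARACTERS and TYPE_NAME_SUBSEQUENT_CHARACTERS.
    static boolean isNameChar(char c, boolean first) {
        boolean initial = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || c == '_' || c == '$' || c == '*';
        return initial || (!first && c >= '0' && c <= '9');
    }

    // Number of characters examined, starting at pos, before a name can be
    // classified as a qualifier segment (followed by '.') or as the final
    // type name (followed by the end of the input).
    static int charsNeededToDecide(String input, int pos) {
        int i = pos;
        while (i < input.length() && isNameChar(input.charAt(i), i == pos)) {
            i++;
        }
        // One more position (a '.' or the end of input) settles the decision.
        return (i - pos) + 1;
    }

    public static void main(String[] args) {
        System.out.println(charsNeededToDecide("Bob", 0));
        System.out.println(charsNeededToDecide("MyClass", 0));
    }
}
```

Because the count grows with the length of the name, any fixed lookahead depth is exceeded by a long enough name, which is consistent with the warning listing the same character sets at every depth up to k=4.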
Regards
Lance