Lookbehind and other regex features

Posted By:   Eric_Mahurin
Posted On:   Tuesday, July 13, 2004 10:45 AM

I'm an ANTLR beginner, but a longtime expert in Perl and its regular expressions - including the constructs that have embedded code.


Is there any way to accomplish lookbehind in the lexer or parser? This is the main thing I could do with Perl's regexes that I don't immediately see how to do in ANTLR.


Here is a complete list of things easily done in Perl's regexes that would be nice to have in ANTLR:


(...){m,n} : Match at least m times and at most n times. At a minimum, m and n should be allowed to be integer variables of the target language. n=-1 could represent infinity, so that {0,-1} means * and {1,-1} means + ({0,1} means ?). m=-1 could be used to force this alternative to fail, like a semantic predicate. I realize that these features should be doable now with semantic predicates watching a counter within a * loop, but a more concise way would be nice - and one not as tied to the target language.
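To make the counter-in-a-loop idea concrete, here is a minimal sketch in Python rather than ANTLR. The function name match_bounded and its arguments are made up for illustration; it only mimics the proposed {m,n} conventions (n=-1 for infinity, m=-1 to force failure) and is greedy without backtracking.

```python
import re

def match_bounded(item, text, m, n, pos=0):
    """Match the compiled regex `item` between m and n times, starting at pos.

    Hypothetical helper: n == -1 means no upper bound and m == -1 forces
    failure, following the convention proposed in the post.
    Returns the end position on success, or None on failure.
    """
    if m == -1:
        return None  # forced failure, like a false semantic predicate
    count = 0
    while n == -1 or count < n:
        hit = item.match(text, pos)
        if hit is None or hit.end() == pos:
            break  # no further match (or an empty one): leave the loop
        pos = hit.end()
        count += 1
    return pos if count >= m else None

digit = re.compile(r"\d")
print(match_bounded(digit, "12345", 2, 3))   # stops after three digits
print(match_bounded(digit, "12345", 1, -1))  # behaves like +
```

The loop condition plays the role of the semantic predicate on the counter; the final check on m is what a plain * loop does not give you.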


(...){m,n}? (...)*? (...)+? ?? : Non-greedy matching. I know "options{greedy=false;}" works, but non-greedy matching seems common enough to deserve a shorthand. Maybe even have another suffix to set greedy=true and override the default warning - how about "+" - {}+, *+, ++, ?+. I guess there are several ways you can deal with greediness: warn if it matters (ANTLR's default), greedy ignoring following patterns (ANTLR greedy=true, perl (?>...)), greedy with backtracking (ANTLR doesn't do it, perl's default), non-greedy with backtracking (perl ? suffix to {}, *, +, and ?), and non-greedy w/o backtracking (ANTLR greedy=false, perl ? suffix to {}, *, +, and ? within (?>...)). I realize backtracking is difficult and slow, but it would be nice to have it on occasion. Syntactic and semantic predicates do help replace backtracking a little. Another thing - I'm not sure why ANTLR doesn't allow non-greedy with ? (perl's ??), because it does make sense: only match the ? pattern if the lookahead doesn't match the pattern that follows it.
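For anyone less familiar with the Perl side, the difference between the greedy and non-greedy suffixes can be seen in a few lines. This is a small sketch using Python's re module as a stand-in, on the assumption that it behaves like Perl for these particular constructs (both backtrack); the sample strings are made up.

```python
import re

text = "<a><b>"

# Greedy with backtracking: .* grabs as much as it can, then gives back.
greedy = re.match(r"<.*>", text).group()       # "<a><b>"

# Non-greedy with backtracking: .*? takes as little as it can.
lazy = re.match(r"<.*?>", text).group()        # "<a>"

# The ?? case: the optional "a" is skipped whenever what follows
# can already match, and only consumed when it has to be.
skipped = re.match(r"a??ab", "ab").group()     # "ab"  (optional part unused)
needed = re.match(r"a??ab", "aab").group()     # "aab" (optional part used)

print(greedy, lazy, skipped, needed)
```

The last two lines are the behaviour described above for a non-greedy ?: try what follows first, and fall back to matching the optional pattern only if that fails.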


(?<=...) (?<!...) : Positive and negative lookbehind assertions. I don't know of a way to do this in ANTLR.
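As an illustration of what is being asked for, here is a short Python sketch of both lookbehind forms, again using Python's re as a stand-in for Perl. The sample text and the helper preceded_by are made up; the helper only shows the kind of previous-character check that would have to be written by hand in an action or predicate.

```python
import re

text = "cost $42 and 17"

# Positive lookbehind: digits that come right after a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: whole numbers that do not follow a dollar sign.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)

def preceded_by(text, pos, ch):
    """Hand-rolled one-character lookbehind (hypothetical helper)."""
    return pos > 0 and text[pos - 1] == ch

print(after_dollar, not_after_dollar, preceded_by(text, 6, "$"))
```

Neither form consumes the "$"; the assertion only inspects what has already been read.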


(?=...) (?!...) : Positive and negative lookahead assertions. I know semantic predicates can cover this (syntactic predicates may also cover the positive lookahead assertions, but I'm not sure). An easier-to-read way of writing them, like this, would be nice.
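For comparison with the lookbehind forms, a matching sketch of the two lookahead forms, once more with Python's re standing in for Perl and a made-up input string:

```python
import re

text = "foo(1) bar baz(2)"

# Positive lookahead: names that are immediately followed by "(".
calls = re.findall(r"\w+(?=\()", text)

# Negative lookahead: lowercase words that are not followed by "(".
plain = re.findall(r"\b[a-z]+\b(?!\()", text)

print(calls, plain)
```

In both cases the "(" is checked but not consumed, so it stays available to whatever rule matches next.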
