Can I build an ANTLR lexer that recognizes string ...

Posted By:   Anonymous
Posted On:   Thursday, August 17, 2000 07:29 AM

Can I build an ANTLR lexer that recognizes string tokens containing Unicode, but that restricts all other tokens to be simple 7-bit ASCII encoding?

Posted By:   Terence_Parr  
Posted On:   Thursday, July 19, 2001 04:55 PM

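Yes, but you will have to be very specific; that is, you can't use wildcards or ~ (not) operators.
If you specify a STRING rule that contains '\u0080'..'\ufffe' (or a similar Unicode range), it enlarges the character vocabulary for the whole lexer, not just for that rule. The wildcard would then match all of those Unicode characters too, and so would any ~ set, because it is a complement taken relative to that vocabulary. So you'd have to spell out ASCII ranges such as 'a'..'z' explicitly in the ID rule and the other non-string rules.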
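To make the vocabulary point concrete, here is a minimal model in Python (not ANTLR code) of how the wildcard and ~ operators behave when they are defined relative to the lexer's character vocabulary. The vocabulary bounds and the names ASCII_VOCAB, UNICODE_VOCAB, ID_START, wildcard, complement, and char_range are hypothetical illustrations. The only claim is that a complement such as ~'"' grows when the vocabulary grows, while an explicit range like 'a'..'z' does not.

```python
# Illustrative model only: "." and "~X" are computed relative to the
# character vocabulary, so enlarging the vocabulary silently enlarges them.

# Hypothetical vocabularies: a 7-bit one, and one enlarged to cover '\u0080'..'\ufffe'.
ASCII_VOCAB = frozenset(chr(c) for c in range(0x03, 0x80))
UNICODE_VOCAB = frozenset(chr(c) for c in range(0x03, 0xFFFF))


def wildcard(vocab):
    """'.' matches any character in the vocabulary."""
    return vocab


def complement(excluded, vocab):
    """'~X' matches every vocabulary character that is not in X."""
    return vocab - frozenset(excluded)


def char_range(lo, hi):
    """An explicit range such as 'a'..'z' (inclusive); independent of the vocabulary."""
    return frozenset(chr(c) for c in range(ord(lo), ord(hi) + 1))


# Hypothetical ID rule characters written with explicit ASCII ranges.
ID_START = char_range("a", "z") | char_range("A", "Z") | {"_"}
ID_PART = ID_START | char_range("0", "9")

if __name__ == "__main__":
    e_acute = "\u00e9"
    print(e_acute in complement('"', ASCII_VOCAB))    # ~'"' with the 7-bit vocabulary
    print(e_acute in complement('"', UNICODE_VOCAB))  # ~'"' after the vocabulary grows
    print(e_acute in ID_START)                        # explicit ranges are unaffected
```

With the enlarged vocabulary, ~'"' accepts é, which is how Unicode would leak into an ID rule written with ~ or the wildcard; the explicit ranges in ID_START and ID_PART still reject it.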



A simpler way is to allow Unicode everywhere and then catch any use of Unicode outside a string afterwards (or while reading the input char stream).
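A minimal sketch of that after-the-fact check, in Python, assuming the lexer has already produced a flat list of tokens, each with a type name, its text, and a line number. The Token type, the kind names "STRING" and "ID", and the function non_ascii_outside_strings are hypothetical; a real ANTLR lexer exposes token types and text through its own token class.

```python
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    kind: str  # hypothetical token-type name, e.g. "STRING" or "ID"
    text: str
    line: int


def non_ascii_outside_strings(
    tokens: Iterable[Token], string_kinds=frozenset({"STRING"})
) -> List[Token]:
    """Return every non-string token whose text contains a character above 0x7F."""
    return [
        t
        for t in tokens
        if t.kind not in string_kinds and any(ord(ch) > 0x7F for ch in t.text)
    ]


if __name__ == "__main__":
    tokens = [
        Token("ID", "name", 1),
        Token("ASSIGN", "=", 1),
        Token("STRING", '"caf\u00e9"', 1),  # Unicode inside a string: allowed
        Token("ID", "na\u00efve", 2),       # Unicode in an identifier: flagged
    ]
    for bad in non_ascii_outside_strings(tokens):
        print(f"line {bad.line}: non-ASCII character in {bad.kind} {bad.text!r}")
```

Keeping the lexer permissive and checking in a separate pass lets the grammar keep its wildcards and ~ sets; the trade-off is that the error is reported after lexing rather than by the lexer rule itself.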