Is ANTLR appropriate for building a line-oriented preprocessor like the C preprocessor or m4?
Created May 14, 2012
Terence Parr
Greg Lindholm points out:
IMHO ANTLR is not well suited for writing a preprocessor, because: 1) preprocessors are line-oriented and ANTLR isn't; 2) the output of an ANTLR lexer is a token stream, whereas the output of a preprocessor is a character stream (or file) that then gets fed into a lexer.
ANTLR doesn't have a way of specifying start-of-line as part of a rule. Once you have entered a rule you can use a semantic predicate to check what column you're at, but this probably doesn't help you.
Terence adds: This is similar to building an HTML lexer/parser. You want to match tags, but what do you do with all the text surrounding the tags? Well, I have built lexers that lump all that so-called CTEXT into a single big token. This may not be optimal, but it works pretty well.
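To make the CTEXT idea concrete, here is a minimal sketch written by hand in plain Java rather than as an ANTLR grammar. It assumes a simplified HTML-like input in which a tag is anything from `<` to the next `>`; every run of characters between tags becomes a single CTEXT token. The names `CtextLexer`, `Kind` and `Token` are hypothetical, chosen only for illustration (records need Java 16 or later).

```java
import java.util.ArrayList;
import java.util.List;

public class CtextLexer {
    // Hypothetical token kinds for this sketch: markup tags and "everything else".
    public enum Kind { TAG, CTEXT }

    public record Token(Kind kind, String text) {}

    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) == '<') {
                int end = input.indexOf('>', i);
                if (end < 0) {
                    // Unterminated tag: lump the remainder into one CTEXT token.
                    tokens.add(new Token(Kind.CTEXT, input.substring(i)));
                    break;
                }
                tokens.add(new Token(Kind.TAG, input.substring(i, end + 1)));
                i = end + 1;
            } else {
                // All text up to the next tag (or end of input) becomes ONE token.
                int next = input.indexOf('<', i);
                if (next < 0) {
                    next = input.length();
                }
                tokens.add(new Token(Kind.CTEXT, input.substring(i, next)));
                i = next;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("<p>Hello, <b>world</b>!</p>")) {
            System.out.println(t.kind() + " [" + t.text() + "]");
        }
    }
}
```

Because the surrounding text is never broken into words, the lexer stays tiny; the trade-off is that CTEXT is opaque to the parser, so this works best when the text between tags needs no further analysis.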
Concerning the preprocessor, a strategy that you may want to consider: do a line-oriented loop to get input, and when you find a directive in column one, invoke an ANTLR rule to lex or parse just that line :) Works great. I have also built an HTML parser that used regular Java code to pick out the tags and then used ANTLR to parse the arguments and such within the tags.
This mix-and-match strategy can be very useful in applications where handling the input is easy except for a few items such as preprocessor directives or HTML tags.
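A minimal sketch of the line-oriented strategy, assuming Java 16 or later. The loop copies ordinary lines through and hands only lines that start with `#` in column one to a directive handler. In a real tool that handler is where you would invoke the ANTLR-generated rule on just that line; here `handleDirective` is a hypothetical hand-written stand-in that understands only `#define NAME value`, so the example runs without ANTLR. The class name `LinePreprocessor` is likewise hypothetical. Note that the result is a character stream (a `String`), not a token stream, which is what a downstream lexer would consume.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinePreprocessor {
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    private final Map<String, String> macros = new HashMap<>();

    // Plain line-oriented loop: only directive lines get "real" parsing.
    public String process(String source) {
        StringBuilder out = new StringBuilder();
        for (String line : source.lines().toList()) {
            if (line.startsWith("#")) {
                // Directive in column one: in a real tool, call the ANTLR rule on this line here.
                handleDirective(line.substring(1));
            } else {
                out.append(expand(line)).append('\n');
            }
        }
        return out.toString();
    }

    // Hypothetical stand-in for an ANTLR directive parser; only "define NAME value" is handled.
    private void handleDirective(String body) {
        String[] parts = body.trim().split("\\s+", 3);
        if (parts.length >= 2 && parts[0].equals("define")) {
            macros.put(parts[1], parts.length == 3 ? parts[2] : "");
        }
        // Any other directive is ignored in this sketch.
    }

    // Replaces each identifier-like word that names a defined macro; single pass, no rescanning.
    private String expand(String line) {
        Matcher m = IDENT.matcher(line);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String name = m.group();
            m.appendReplacement(sb, Matcher.quoteReplacement(macros.getOrDefault(name, name)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(new LinePreprocessor().process("#define SIZE 10\nint a[SIZE];\n"));
    }
}
```

For example, `new LinePreprocessor().process("#define SIZE 10\nint a[SIZE];\n")` returns `"int a[10];\n"`. Expansion here is single-pass: a macro's replacement text is not rescanned for other macros, unlike the real C preprocessor.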