Is ANTLR appropriate for building a line-oriented preprocessor like the C preprocessor or m4?

Terence Parr

Greg Lindholm points out:

ANTLR doesn't have a way of specifying start-of-line as part of a rule. Once you have entered a rule you can use a semantic predicate to check what column you're at, but this probably doesn't help you.

IMHO ANTLR is not well suited for writing a preprocessor because: 1) Preprocessors are line-oriented and ANTLR isn't. 2) The output of an ANTLR lexer is a token stream, whereas the output of a preprocessor is a character stream (or file) that then gets fed into a lexer.

Terence adds: This is similar to building an HTML lexer/parser. You want to match tags, but what do you do with all the text surrounding the tags? Well, I have built lexers that lump all that so-called CTEXT into a single big token. This may not be optimal, but it works pretty well.
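
To make the "lump the text into one big token" idea concrete, here is a minimal hand-written sketch in plain Java. It is not ANTLR output and not the lexer mentioned above; the names TagTextLexer, Kind, and Token are made up for illustration, and it assumes tags never contain a literal '>' inside quoted attributes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split input into TAG tokens and TEXT tokens,
// where each run of text between tags becomes a single TEXT token.
class TagTextLexer {
    enum Kind { TAG, TEXT }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (input.charAt(pos) == '<') {
                // A tag runs up to and including the next '>'.
                // An unterminated tag swallows the rest of the input.
                int close = input.indexOf('>', pos);
                int end = (close < 0) ? input.length() : close + 1;
                tokens.add(new Token(Kind.TAG, input.substring(pos, end)));
                pos = end;
            } else {
                // Everything up to the next '<' is lumped into one TEXT token.
                int next = input.indexOf('<', pos);
                int end = (next < 0) ? input.length() : next;
                tokens.add(new Token(Kind.TEXT, input.substring(pos, end)));
                pos = end;
            }
        }
        return tokens;
    }
}
```

Calling TagTextLexer.tokenize("Hello <b>world</b>!") yields five tokens: three TEXT tokens for the surrounding text and two TAG tokens for the tags.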

Concerning the preprocessor, a strategy that you may want to consider: do a line-oriented loop to get input and then, when you find a directive in column one, invoke an ANTLR rule to lex or parse just that line :) Works great. I have also built an HTML parser that used regular Java code to pick out the tags and then used ANTLR to parse the arguments and such within the tags.

This mix-and-match strategy can be very useful in applications where handling the input is easy except for a few items such as preprocessor directives or HTML tags.
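
The line-oriented loop described above might look roughly like the following sketch. The class name LinePreprocessor and the directiveHandler parameter are hypothetical; the handler stands in for whatever ANTLR-generated lexer or parser rule you would invoke on the directive line, which is not shown here. The sketch also assumes '#' in column one marks a directive, as in the C preprocessor.

```java
import java.util.function.Function;

// Hypothetical sketch of the mix-and-match strategy: plain Java walks
// the input line by line, and only directive lines are handed off.
class LinePreprocessor {
    static String process(String source, Function<String, String> directiveHandler) {
        StringBuilder out = new StringBuilder();
        // \R matches any line terminator; trailing empty strings are dropped.
        for (String line : source.split("\\R")) {
            if (line.startsWith("#")) {
                // Directive in column one: in a real setup this is where an
                // ANTLR rule would lex or parse just this line (assumption:
                // the handler returns the replacement text for the line).
                out.append(directiveHandler.apply(line));
            } else {
                // Ordinary text passes through untouched.
                out.append(line);
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

For example, LinePreprocessor.process(text, line -> "") would strip every directive line down to an empty line, while a '#' that is indented and so not in column one is left alone. The output stays a character stream, so it can still be fed to an ordinary lexer afterwards.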