Is there a C++ grammar for ANTLR? Is it even possible to parse C++ with a conventional grammar?

Terence Parr

Update: There is now a C++ grammar for ANTLR, converted from the old PCCTS grammar: here.

The simple answer is that C++ is pretty much impossible to parse by merely writing up its grammar and running it through ANTLR, YACC, or any similar tool. One of the reasons is explained in "ANTLR: A Predicated-LL(k) Parser Generator" (PS): C++ is just plain ambiguous as hell unless you have infinite lookahead. Because ANTLR can backtrack selectively with syntactic predicates, it can handle many of the strange constructs. Regardless, full C++ parsing is so hard that people typically build a parser that recognizes a large superset of C++ and then walk the resulting tree, pruning it down using semantic information over multiple subtree walks. Checking that a C++ program is semantically valid is even harder: you have to maintain a complicated stack of symbol tables and so on. Many compilers use a handcrafted, extremely complex parser.
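To make the ambiguity concrete, here is a small sketch (my own illustration, not taken from the paper) in which the identical statement `T(x);` is a function call in one scope and a variable declaration in another. The namespace and function names are hypothetical; the point is only that a parser cannot classify the statement without knowing whether `T` currently names a type.

```cpp
#include <cassert>

// Reading 1: T names a function, so "T(x);" is an expression statement.
namespace call_case {
    inline void T(int& v) { ++v; }

    inline int demo() {
        int x = 1;
        T(x);          // function call: increments x
        return x;
    }
}

// Reading 2: T names a type, so the same tokens form a declaration.
namespace decl_case {
    struct T { int value = 42; };

    inline int demo() {
        T(x);          // declaration: equivalent to "T x;"
        return x.value;
    }
}

// Illustrative self-check: same token sequence, two different meanings.
inline void check_both_readings() {
    assert(call_case::demo() == 2);
    assert(decl_case::demo() == 42);
}
```

Both bodies contain exactly the same tokens, which is why a grammar alone is not enough here: the parser needs symbol table information (or a superset parse that is disambiguated later) to decide which reading applies.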

That said, most people are not building C++ compilers and, consequently, need to extract much less from the input. For example, I helped NeXT build a C++ code browser for ProjectBuilder back in the glory days of NeXTStep (revived in Mac OS X, thankfully... woohoo!). If you only want, for example, to grab the names of all classes and methods within the input, you can do that relatively easily if you have grammar/parsing experience. If doing stuff with C++ is your first jaunt into the parsing world, I'd ask for a different assignment. ;)
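As a rough idea of how little machinery the "just grab the names" case can need, here is a minimal sketch that pulls class and struct names out of source text with a regular expression instead of a real parser. This is an assumption-laden illustration, not how the NeXT browser worked: `class_names` is a hypothetical helper, it ignores methods entirely, and it will be fooled by comments, string literals, macros, and other constructs that a proper fuzzy parser would handle.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the names of classes/structs that are
// *defined* (followed by '{' or a base-class ':') in the given source text.
inline std::vector<std::string> class_names(const std::string& source) {
    // Group 1: optional leading "enum" so scoped enums can be skipped.
    // Group 2: the class or struct name.
    // The tail requires '{' or a single ':' (not "::"), which rejects
    // forward declarations and template parameters such as <class T>.
    static const std::regex pattern(
        R"re((\benum\s+)?\b(?:class|struct)\s+([A-Za-z_]\w*)\s*(?:final\s*)?(?:\{|:(?!:)))re");

    std::vector<std::string> names;
    for (std::sregex_iterator it(source.begin(), source.end(), pattern), end;
         it != end; ++it) {
        if ((*it)[1].matched) continue;   // "enum class X" is not a class
        names.push_back((*it)[2].str());
    }
    return names;
}
```

Given input containing `template <class T> class Box { ... };`, `class Fwd;`, and `class Derived : public Box<int> {};`, this sketch reports `Box` and `Derived` and skips the forward declaration and the template parameter. Anything beyond that, such as method names or nested scopes, is where actual grammar experience starts to pay off.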

I should mention that John Lilley produced a fairly decent C++ parser for ANTLR's predecessor, PCCTS, but it required a modification to PCCTS and is not compatible with the latest ANTLR.

Also, check out an article describing the conversion of this old C++ grammar to the new ANTLR.