How can I filter an input file for only those constructs I care about, ignoring everything else? Or, how can I get ANTLR to operate like SED or AWK?

Terence Parr

SED and AWK are great tools bestowed upon us by the great Uncle UNIX. They have one serious limitation, however: the tools are line-oriented and cannot handle even simple translation problems for structured files like HTML. Consider performing an operation on the file names in <IMG> tags. The minute a tag spans more than one line, AWK and SED break down.

ANTLR 2.5.0 introduced an AWK-like lexical filtering mode that forces generated lexers to ignore any characters that do not match a lexical rule exactly. To turn ANTLR into SED, all you have to do is make a lexical filter rule that prints out the characters that don't match anything. Then, it's up to the lexical rules to generate what they want.

Consider the following contrived example that turns <p> and <br> tags into their uppercase equivalents and dumps anything other than those tags to standard output:

class T extends Lexer;
options {
  k=2;
  filter=IGNORE;
  charVocabulary = '\3'..'\177';
} 
P : "<p>"  {System.out.print("<P>");};
BR: "<br>" {System.out.print("<BR>");}; 
protected IGNORE
  : ( "\r\n" | '\r' | '\n' )
    {newline(); System.out.println("");}
  | c:. {System.out.print(c);}
  ;

Rather than have a "filter=sed" option, it is simple enough to use this idiom: put a print statement in a filter rule.
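
For readers who want to see the effect without generating a lexer, the behavior of the grammar can be approximated in plain Java. This is a minimal hand-written sketch, not ANTLR output: the class name SedLikeFilter and the method filter are hypothetical, and it always writes '\n' for a line break, whereas the println call in the grammar uses the platform line separator.

```java
// Hand-written stand-in for the generated lexer T; illustrative only.
class SedLikeFilter {
    // Hypothetical helper: uppercases <p> and <br>, echoes every other character.
    static String filter(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith("<p>", i)) {            // corresponds to rule P
                out.append("<P>");
                i += 3;
            } else if (input.startsWith("<br>", i)) {    // corresponds to rule BR
                out.append("<BR>");
                i += 4;
            } else if (input.startsWith("\r\n", i)) {    // IGNORE: two-character newline
                out.append('\n');
                i += 2;
            } else if (input.charAt(i) == '\r' || input.charAt(i) == '\n') {
                out.append('\n');                        // IGNORE: single-character newline
                i += 1;
            } else {                                     // IGNORE: wildcard alternative
                out.append(input.charAt(i));
                i += 1;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Only <p> and <br> are rewritten; <b> and </b> pass through untouched.
        System.out.print(filter("<p>one<br>\r\ntwo <b>three</b>"));
    }
}
```

The order of the branches mirrors the grammar: the tag rules are tried first, and only when neither matches does the fallback branch echo the character, which is the role the IGNORE filter rule plays in the lexer.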

You'll also want to look at the Filtering Input Streams documentation.