Posted By: Anonymous
Posted On: Monday, May 6, 2002 05:32 AM
Hello,
I'm trying to fire SAX parser events with Antlr. Sorry if this question is a little lengthy...
I'm the developer of Ejen, a Java/XSLT based code generation system. This system should be able to handle almost any kind of input text file (i.e. not only native XML files), provided there is a context-free grammar that describes those inputs. The problem is that, to be used in the generation process (and especially by XSL stylesheets), those text inputs must be "flattened" into XML representations.
I'm currently using JavaCC to convert a Java source file into an XML representation. This is achieved by means of modified SimpleNode.java and Token.java classes. Basically, a toNode() method is added to those classes; it returns an org.w3c.dom.Node with the SimpleNode or Token contents. In the end, I have a Xalan extension with this kind of code:
new JavaParser(new BufferedReader(new FileReader("X.java")));
SimpleNode sn = JavaParser.CompilationUnit();
sn.toNode(doc, parentNode, ...);
return parentNode.getFirstChild();
The sn.toNode(...) call (which recursively calls the toNode(...) methods of the SimpleNode and Token classes, with special tokens support) builds an entire DOM tree that represents the X.java file. It works fine, and I can use it to generate, for example, dependent interfaces from a core Java class: I made a full EJB 1.1 BMP generation example that deduces the remote and home interfaces from a previously generated BMP entity bean class. I can also use it to generate HTML views of Java source files with syntax highlighting.
There are at least two problems with this approach:
- I must modify the SimpleNode and Token classes by hand for each grammar (I don't have any access to the JJTree preprocessor source code).
- The input text file must be entirely parsed before I can get this DOM tree.
What I would like to get is a "Listener-based" parsing process that fires rule/token (even hidden) events, using this kind of generic interface:
public interface ParseListener {
public void enterRule(int id);
public void exitRule(int id);
public void newToken(int id, String value, boolean hidden);
}
This should be used with the buildAST option set to false: the listener is responsible for creating the rule/token tree, and nothing is left in the parser instance itself after parsing (parser.getAST() returns null, etc.). In this Listener mode, we should have access to rule names (something like ruleNames[id] should be available).
From my XML (Ejen) point of view, I could use this in order to fire corresponding SAX events and, finally, get something like this:
package
org
.
whatever
;
...
I think this kind of Listener-based parsing mode could be very interesting for building plain-text to XML (or other format) translations without memory-related problems: if you have 10 gigs of plain-text data and the corresponding Antlr grammar, you can produce the desired output file on the fly in this mode.
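To make the idea more concrete, here is a minimal sketch of how such a listener might forward events to a SAX ContentHandler. It is only an illustration under assumptions: the SaxParseListener and Recorder classes, the "token"/"hidden" element names, and the rule names in the demo are all hypothetical, and the ruleNames array is assumed to be supplied by the generated parser.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class SaxParseListenerSketch {

    /** The listener interface proposed above. */
    public interface ParseListener {
        void enterRule(int id);
        void exitRule(int id);
        void newToken(int id, String value, boolean hidden);
    }

    /** Hypothetical adapter: forwards each rule/token event to a SAX ContentHandler. */
    public static class SaxParseListener implements ParseListener {
        private static final Attributes NO_ATTS = new AttributesImpl();
        private final ContentHandler handler;
        private final String[] ruleNames; // assumed to come from the generated parser

        public SaxParseListener(ContentHandler handler, String[] ruleNames) {
            this.handler = handler;
            this.ruleNames = ruleNames;
        }

        public void enterRule(int id) { start(ruleNames[id]); }

        public void exitRule(int id) { end(ruleNames[id]); }

        // The token id is ignored here; element names "token"/"hidden" are made up.
        public void newToken(int id, String value, boolean hidden) {
            String name = hidden ? "hidden" : "token";
            start(name);
            try {
                handler.characters(value.toCharArray(), 0, value.length());
            } catch (SAXException e) {
                throw new IllegalStateException(e);
            }
            end(name);
        }

        private void start(String name) {
            try {
                handler.startElement("", name, name, NO_ATTS);
            } catch (SAXException e) {
                throw new IllegalStateException(e);
            }
        }

        private void end(String name) {
            try {
                handler.endElement("", name, name);
            } catch (SAXException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Toy handler that records events as text (no escaping), for demonstration only. */
    static class Recorder extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    /** Simulates the events a parser might fire for "package org.whatever;". */
    public static String demo() {
        Recorder rec = new Recorder();
        ParseListener l = new SaxParseListener(rec, new String[] {"PackageDeclaration", "Name"});
        l.enterRule(0);
        l.newToken(0, "package", false);
        l.newToken(1, " ", true);
        l.enterRule(1);
        l.newToken(2, "org", false);
        l.newToken(3, ".", false);
        l.newToken(2, "whatever", false);
        l.exitRule(1);
        l.newToken(4, ";", false);
        l.exitRule(0);
        return rec.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the adapter only depends on org.xml.sax.ContentHandler, the toy Recorder could in principle be swapped for a serializer or an XSLT handler, so nothing has to be held in memory beyond the current event.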
Finally, there is at least one problem with the implementation of this mode: we don't want parse events to be fired while the parser is "guessing", and this is a problem I couldn't solve when I tried to modify the Antlr source code...
Any ideas?
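One direction that might be worth exploring, sketched here under assumptions rather than as a tested solution: wrap the real listener in a filter that drops events whenever the parser reports a non-zero guessing depth. The GuessFilter class and the IntSupplier hook are hypothetical; in Antlr 2 generated Java parsers the depth appears to be kept in inputState.guessing, but how the event-firing code would reach it is left open here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

public class GuessAwareListenerSketch {

    /** The listener interface proposed above. */
    public interface ParseListener {
        void enterRule(int id);
        void exitRule(int id);
        void newToken(int id, String value, boolean hidden);
    }

    /** Hypothetical decorator: forwards events only when the guessing depth is zero. */
    public static class GuessFilter implements ParseListener {
        private final ParseListener target;
        private final IntSupplier guessDepth; // assumed hook into the parser's guessing counter

        public GuessFilter(ParseListener target, IntSupplier guessDepth) {
            this.target = target;
            this.guessDepth = guessDepth;
        }

        public void enterRule(int id) {
            if (guessDepth.getAsInt() == 0) target.enterRule(id);
        }

        public void exitRule(int id) {
            if (guessDepth.getAsInt() == 0) target.exitRule(id);
        }

        public void newToken(int id, String value, boolean hidden) {
            if (guessDepth.getAsInt() == 0) target.newToken(id, value, hidden);
        }
    }

    /** Simulates a speculative pass followed by the real pass over the same rule. */
    public static List<String> demo() {
        final List<String> events = new ArrayList<>();
        ParseListener recorder = new ParseListener() {
            public void enterRule(int id) { events.add("enter " + id); }
            public void exitRule(int id) { events.add("exit " + id); }
            public void newToken(int id, String value, boolean hidden) { events.add("token " + value); }
        };
        final int[] guessing = {0}; // stands in for the parser's guessing depth
        ParseListener filtered = new GuessFilter(recorder, () -> guessing[0]);

        guessing[0] = 1; // speculative pass: nothing should reach the recorder
        filtered.enterRule(7);
        filtered.newToken(1, "package", false);
        filtered.exitRule(7);

        guessing[0] = 0; // real pass: events are forwarded
        filtered.enterRule(7);
        filtered.newToken(1, "package", false);
        filtered.exitRule(7);
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This only suppresses events during speculation; it relies on the parser re-running the same rules for real afterwards, so that each event is eventually fired exactly once.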
Thanks a lot for Antlr,
Regards,
F. Wolff.