Listener based parsing mode with Antlr.

Posted By:   Anonymous
Posted On:   Monday, May 6, 2002 05:32 AM


Hello,


I'm trying to fire SAX parser events with Antlr. Sorry if this question is a little lengthy...


I'm the developer of Ejen, a Java/XSLT based code generation system. This system should be able to handle almost any kind of input text file (i.e., not only native XML files), provided there is a context-free grammar that describes those inputs. The problem is: to be used in the generation process (and especially by XSL stylesheets), those text inputs must be "flattened" into XML representations.


I'm currently using JavaCC to convert a Java source file into an XML representation. This is achieved by means of modified SimpleNode.java and Token.java classes. Basically, a toNode() method is added to those classes; it returns an org.w3c.dom.Node with the SimpleNode or Token contents, and I end up with a Xalan extension containing this kind of code:

			
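// Initialise the JavaCC-generated parser on X.java, then parse the whole file.
new JavaParser(new BufferedReader(new FileReader("X.java")));
SimpleNode sn = JavaParser.CompilationUnit();
// Recursively append the DOM form of the parse tree under parentNode.
sn.toNode(doc, parentNode, ...);
return parentNode.getFirstChild();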

The sn.toNode(...) call (which recursively calls the toNode(...) methods of the SimpleNode and Token classes, with special tokens support) builds an entire DOM tree that represents the X.java file. It works fine and I can use it to generate, for example, dependent interfaces from a core Java class: I made a full EJB 1.1 BMP generation example that deduces the remote and home interfaces from a previously generated BMP entity bean class. I can also use it to generate HTML views of Java source files with syntax highlighting.


There are at least two problems with this approach:


  • I must modify the SimpleNode and Token classes by hand for each grammar (I don't have access to the JJTree preprocessor source code).
  • The input text file must be entirely parsed before I can get this DOM tree.

What I would like to get is a "Listener-based" parsing process that fires rule/token (even hidden) events, using this kind of generic interface:
			
public interface ParseListener {
    public void enterRule(int id);
    public void exitRule(int id);
    public void newToken(int id, String value, boolean hidden);
}

This should be used with the buildAST option set to false: the listener is responsible for creating the rule/token tree, and nothing is left in the parser instance itself after parsing (parser.getAST() returns null, etc.). In this listener mode, we should have access to rule names (something like ruleNames[id] should be available).


From my XML (Ejen) point of view, I could use this in order to fire corresponding SAX events and, finally, get something like this:

			


package


org
.
whatever

;

...
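
A minimal sketch of how such a listener might forward its events to a SAX ContentHandler. The class name SaxParseListener, the ruleNames/tokenNames lookup tables and the choice of element names are assumptions made for illustration only; nothing here is an existing Antlr API, and the ParseListener interface is repeated just so the sketch compiles on its own.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// The interface proposed above, repeated so that this sketch is self-contained.
interface ParseListener {
    void enterRule(int id);
    void exitRule(int id);
    void newToken(int id, String value, boolean hidden);
}

// Hypothetical adapter: one XML element per rule, one per token.
class SaxParseListener implements ParseListener {
    private final ContentHandler handler;
    private final String[] ruleNames;   // assumed table: rule id -> element name
    private final String[] tokenNames;  // assumed table: token id -> element name

    SaxParseListener(ContentHandler handler, String[] ruleNames, String[] tokenNames) {
        this.handler = handler;
        this.ruleNames = ruleNames;
        this.tokenNames = tokenNames;
    }

    public void enterRule(int id) {
        try {
            handler.startElement("", ruleNames[id], ruleNames[id], new AttributesImpl());
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public void exitRule(int id) {
        try {
            handler.endElement("", ruleNames[id], ruleNames[id]);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public void newToken(int id, String value, boolean hidden) {
        AttributesImpl atts = new AttributesImpl();
        if (hidden) {
            // Mark hidden tokens (comments, whitespace) with an attribute.
            atts.addAttribute("", "hidden", "hidden", "CDATA", "true");
        }
        try {
            handler.startElement("", tokenNames[id], tokenNames[id], atts);
            handler.characters(value.toCharArray(), 0, value.length());
            handler.endElement("", tokenNames[id], tokenNames[id]);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The caller would still be responsible for startDocument()/endDocument() around the parse, and the names in both tables are assumed to be valid XML element names.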


I think this kind of listener-based parsing mode could be very useful for building plain-text to XML (or other format) translations without memory problems: if you have 10 gigs of plain-text data and the corresponding Antlr grammar, you can produce the desired output file on the fly in this mode.


Finally, there is at least one problem with implementing this mode: we don't want parse events to be fired while the parser is "guessing", and this is a problem I couldn't solve when I tried to modify the Antlr source code...


Any ideas?


Thanks a lot for Antlr,

Regards,

F. Wolff.


Re: Listener based parsing mode with Antlr.

Posted By:   Monty_Zukowski  
Posted On:   Tuesday, May 7, 2002 06:37 AM

How did you modify the Antlr source code?


Guessing is a state held in inputState. If guessing>0 then don't fire your events. Guessing mode means to try the parse but execute no actions so that you can backtrack if the guess was false.
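
A rough sketch of that check, written as a wrapper around the listener so it does not depend on Antlr itself. GuessingFilter and the IntSupplier argument are hypothetical names; inside a real parser subclass the supplier would presumably just read inputState.guessing.

```java
import java.util.function.IntSupplier;

// Listener interface from the question, repeated so the sketch compiles alone.
interface ParseListener {
    void enterRule(int id);
    void exitRule(int id);
    void newToken(int id, String value, boolean hidden);
}

// Hypothetical decorator: drops every event raised while the parser is guessing.
class GuessingFilter implements ParseListener {
    private final ParseListener target;
    private final IntSupplier guessingLevel; // assumed hook, e.g. () -> inputState.guessing

    GuessingFilter(ParseListener target, IntSupplier guessingLevel) {
        this.target = target;
        this.guessingLevel = guessingLevel;
    }

    // True only when no syntactic predicate is currently being evaluated.
    private boolean committed() {
        return guessingLevel.getAsInt() == 0;
    }

    public void enterRule(int id) {
        if (committed()) target.enterRule(id);
    }

    public void exitRule(int id) {
        if (committed()) target.exitRule(id);
    }

    public void newToken(int id, String value, boolean hidden) {
        if (committed()) target.newToken(id, value, hidden);
    }
}
```

As far as I know, when a guess succeeds the parser rewinds and parses the same input again for real, so the events suppressed here should be fired on that second pass rather than lost.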


What you want to do may be easily accomplished by generating the parser with the -traceParser option and overriding the traceIn(), traceOut() and match() methods. I'm pretty sure that would be all you need, but I don't know all your detailed requirements.
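
To make the shape of that concrete, here is a sketch against a stand-in base class rather than the real generated parser. GeneratedParserStub, RuleTokenListener and ListeningParser are all hypothetical; the stub only imitates the three methods and the guessing counter, and the real signatures (checked exceptions included) should be taken from the Antlr sources.

```java
// Stand-in for a parser generated with -traceParser. NOT the real antlr.Parser.
class GeneratedParserStub {
    protected int guessing = 0;           // the real parser keeps this in inputState
    protected String lookaheadText = "";  // stand-in for the text of the next token

    public void traceIn(String ruleName) { }
    public void traceOut(String ruleName) { }
    public void match(int tokenType) { }  // the real method throws on a mismatch
}

// Hypothetical listener keyed by rule name, since the trace hooks receive names.
interface RuleTokenListener {
    void enterRule(String ruleName);
    void exitRule(String ruleName);
    void token(int tokenType, String text);
}

// Overrides the three hooks and forwards them, except while guessing.
class ListeningParser extends GeneratedParserStub {
    private final RuleTokenListener listener;

    ListeningParser(RuleTokenListener listener) {
        this.listener = listener;
    }

    @Override
    public void traceIn(String ruleName) {
        if (guessing == 0) listener.enterRule(ruleName);
    }

    @Override
    public void traceOut(String ruleName) {
        if (guessing == 0) listener.exitRule(ruleName);
    }

    @Override
    public void match(int tokenType) {
        String text = lookaheadText;  // capture before the token is consumed
        super.match(tokenType);       // on a mismatch the real call would throw here
        if (guessing == 0) listener.token(tokenType, text);
    }
}
```

Hidden tokens never reach match(), so they would need separate handling if they also have to show up as events.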
