Posted By: Timothy_Stone
Posted On: Tuesday, September 3, 2002 07:18 AM
Karen,
I'm not a JavaCC hacker either, but here is at least a guess at a solution, in case you haven't already solved it.
The token syntax appears to be a form of regular expression, so:
TOKEN :
{
< CommentText3: (~["%"]){1} >
| < CommentEnd3: ">" > : DEFAULT
}
An alternate syntax might be:
TOKEN :
{
< CommentText3: (~["%"])+ >
| < CommentEnd3: ">" > : DEFAULT
}
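In the first example, ~["%"] is a negated character list: it matches any single character *except* a percent sign, and "{1}" restricts the match to exactly one such character (the same as plain ~["%"]). The second example matches a run of one or more characters that are not percent signs. Note that neither rule matches a "%" itself, so the "%" of a closing "%>" is not covered by either token as written.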
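For comparison, here is a small sketch that uses java.util.regex rather than JavaCC, on the assumption that JavaCC's ~["%"] behaves like the regex character class [^%]. The class and field names are hypothetical; the point is only to show what each of the two token patterns accepts.

```java
import java.util.regex.Pattern;

// java.util.regex analogue (not JavaCC) of the two CommentText3 patterns.
// Assumption: JavaCC's negated list ~["%"] behaves like the regex class [^%],
// i.e. it matches any single character EXCEPT a percent sign.
public class NegatedCharListDemo {
    // Analogue of (~["%"]){1}: exactly one non-% character.
    static final Pattern ONE_NON_PERCENT = Pattern.compile("[^%]{1}");
    // Analogue of (~["%"])+: one or more non-% characters.
    static final Pattern ONE_OR_MORE_NON_PERCENT = Pattern.compile("[^%]+");

    public static void main(String[] args) {
        // matches() requires the whole input to match the pattern.
        System.out.println(ONE_NON_PERCENT.matcher("a").matches());             // true
        System.out.println(ONE_NON_PERCENT.matcher("%").matches());             // false
        System.out.println(ONE_NON_PERCENT.matcher("ab").matches());            // false
        System.out.println(ONE_OR_MORE_NON_PERCENT.matcher("abc>").matches());  // true
        System.out.println(ONE_OR_MORE_NON_PERCENT.matcher("ab%>").matches());  // false
    }
}
```

One consequence worth checking in the grammar itself: ">" is also a non-% character. As far as I know, JavaCC picks the longest match and, on a tie, the token declared first, so as written CommentText3 may claim a ">" before CommentEnd3 ever gets to match it.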
Several questions remain open:
- Will this work? My initial guess is yes. The intent is to exclude *all* JSP scriptlets, expressions, and comments, i.e. anything beginning with "<%", from Lucene's indexing of a repository.
- Any use of the newer, preferred JSP XML syntax (e.g. <jsp:scriptlet>) would of course bypass this approach; that case needs further examination.
- How does one integrate these changes into the HTMLParser.jj file and compile them? I have never worked with JavaCC before either, so that answer will have to come from a JavaCC forum or a follow-up post.
- Anything other than Java or JSP syntax in scriptlets may throw the parser out of whack. For example, if you include HTML comments in a scriptlet, the parser may choke, ending the intended "commented" section too soon. But I'm not sure, because I haven't tested it.
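The intended behavior from the first question, dropping everything from "<%" through the closing "%>", can be sketched in plain Java as a two-state scanner that mirrors the DEFAULT and comment lexical states in the grammar. This is an illustration under that assumption, not JavaCC-generated code, and the names JspScriptletStripper and strip are hypothetical.

```java
// Hypothetical plain-Java sketch (not JavaCC output) of the lexer states the
// grammar is aiming for: DEFAULT copies text through, "<%" switches to a
// skipping state, and "%>" switches back to DEFAULT.
public class JspScriptletStripper {

    private enum State { DEFAULT, IN_SCRIPTLET }

    /** Returns the input with every <% ... %> section removed.
     *  An unterminated "<%" drops the rest of the input. */
    public static String strip(String input) {
        StringBuilder out = new StringBuilder();
        State state = State.DEFAULT;
        int i = 0;
        while (i < input.length()) {
            if (state == State.DEFAULT) {
                if (input.startsWith("<%", i)) {
                    state = State.IN_SCRIPTLET; // like entering the comment lexical state
                    i += 2;
                } else {
                    out.append(input.charAt(i)); // ordinary text is kept
                    i++;
                }
            } else {
                if (input.startsWith("%>", i)) {
                    state = State.DEFAULT;      // like CommentEnd3 ... : DEFAULT
                    i += 2;
                } else {
                    i++;                        // like CommentText3: content is discarded
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("<p>Hello <%= user %>!</p>"));                     // prints <p>Hello !</p>
        System.out.println(strip("a<%-- note --%>b<% if (x > 1) { %>c<% } %>d")); // prints abcd
    }
}
```

The second call in main shows one reason to end the skipped section on "%>" rather than a bare ">": here the ">" in the scriptlet expression x > 1 does not end the section early.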
Thoughtfully HTH,
Tim