summary text for indexed jsp files -- modify the HTMLParser.jj
Posted By:   karen_bran
Posted On:   Monday, August 12, 2002 02:04 PM


Hello,

I modified IndexHTML.java so that JSP files are indexed, but the source
of JSP tags such as <%@page import....... shows up in the result
summary.

I searched this mailing list and found a suggestion to modify
HTMLParser.jj so that JSP tag text is treated as a third kind of
comment. Since I am not familiar with the JavaCC grammar, I don't know
how to change HTMLParser.jj to add a third comment token for JSP tags.

Here are the two existing comment tokens in HTMLParser.jj. Can someone
help me figure out how to add the third one?

Thanks a lot.


			

<WithinComment1> TOKEN :
{
< CommentText1: (~["-"])+ | "-" >
| < CommentEnd1: "-->" > : DEFAULT
}

<WithinComment2> TOKEN :
{
< CommentText2: (~[">"])+ >
| < CommentEnd2: ">" > : DEFAULT
}



<WithinComment3> TOKEN :
{
< CommentText3: ?????? >
| < CommentEnd3: ??????> : DEFAULT
}

Re: summary text for indexed jsp files -- modify the HTMLParser.jj

Posted By:   Timothy_Stone  
Posted On:   Tuesday, September 3, 2002 07:18 AM

Karen,

I'm not a JavaCC hacker either, but here at least is a guess at the solution, if you haven't already solved it...

It appears that the syntax is a regular expression. So...


<WithinComment3> TOKEN :
{
// one non-"%" character per token
< CommentText3: (~["%"]){1} >
| < CommentEnd3: "%>" > : DEFAULT
}


An alternate syntax might be:


<WithinComment3> TOKEN :
{
// a run of non-"%" characters per token
< CommentText3: (~["%"])+ >
| < CommentEnd3: "%>" > : DEFAULT
}


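In the first example, "{1}" says match exactly one character; in the second, "+" says match one or more. Note that ~["%"] stands for any character other than a percent mark, so both rules match the text inside the tag, not the percent mark itself.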
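As a rough illustration of how the negated character class behaves, here is a small sketch that uses java.util.regex as a stand-in. This is only an analogy: it assumes that the regex [^%] plays the same role as JavaCC's ~["%"], JavaCC does not use java.util.regex, and the class and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene or JavaCC.
final class NegatedClassDemo {

    // Returns the prefix of input matched by regex, or "" if the regex
    // does not match at the start of the input.
    static String leadingMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Text that follows the opening "<%" of a page directive.
        String tagBody = "@page import=\"java.util.*\" %>";

        // Exactly one character that is not '%'.
        System.out.println("[" + leadingMatch("[^%]{1}", tagBody) + "]");
        // One or more characters that are not '%': stops just before "%>".
        System.out.println("[" + leadingMatch("[^%]+", tagBody) + "]");
        // Neither pattern can match when the next character is '%'.
        System.out.println("[" + leadingMatch("[^%]+", "%>") + "]");
    }
}
```

In this analogy, neither pattern ever consumes the "%" itself, so something else (the end token) has to match at that point.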


Several questions are open still...


  1. Will this work? My initial guess is yes. The intended operation of the syntax is to remove *all* JSP statements, expressions and comments, i.e. anything beginning with "<%", from Lucene's operations on a repository.
  2. Any use of the newer, preferred JSP XML syntax of course breaks this idea, and further examination is needed.
  3. How does one integrate, i.e. compile, the changes to the HTMLParser.jj file? I have never hacked or coded JavaCC before either, so that answer remains to be found in a JavaCC forum or posted as a follow-up.
  4. Anything other than Java or JSP syntax in scriptlets may throw the parser out of whack. For example, if you include HTML comments in a scriptlet, the parser may choke by ending the intended "commented" section too soon. But I'm not sure, because I haven't tested it.
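Points 2 and 4 can be tried on sample text without touching the grammar. The sketch below is a plain-Java stand-in for the idea of skipping everything from "<%" to an end marker. It is not JavaCC and not Lucene code, and JspTagSkipper and strip are hypothetical names. It only suggests what to expect: a bare ">" as the end marker cuts a scriptlet short, "%>" does not, and XML-syntax scriptlets are not recognized at all.

```java
// Hypothetical stand-in for a "skip JSP tags" lexer state; illustration only.
final class JspTagSkipper {

    // Drops every region that starts with "<%" and runs through the
    // first occurrence of endMarker after it. Text outside is kept.
    static String strip(String input, String endMarker) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            int start = input.indexOf("<%", pos);
            if (start < 0) {
                out.append(input, pos, input.length());
                break;
            }
            out.append(input, pos, start);
            int stop = input.indexOf(endMarker, start + 2);
            if (stop < 0) {
                break; // unterminated tag: discard the remainder
            }
            pos = stop + endMarker.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<%@page import=\"java.util.*\"%><p>Hi</p>"
                + "<% if (a > b) { %>big<% } %>";

        // Ending on "%>" removes each tag completely.
        System.out.println(strip(page, "%>"));
        // Ending on ">" stops at the ">" inside "a > b" and leaks code.
        System.out.println(strip(page, ">"));
        // XML-syntax scriptlets never start with "<%", so they pass through.
        System.out.println(strip("<jsp:scriptlet>int a = 1;</jsp:scriptlet>", "%>"));
    }
}
```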


Thoughtfully HTH,

Tim
