Indexing, Parsing, Searching JSPs

Posted By:   william_boyd
Posted On:   Friday, January 25, 2002 04:21 AM

I've hacked the HTMLParser class that comes with the Lucene
demo and made it parse and index JSPs. But when I do a
search, JSP tags such as

<%pageContext.setAttribute( "req", request );%>

<%@ page import="com.propelnewmedia.tags.BreadcrumbTrailer"%>

are included in the summary.

Does anyone have a working JSP parser that does
not include the JSP tags in the summary? Or can you
tell me where I should be looking in HTMLParser.java to fix this?

I've already fiddled around with the addToParser() method and
the inScript flag, with limited success. But I am getting uncomfortably
close to my deadline, and I need a quick fix, yesterday!

Thanks in advance.


Re: Indexing, Parsing, Searching JSPs

Posted By:   Helen_Huang  
Posted On:   Wednesday, February 20, 2002 07:32 AM

I am trying to index and search JSP pages and have the same problem. Did you find a solution? If so, what is it?

Thanks!


-Helen

Re: Indexing, Parsing, Searching JSPs

Posted By:   Moshe_Sambol  
Posted On:   Thursday, January 31, 2002 12:59 PM

What you probably want to index is the output of the compiled and executed pages, not their source code. Instead of reading the JSPs from the local file system, use a spider to read their *output* from the server, and index that.
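
To make the idea concrete, here is a minimal sketch of fetching the rendered output of a page over HTTP so that it, not the JSP source, gets indexed. It assumes Java 11 or later for java.net.http; the class name RenderedPageFetcher and both method names are hypothetical, and a real spider would also need link following, error handling, and politeness delays.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: not part of Lucene or its demo.
final class RenderedPageFetcher {

    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    /** Requests the page from the server and returns the HTML it generates. */
    static String fetch(String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        try {
            HttpResponse<String> response =
                    CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IllegalStateException(
                        "Unexpected status " + response.statusCode() + " for " + url);
            }
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while fetching " + url, e);
        }
    }

    /**
     * Rough heuristic (an assumption, not a guarantee): rendered output
     * should normally not contain the JSP opening delimiter.
     */
    static boolean looksLikeJspSource(String text) {
        return text.contains("<%");
    }
}
```

The string returned by fetch() can then be handed to whatever HTML parser you already use for indexing. Checking looksLikeJspSource() on the result is a cheap way to notice that you are still reading source files rather than server output.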

Re: Indexing, Parsing, Searching JSPs

Posted By:   william_boyd  
Posted On:   Friday, January 25, 2002 07:16 AM

Well, I've figured out a way to get the JSP tags out of
the summary (and I think out of the index as well).

What I did was designate JSP tags (anything starting with <% and ending with %>)
as a third comment type in the void CommentTag() :,
TOKEN :, and TOKEN :
sections of HTMLParser.jj.

I just copied and pasted the relevant code for Comment2 and
mimicked that for my new comment type. I then recompiled HTMLParser.jj
using JavaCC, which I downloaded from here.
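
For anyone who would rather not touch the grammar, the same rule (drop anything starting with <% and ending with %>) can be sketched as a pre-processing step that runs before the text reaches the HTML parser. This is a different technique from the HTMLParser.jj change described above, not a copy of it; the class name JspStripper is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a regex pre-filter, not the JavaCC grammar change.
final class JspStripper {

    // Non-greedy match from "<%" to the nearest "%>"; DOTALL lets it span lines.
    // This also covers directives (<%@ ... %>), expressions (<%= ... %>)
    // and JSP comments (<%-- ... --%>).
    private static final Pattern JSP_BLOCK =
            Pattern.compile("<%.*?%>", Pattern.DOTALL);

    /** Replaces every JSP block with a single space so adjacent words stay separated. */
    static String strip(String source) {
        return JSP_BLOCK.matcher(source).replaceAll(" ");
    }
}
```

For example, JspStripper.strip("<p>a</p><%@ page import=\"x.Y\"%><p>b</p>") yields "<p>a</p> <p>b</p>". Custom tag library elements are ordinary-looking tags and are not touched by this filter.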

I'm still not out of the woods though. I still need to know how to make Lucene
not include list element values, etc. in the search hits. For instance, if a keyword happens to
be in a list, it gets counted as a hit.

Any suggestions would be massively appreciated! Thanks in advance.
