Identifying fields in which hits occur

Posted By:   Anthony_Frayling
Posted On:   Wednesday, February 16, 2005 04:03 AM

I am currently using Lucene to index complex documents (fielded) and perform complex searches.


When I get the results (Hits), there doesn't appear to be any way to identify which fields the hits occurred in, only which document. At the moment I have to post-process the hits, re-parsing each document's fields to find where the query terms match.
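
For readers who want to see what that post-processing step might look like, here is a minimal sketch in plain Java. It assumes the stored field values of a hit have already been copied into a Map of field name to text; the class FieldMatcher and the method matchingFields are hypothetical names, not part of Lucene, and the tokenization is a naive stand-in for whatever analyzer was used at index time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not a Lucene class.
class FieldMatcher {

    // Returns the names of the fields whose text contains at least one query term.
    // Assumption: lowercasing and splitting on non-alphanumeric characters is
    // close enough to the analyzer used when the documents were indexed.
    static List<String> matchingFields(Map<String, String> fields, Set<String> queryTerms) {
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            String text = entry.getValue().toLowerCase(Locale.ROOT);
            Set<String> tokens = new HashSet<>(Arrays.asList(text.split("[^\\p{L}\\p{N}]+")));
            for (String term : queryTerms) {
                if (tokens.contains(term.toLowerCase(Locale.ROOT))) {
                    matched.add(entry.getKey());
                    break; // one matching term is enough to report this field
                }
            }
        }
        return matched;
    }
}
```

This only handles plain term matches; phrase, wildcard, or fuzzy queries would need more than a token lookup.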

Is there any easy way to identify the fields as well as the documents in which hits have occurred?

Re: Identifying fields in which hits occur

Posted By:   ian_white  
Posted On:   Sunday, March 6, 2005 03:51 AM

I have this problem too. Some of my indexed fields are actually the URL of a Word/PDF/HTML document. My solution is to add information to the key, so that when you retrieve the results you can parse out the real key plus the extra information that reveals which field matched.

E.g. I index the rows of a database table, so I have one document per row, with the key being the primary key of that table (OID) and the various fields of the document representing the columns of the table.


Some of my columns correspond to URLs that lead to a Word/PDF or HTML external document. So in order to be able to identify whether the search found a match in the database row or in an external document, I can encode the Lucene document key to include the OID plus, say, #fieldName.
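
A minimal sketch of that key encoding, in plain Java with no Lucene calls. The class name CompositeKey and its methods are hypothetical, and it assumes the OID itself never contains the '#' character.

```java
// Hypothetical helper for keys of the form "OID" or "OID#fieldName".
class CompositeKey {
    static final char SEPARATOR = '#';

    final String oid;
    final String fieldName; // null when the key refers to the database row itself

    CompositeKey(String oid, String fieldName) {
        this.oid = oid;
        this.fieldName = fieldName;
    }

    // Builds the key stored in the Lucene document for an external document.
    static String encode(String oid, String fieldName) {
        return oid + SEPARATOR + fieldName;
    }

    // Splits a stored key back into the real key and the optional field name.
    // Assumption: the OID contains no '#', so the first '#' is the separator.
    static CompositeKey parse(String key) {
        int pos = key.indexOf(SEPARATOR);
        if (pos < 0) {
            return new CompositeKey(key, null);
        }
        return new CompositeKey(key.substring(0, pos), key.substring(pos + 1));
    }
}
```

When a hit comes back, parsing its key tells you whether the match was in the row (no field name) or in the external document behind a particular column.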


There's no reason why you can't have multiple Lucene documents representing a single searchable entity.
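
If several Lucene documents do stand for one entity, the hit keys can be folded back together after the search. The following is only a sketch under the same "OID#fieldName" assumption; HitGrouper, groupByOid, and the "<row>" label for keys without a suffix are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that collapses hit keys into one entry per entity.
class HitGrouper {
    static final String ROW_LABEL = "<row>"; // marks a hit on the database row itself

    // Maps each OID to the places where it matched, preserving hit order.
    // Assumption: keys look like "OID" or "OID#fieldName" and OIDs contain no '#'.
    static Map<String, List<String>> groupByOid(List<String> hitKeys) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String key : hitKeys) {
            int pos = key.indexOf('#');
            String oid = pos < 0 ? key : key.substring(0, pos);
            String where = pos < 0 ? ROW_LABEL : key.substring(pos + 1);
            grouped.computeIfAbsent(oid, k -> new ArrayList<>()).add(where);
        }
        return grouped;
    }
}
```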


NOTE - if someone knows how to extract this info from Lucene's results in the first place, I'd love to hear about it as well.


Ian White
