Customizing Analyzer and TokenStream to stop stemming specific field

Posted By:   BG_San
Posted On:   Tuesday, May 18, 2004 01:23 PM

The search engine I am working on indexes sale item listings with fields such as title, description, item ID, and seller user ID. Relevancy and word matching, using case-insensitivity and stemming in the Analyzer below, work well. However, I do not want to tokenize or stem the seller user ID field, which is an alphanumeric string with no spaces that needs exact matching.


I notice that Lucene's IndexWriter takes only one analyzer per index, so I cannot index the seller user ID for exact matching. It has been suggested that I create a second index without stemming and run a multi-index search whenever the query includes the seller ID in the search criteria. But I'm not sure how well this multi-index search would perform. What are some good solutions to this problem?

			
Analyzer analyzer = new TextSearchAnalyzer();
IndexWriter writer = new IndexWriter(indexDir, analyzer, newIndex);


			
public class TextSearchAnalyzer extends Analyzer
{
    public final TokenStream tokenStream(final Reader reader)
    {
        TokenStream result = new StandardTokenizer(reader);

        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stopTable);
        result = new PorterStemFilter(result);

        return result;
    }
}

Re: Customizing Analyzer and TokenStream to stop stemming specific field

Posted By:   Otis_Gospodnetic  
Posted On:   Wednesday, May 19, 2004 04:14 AM

Have you seen PerFieldAnalyzerWrapper?
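
For readers unfamiliar with it: as far as I know, in Lucene releases of this era PerFieldAnalyzerWrapper is constructed with a default analyzer and lets you register a different analyzer for named fields through addAnalyzer(fieldName, analyzer), so a single IndexWriter can still treat fields differently. The routing idea can be sketched in plain Java without Lucene on the classpath. Everything below is a hypothetical stand-in, not Lucene's API: the SimpleAnalyzer interface, the TOKENIZING and EXACT analyzers, and the field names "title" and "sellerId" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java illustration only; none of these types are Lucene classes.
class PerFieldDispatchSketch {

    /** Hypothetical stand-in for an analyzer: field text in, index terms out. */
    interface SimpleAnalyzer {
        List<String> analyze(String text);
    }

    /** Lowercases and splits on non-alphanumerics; a crude stand-in for the tokenizing chain. */
    static final SimpleAnalyzer TOKENIZING = text -> {
        List<String> terms = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    };

    /** Emits the whole value as one unmodified term, so only an exact match hits. */
    static final SimpleAnalyzer EXACT = text -> Collections.singletonList(text);

    /** Routes each field to its registered analyzer, or to the default one. */
    static final class PerFieldWrapper {
        private final SimpleAnalyzer defaultAnalyzer;
        private final Map<String, SimpleAnalyzer> byField = new HashMap<>();

        PerFieldWrapper(SimpleAnalyzer defaultAnalyzer) {
            this.defaultAnalyzer = defaultAnalyzer;
        }

        void addAnalyzer(String field, SimpleAnalyzer analyzer) {
            byField.put(field, analyzer);
        }

        List<String> analyze(String field, String text) {
            return byField.getOrDefault(field, defaultAnalyzer).analyze(text);
        }
    }

    /** Builds a wrapper for the listing index: everything tokenized except the seller ID. */
    static PerFieldWrapper listingWrapper() {
        PerFieldWrapper wrapper = new PerFieldWrapper(TOKENIZING);
        wrapper.addAnalyzer("sellerId", EXACT); // hypothetical field name
        return wrapper;
    }

    public static void main(String[] args) {
        PerFieldWrapper wrapper = listingWrapper();
        System.out.println(wrapper.analyze("title", "Vintage Running-Shoes"));
        System.out.println(wrapper.analyze("sellerId", "Bob_Runner42"));
    }
}
```

With the real class, the same wrapper would presumably be passed both to the IndexWriter and to the query parser, so that the seller ID is treated identically at index time and at query time.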