Lucene: Use different Analyzers for different fields ?

Posted By:   Anonymous
Posted On:   Wednesday, May 25, 2005 05:35 AM

Hello,

I'm implementing a multilingual database with Lucene, adding documents with fields like:

docnumber: D123
Englishtext: o'clock
Frenchtext: l'heure
classcodes: A01C C02D


Is it possible to use a different Analyzer for each field?

Using the same Analyzer for every field causes several problems: stop words, the indexing of apostrophes and similar characters, and the indexing of classcodes (I can't use a Keyword field, since I want to search for a single code such as C02D).


I want to be able to search in all fields.


If different Analyzers cannot be used, what alternatives are there? One option would be to change the input text, for example by adding a space after each apostrophe and replacing digits with letters (0=>a, 1=>b, ...), like this:

docnumber: Dbcd
Englishtext: o'clock
Frenchtext: l' heure
classcodes: AabC CacD
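
For reference, the rewrite described above can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical, nothing here is part of Lucene, and the digit mapping is the one stated in the question (0=>a, 1=>b, ..., 9=>j).

```java
// Hypothetical helper, not part of Lucene: applies the two input rewrites
// proposed in the question before the text is handed to the indexer.
final class InputRewriter {

    // Replace each digit with a letter: 0=>a, 1=>b, ..., 9=>j.
    // Example: "D123" becomes "Dbcd".
    static String digitsToLetters(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(c >= '0' && c <= '9' ? (char) ('a' + (c - '0')) : c);
        }
        return out.toString();
    }

    // Insert a space after each apostrophe, so "l'heure" becomes "l' heure".
    static String spaceAfterApostrophe(String s) {
        return s.replace("'", "' ");
    }
}
```

Note that any query would have to go through the same rewrite before searching, otherwise a search for D123 would not match the indexed term Dbcd.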

Re: Lucene: Use different Analyzers for different fields ?

Posted By:   Richard_Krenek  
Posted On:   Wednesday, June 1, 2005 08:04 PM

I think this is what you are after: org.apache.lucene.analysis.PerFieldAnalyzerWrapper
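
To show the idea behind a per-field wrapper, here is a small plain-Java model of it. It is a sketch only: PerFieldModel and its methods are hypothetical and are not Lucene classes. In Lucene releases of that period, the real wrapper is constructed with a default Analyzer and further Analyzers are registered per field name with addAnalyzer(fieldName, analyzer); the model below mimics that shape with simple tokenizing functions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of per-field analysis. These are NOT Lucene classes;
// an "analyzer" here is just a function from text to a list of tokens.
final class PerFieldModel {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();

    PerFieldModel(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register a dedicated analyzer for one field name.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        byField.put(field, analyzer);
    }

    // Use the field's own analyzer if one is registered, else the default.
    List<String> analyze(String field, String text) {
        return byField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Splits on whitespace only, so codes such as C02D stay intact.
    static List<String> whitespace(String text) {
        return splitNonEmpty(text, "\\s+");
    }

    // Lowercases, then splits on anything that is not a letter or digit.
    static List<String> simple(String text) {
        return splitNonEmpty(text.toLowerCase(Locale.ROOT), "[^\\p{L}\\p{N}]+");
    }

    private static List<String> splitNonEmpty(String text, String regex) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split(regex)) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}
```

With `new PerFieldModel(PerFieldModel::simple)` as the default and `addAnalyzer("classcodes", PerFieldModel::whitespace)`, the classcodes field keeps A01C and C02D as separate, unmodified terms, while the text fields are lowercased and split at apostrophes. The same wrapper should be used for both indexing and query parsing so that each field is analyzed consistently.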

Re: Lucene: Use different Analyzers for different fields ?

Posted By:   Charles_Sanders  
Posted On:   Thursday, May 26, 2005 08:14 AM

In our Lucene application, we have handled the multi-language problem by creating a separate document for each language. So, if I'm supporting 3 languages, I add the document to the index 3 times, once for each language. The key field in my index is a number that uniquely identifies the document, concatenated with the locale (506enUS, 506frFR, 506jpJP). Our application is locale-dependent, so I always have the language value. This allows me to send the entire document to the analyzer of my choosing based on locale. I have created my own analyzer, which is an extension of the standard analyzer. It requires a locale value so that it can apply the correct stop words for that language.

Just an idea. This has worked well in our application.
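
A rough sketch of that scheme in plain Java, for illustration only. The class, the method names and the tiny stop-word lists are all hypothetical; this is not the analyzer from the application described above, and it does not use Lucene classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the one-document-per-language scheme.
final class LocaleIndexing {

    // Tiny sample stop lists; a real application would load complete ones.
    static final Map<String, Set<String>> STOP_WORDS = new HashMap<>();
    static {
        STOP_WORDS.put("enUS", new HashSet<>(Arrays.asList("the", "a", "of")));
        STOP_WORDS.put("frFR", new HashSet<>(Arrays.asList("le", "la", "de")));
    }

    // Key = document number + locale, e.g. 506 and "enUS" give "506enUS".
    static String documentKey(int docId, String locale) {
        return docId + locale;
    }

    // Lowercase, split on whitespace, drop the stop words of the given locale.
    // Unknown locales fall back to an empty stop list.
    static List<String> analyze(String text, String locale) {
        Set<String> stops = STOP_WORDS.getOrDefault(locale, Collections.emptySet());
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty() && !stops.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}
```

Each language version of a document would be indexed under its own key (for example `documentKey(506, "frFR")`), and searches would be analyzed with the same locale so that the stop-word handling matches on both sides.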