Posted By:   Alfred_Sniff
Posted On:   Wednesday, May 11, 2005 09:53 AM

Hi again,

Is there a way to tell Lucene which characters to treat as word separators? For example, if my document contains the expression "l'allemagne" and I search for the word "allemagne", the document won't be found, because the "'" character keeps "allemagne" from being treated as a separate word. Is there a way to tell Lucene that this character is a word separator?

Thanks for any answers.

Re: Giving a list of words separators characters

Posted By:   Charles_Sanders  
Posted On:   Thursday, May 26, 2005 08:26 AM

You tell Lucene how to create tokens from your string by choosing the right analyzer. Choosing the analyzer is one of the most important parts of a Lucene application. A number of analyzers are included with Lucene, and you can also create your own. The book "Lucene in Action" discusses this in some detail. I just ran a test on your string "l'allemagne", and the SimpleAnalyzer included with Lucene creates two tokens from it ("l", "allemagne"). I suggest writing a simple application to test the different analyzers and see which one works best for you. You can also use the Luke application, which is available for download from the Lucene home site. It lets you see the tokens generated from a string by different analyzers. Hope this helps.
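
To make the behaviour concrete, here is a rough sketch in plain Java of the kind of splitting that produces those two tokens: break the text at every character that is not a letter, and lowercase what remains. This is not Lucene code and does not use the Lucene API; the class name LetterTokenizerSketch and the method tokenize are made up for illustration, and the assumption is only that this approximates what SimpleAnalyzer does with a string like yours.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not part of Lucene.
class LetterTokenizerSketch {

    // Splits text at every non-letter character and lowercases each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                // Letters extend the token being built.
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any non-letter (apostrophe, space, digit, ...) ends the token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush the last token if the text ended on a letter.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("l'allemagne")); // prints [l, allemagne]
    }
}
```

Because the apostrophe is not a letter, "allemagne" comes out as its own token and a search for it can match. An analyzer that keeps the apostrophe inside the token would index "l'allemagne" as a single term instead, which is the behaviour described in the question.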
