One analyzer for all languages covered by ISO-8859-1 charset?

Posted By:   Niranjan_Sathe
Posted On:   Friday, May 2, 2008 06:33 AM



Can I safely use a common analyzer, e.g. StandardAnalyzer, to create indexes comprising text from languages that use only characters in the ISO-8859-1 (Latin-1) character set, and be sure that searches will return correct results?


By this I mean that a single index will be created (and searched) for content in all the languages covered by the Latin-1 charset.

Re: One analyzer for all languages covered by ISO-8859-1 charset?

Posted By:   vijay_mareddy  
Posted On:   Saturday, July 12, 2008 03:21 PM

I successfully used a StandardAnalyzer to index both Japanese and English documents in the same index, but I converted the strings to UTF-8 before actually indexing the documents.
I think Latin-1 supports only the characters used in European languages.
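
To illustrate the charset point: an analyzer works on Java strings (already-decoded characters), so what usually matters is decoding the source bytes with the right charset before indexing. Here is a minimal sketch using only the JDK, not Lucene. The class name Latin1Check and its two helper methods are made up for this example. It decodes Latin-1 bytes into a String and checks whether a given text fits in ISO-8859-1 at all.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Lucene.
class Latin1Check {

    // Decode raw bytes that are known to be ISO-8859-1 into a Java String.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // True if every character of the text can be represented in ISO-8859-1.
    static boolean fitsLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // The bytes of "cafe" with an acute accent on the e, encoded as Latin-1.
        byte[] raw = {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9};
        String text = decodeLatin1(raw);
        System.out.println("decoded length: " + text.length());
        System.out.println("fits Latin-1: " + fitsLatin1(text));

        // Three Japanese characters, written as escapes to avoid source-encoding issues.
        String japanese = "\u65E5\u672C\u8A9E";
        System.out.println("Japanese fits Latin-1: " + fitsLatin1(japanese));
    }
}
```

If fitsLatin1 returns false for some of your content, that content is outside Latin-1 and has to come from a source encoded in something wider, such as UTF-8. Once the text is a correctly decoded String, it can be handed to the analyzer the same way regardless of the original encoding.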

Re: One analyzer for all languages covered by ISO-8859-1 charset?

Posted By:   Niranjan_Sathe  
Posted On:   Friday, May 2, 2008 06:46 AM

More information on the original question - I do not need to filter stop words or use a stemmer.