Support of foreign languages in HTMLParser???
Posted By:   Code_Master
Posted On:   Friday, September 27, 2002 02:16 AM

Hi,


I want to add support for Danish to Lucene's HTMLParser. What I have done so far:
1) In HTMLParser.jj I added the Danish letters to this token: < #LET: ["A"-"Z","a"-"z","0"-"9","æ","å","Ø","ø","Å","Æ"] >

2) In StandardTokenizer.jj I added "\u0080"-"\u00ff", "\u0100"-"\u017f", "\u0180"-"\u024f", "\u00c0"-"\u00d6" to the #LETTER token


When I search (after compiling and indexing) for words that contain special characters, the search engine can't find them. For example, a search for "civilingeniør" returns no results. When I print the comment string of a hit (found by searching for a nearby word instead), I see this in my browser: "civilingeniør". So somehow the parser is mapping the characters wrongly. What am I doing wrong?
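
One way to check whether this is a charset problem rather than a grammar problem is sketched below. It rests on an assumption that nothing above confirms: that the HTML files are stored as UTF-8 but are decoded as ISO-8859-1 somewhere between the file and the index. The class name CharsetMismatchDemo and the method misdecode are made up for illustration and are not part of Lucene.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Simulates a reader that uses the wrong charset: the text is encoded
    // as UTF-8 (assumed file encoding) and decoded as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // \u00f8 is the letter "ø"; the escape keeps this source file
        // independent of the compiler's own source encoding.
        String query = "civilingeni\u00f8r";
        String indexed = misdecode(query);

        // If the indexed term was decoded wrongly, it no longer equals the
        // query term, so an exact term match cannot succeed.
        System.out.println("terms equal: " + query.equals(indexed));
        System.out.println("lengths: " + query.length() + " vs " + indexed.length());
    }
}
```

If the two strings differ in this way, the token definitions in the .jj files may be fine and the thing to look at would be the charset used by the Reader that feeds the parser.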




Thx..
