Unicode support in Lucene

Posted By:   Matjaz_Trtnik
Posted On:   Wednesday, June 26, 2002 07:45 AM


Hey!


I have a problem with our national characters (č š ž Č Š Ž)
when indexing the titles or contents of HTML files.

I tried printing the output in HTMLDocument, but all of these
characters seem to have the same Unicode code point, which as an int is 65533.

I tried with this:


			
HTMLParser parser = new HTMLParser(f);
doc.add(Field.Text("title", parser.getTitle()));

Digging deeper into the code, I found HTMLParser's constructor
public HTMLParser(java.io.InputStream stream). I'm not sure whether it handles all Unicode characters correctly.
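
For reference, 65533 is U+FFFD, the Unicode replacement character, which Java's stream decoders substitute for bytes that are not valid in the charset being used. Below is a minimal sketch of how that symptom can arise, independent of Lucene. The `decode` helper and the sample title are hypothetical, and whether HTMLParser's InputStream constructor decodes bytes this way is an assumption I have not verified.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {

    // Hypothetical helper: reads the whole stream as text in the given charset.
    static String decode(InputStream in, Charset charset) {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample title "Koča"; the file is assumed to be stored as UTF-8.
        String title = "Ko\u010Da";
        byte[] bytes = title.getBytes(StandardCharsets.UTF_8);

        // Decoding with a charset that cannot represent these bytes
        // replaces each offending byte with U+FFFD.
        String wrong = decode(new ByteArrayInputStream(bytes), StandardCharsets.US_ASCII);

        // Decoding with the charset the bytes were written in restores the text.
        String right = decode(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);

        System.out.println((int) wrong.charAt(2)); // 65533 (U+FFFD)
        System.out.println((int) right.charAt(2)); // 269 (U+010D, 'č')
    }
}
```

If this is what is happening, the characters are lost as soon as the bytes are decoded, before Lucene ever sees the text, so the charset would have to be chosen at the point where the stream is turned into characters.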


If anyone has a solution, please let me know.


TIA, Matjaz
