Posted By:
Matjaz_Trtnik
Posted On:
Wednesday, June 26, 2002 07:45 AM
Hey!
I have a problem with our national characters (è È )
when indexing the titles or contents of HTML files.
I tried printing the output in HTMLDocument, but it seems that all of
these characters end up with the same Unicode value, which as an int is 65533 (U+FFFD, the replacement character).
I tried this:
HTMLParser parser = new HTMLParser(f);
doc.add(Field.Text("title", parser.getTitle()));
Digging deeper into the code, I found HTMLParser's constructor
public HTMLParser(java.io.InputStream stream). I'm not sure whether it correctly supports all Unicode characters.
If anyone has a solution, please let me know.
TIA, Matjaz
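
For anyone trying to reproduce the symptom: 65533 is what Java's decoders emit when a byte sequence is not valid in the charset being used, so it usually points to a charset mismatch between the file and the reader. The sketch below is only an illustration under assumptions. It assumes the pages are stored as ISO-8859-2 (not confirmed above), and the class CharsetDecodeSketch and its decode method are hypothetical names, not part of Lucene. It decodes the same two bytes once with the wrong charset and once with the assumed correct one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper class for illustration; not part of Lucene.
public class CharsetDecodeSketch {

    // Reads the whole stream, decoding its bytes with the named charset.
    public static String decode(InputStream in, String charsetName) {
        try (Reader reader = new InputStreamReader(in, Charset.forName(charsetName))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the file is ISO-8859-2, where 0xE8 and 0xC8 are
        // the lowercase and uppercase c with caron.
        byte[] bytes = { (byte) 0xE8, (byte) 0xC8 };

        // Wrong charset: the bytes are malformed UTF-8, so the decoder
        // substitutes the replacement character.
        String wrong = decode(new ByteArrayInputStream(bytes), "UTF-8");

        // Assumed correct charset: each byte maps to its own character.
        String right = decode(new ByteArrayInputStream(bytes), "ISO-8859-2");

        System.out.println((int) wrong.charAt(0));
        System.out.println((int) right.charAt(0) + " " + (int) right.charAt(1));
    }
}
```

If this matches what you see, the likely direction is to wrap the file's InputStream in an InputStreamReader with the file's real charset before parsing. Whether HTMLParser accepts a Reader in your version is something to check in its source.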