index HTML problem
Posted By:   Ivy_Liu
Posted On:   Wednesday, December 1, 2004 01:53 PM

Hi all,

When I use Lucene to index HTML pages, I get errors like this for some pages:

Parse Aborted: Encountered "\'" at line 18, column 27.
Was expecting one of:
...
"=" ...
...

and errors like this for some other pages:

Parse Aborted: Lexical error at line 18, column 30. Encountered: "=" (61), after : "".

and also errors like this for some pages:

Parse Aborted: Lexical error at line 40, column 41. Encountered: "\u2013" (8211), after : "".

I've tried writing my own analyzer that treats "\'" and "\" as stop words, but the errors are still there.

Could anyone give me some advice, or tell me how to deal with this if you've run into it too?

Thanks a lot~~~!!

-Ivy

Re: index HTML problem

Posted By:   Otis_Gospodnetic  
Posted On:   Wednesday, December 1, 2004 08:12 PM

It looks like the error comes from the JavaCC-based HTML parser, not the Analyzer.
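Since the parser runs before the Analyzer ever sees the text, stop words cannot affect these errors. One possible workaround is to catch the parser failure and fall back to a crude tag stripper, so that a malformed page still gets indexed. This is only a rough sketch: `TolerantHtmlText`, `StrictParser`, `crudeStrip`, and `extractText` are hypothetical names made up for illustration, not part of Lucene, and the sketch assumes the parser reports failure by throwing rather than by printing a message.

```java
import java.util.regex.Pattern;

class TolerantHtmlText {
    /** Hypothetical stand-in for a strict, grammar-based HTML parser. */
    interface StrictParser {
        String parse(String html) throws Exception;
    }

    // Drop <script> and <style> elements together with their contents.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Drop any remaining tag, including ones with odd quoting in attributes.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Crude fallback: remove markup with regexes, no grammar involved. */
    static String crudeStrip(String html) {
        String noScripts = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        String noTags = TAG.matcher(noScripts).replaceAll(" ");
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    /** Try the strict parser first; if it gives up, use the fallback. */
    static String extractText(StrictParser parser, String html) {
        try {
            return parser.parse(html);
        } catch (Exception e) {
            // Assumption: the parser signals failure with an exception.
            return crudeStrip(html);
        }
    }
}
```

The string returned by `extractText` is what would then be handed to the Analyzer when building the document. The fallback loses things a real parser gives you (entity decoding, title extraction), so it is meant only for the pages the strict parser rejects.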