highlight in the contents of a PDF, Doc...

Posted By:   Alfred_Sniff
Posted On:   Friday, April 15, 2005 08:38 AM

Hi again,

I'm still working on highlighting the words that match my query. Thanks to your answers, I've succeeded in highlighting terms in the fields I created in my index. Now I would like to highlight the words that are IN the document itself, not only in the description fields; that is, I want the words in the contents highlighted. I'd like to offer two options. First, rewrite a part of the contents as HTML and highlight it. Highlighting HTML is very simple, but I haven't managed to retrieve a part of the content from the index.

Second, I would like to highlight the words in the document itself, so that when I open the PDF or .doc document, the words are highlighted. I think there's a solution with PDFBox for PDF, but I'm not sure. Can someone tell me? And is there a way to do that for Microsoft Office documents? POI?

Thanks for everything
Best regards

Re: highlight in the contents of a PDF, Doc...

Posted By:   Otis_Gospodnetic  
Posted On:   Friday, April 15, 2005 09:09 PM

Alfred, if I understand your concern correctly, you first need to parse the PDF, Word, and other rich-text documents, and then index the extracted text with Lucene. Erik and I developed a small framework that does just that, and you can see some references to it here.
The code that comes with the Lucene in Action book is free, so you can download it from http://www.lucenebook.com/
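
For the first option in the question (showing a part of the contents as highlighted HTML), here is a minimal sketch in plain Java. It assumes the document text has already been extracted by a parser such as PDFBox or POI and is available as a String, for example from a stored field. The class ContentHighlighter and its methods are hypothetical names made up for illustration; this is not the Lucene highlighter and not the framework from the book, just one way the idea could look.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Lucene, PDFBox, or POI. */
final class ContentHighlighter {

    private ContentHighlighter() {}

    /** Escapes the characters that are special in HTML text. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a case-insensitive pattern matching any of the terms as whole words. */
    static Pattern termPattern(List<String> terms) {
        StringBuilder alternatives = new StringBuilder();
        for (String term : terms) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(term));
        }
        return Pattern.compile("\\b(?:" + alternatives + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** Returns the text as HTML with every matched term wrapped in <b> tags. */
    static String highlight(String text, List<String> terms) {
        if (terms.isEmpty()) {
            return escapeHtml(text);
        }
        Matcher m = termPattern(terms).matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Escape the unmatched text, then emit the match in bold.
            out.append(escapeHtml(text.substring(last, m.start())));
            out.append("<b>").append(escapeHtml(m.group())).append("</b>");
            last = m.end();
        }
        out.append(escapeHtml(text.substring(last)));
        return out.toString();
    }

    /**
     * Returns up to `radius` characters on each side of the first match.
     * If nothing matches, returns the first 2 * radius characters instead.
     */
    static String fragment(String text, List<String> terms, int radius) {
        int start = 0;
        int end = Math.min(text.length(), 2 * radius);
        if (!terms.isEmpty()) {
            Matcher m = termPattern(terms).matcher(text);
            if (m.find()) {
                start = Math.max(0, m.start() - radius);
                end = Math.min(text.length(), m.end() + radius);
            }
        }
        return text.substring(start, end);
    }
}
```

A call such as ContentHighlighter.highlight(ContentHighlighter.fragment(contents, terms, 80), terms) would give a short HTML snippet around the first hit. Note that this matches the raw query words only; it does not apply the analyzer used at index time, so stemmed or otherwise transformed terms would need extra handling.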