How to create an analyzer for files in multiple formats?

Posted By:   vipin_sharma
Posted On:   Tuesday, September 20, 2005 05:57 AM

Hi,

I want to write my own search engine, so I need to extract text from files in different formats (e.g. PDF, DOC, XLS) and then supply that text to Lucene for indexing. How do I do this?

Please help me solve this problem.

Re: How to create an analyzer for files in multiple formats?

Posted By:   Richard_Krenek  
Posted On:   Wednesday, September 21, 2005 09:51 AM

You may want to check the book "Lucene in Action"; it has examples. You do not need a special Analyzer for this. First extract the text from your files, then pass that text through an Analyzer of your choice. For PDFs, the PDFBox library is good. Try a Google search with the terms "lucene pdf text extract", which should get you started. Also take a look at Nutch.
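
To make the "extract first, then analyze" idea concrete, here is a minimal sketch in Java (Lucene's own language) that picks a text extractor by file extension. The names TextExtractors and TextExtractor are hypothetical and not part of Lucene or PDFBox. Only a plain-text extractor is wired up here; extractors for PDF, DOC, or XLS would be thin wrappers around whatever parsing library you choose, and the returned string is what you would then hand to Lucene for indexing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TextExtractors {

    /** Hypothetical interface: turns one file into plain text. */
    public interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    // Maps a lowercase file extension (without the dot) to its extractor.
    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    public void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the lowercase extension of a file name, or "" if it has none. */
    static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Dispatches to the extractor registered for the file's extension. */
    public String extract(Path file) throws IOException {
        TextExtractor extractor = byExtension.get(extensionOf(file));
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + file);
        }
        return extractor.extract(file);
    }

    public static void main(String[] args) throws IOException {
        TextExtractors extractors = new TextExtractors();

        // Plain text needs no parsing library: read the bytes as UTF-8.
        extractors.register("txt",
                f -> new String(Files.readAllBytes(f), StandardCharsets.UTF_8));

        // Assumption: "pdf", "doc", "xls" extractors would be registered here,
        // each wrapping a format-specific library of your choice.

        Path sample = Files.createTempFile("sample", ".txt");
        Files.write(sample, "hello lucene".getBytes(StandardCharsets.UTF_8));

        // The extracted text is what would be passed on to the indexer.
        String text = extractors.extract(sample);
        System.out.println(text);

        Files.delete(sample);
    }
}
```

Keeping extraction behind one small interface means the indexing code never needs to know which format a file came from; adding a new format is one more register call.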