Re: How to search PDF files in lucene....
Wednesday, July 20, 2005 12:52 PM
You will need the PDFBox library (http://www.pdfbox.org/) alongside Lucene. If you own the book "Lucene in Action", page 235 walks you through it step by step, with code. Basically, Lucene cannot read PDF files directly, so you have to extract the text from the PDF with PDFBox and then index that text in Lucene. One thing to be careful of: by default, Lucene only indexes the first 10,000 terms of each field and silently ignores the rest, so anything past that point in a long PDF will not be searchable. If you need more, you have to explicitly raise that limit (the maximum field length setting on IndexWriter).
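
To make the term-limit gotcha concrete, here is a rough sketch in plain Java. It is not Lucene or PDFBox code: CappedIndexSketch, addDocument and contains are hypothetical names made up for illustration, the tokenizing is a naive whitespace split, and the text is assumed to have been extracted from the PDF already. It only models the idea of an index that stops reading a field after a fixed number of terms.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only: NOT Lucene's API. All names here are hypothetical.
final class CappedIndexSketch {
    // term -> ids of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();
    // assumed stand-in for the index writer's maximum field length
    private final int maxFieldLength;

    CappedIndexSketch(int maxFieldLength) {
        this.maxFieldLength = maxFieldLength;
    }

    // Index text that was already extracted from a PDF.
    // Terms beyond maxFieldLength are silently dropped.
    void addDocument(String docId, String extractedText) {
        String trimmed = extractedText.trim();
        if (trimmed.isEmpty()) {
            return;
        }
        String[] terms = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        int limit = Math.min(terms.length, maxFieldLength);
        for (int i = 0; i < limit; i++) {
            postings.computeIfAbsent(terms[i], t -> new HashSet<>()).add(docId);
        }
    }

    // True if the term was indexed for the given document.
    boolean contains(String term, String docId) {
        Set<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids != null && ids.contains(docId);
    }

    public static void main(String[] args) {
        // 10,000 filler terms followed by one more term at position 10,001.
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            text.append("filler ");
        }
        text.append("appendix");

        CappedIndexSketch capped = new CappedIndexSketch(10000);
        capped.addDocument("report.pdf", text.toString());
        System.out.println(capped.contains("filler", "report.pdf"));   // true
        System.out.println(capped.contains("appendix", "report.pdf")); // false

        CappedIndexSketch raised = new CappedIndexSketch(Integer.MAX_VALUE);
        raised.addDocument("report.pdf", text.toString());
        System.out.println(raised.contains("appendix", "report.pdf")); // true
    }
}
```

With the cap at 10,000 the last word of the document never makes it into the index, which is the symptom you would see as "my search can't find things near the end of big PDFs". Raising the cap fixes it, at the cost of more memory while indexing.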