Answer
Lucene can index anything that can be converted to a String and fed to it through its API.
See Lucene Contributions for some pointers.
Comments and alternative answers
Re: i2a websearch
Richard Burton, Dec 14, 2002 [replies:1]
I would use Runtime.getRuntime().exec("pdf2text") and capture the output stream of the forked process. There is only one issue: pdf2text sometimes extracts binary data from the PDF as well. But it does extract all of the text, which is what you want (along with a little bit of trash).
I hope this helps.
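A minimal sketch of that approach, using ProcessBuilder from the standard library rather than Runtime.exec. The class and method names here are hypothetical, the UTF-8 output encoding is an assumption, and the control-character filter is only one crude way to drop the stray binary data mentioned above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: runs an external text extractor and captures its output.
public class ExternalTextExtractor {

    /** Runs the command and returns everything it wrote to standard output. */
    public static String runAndCapture(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Send the child's stderr to our own stderr so it cannot fill a pipe and block.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        StringBuilder output = new StringBuilder();
        // Assumption: the tool writes UTF-8 text to standard output.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Extractor exited with code " + exitCode);
        }
        return output.toString();
    }

    /** Crude filter for stray binary data: drops control characters except tab, CR and LF. */
    public static String stripControlChars(String text) {
        StringBuilder clean = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isISOControl(c) || c == '\n' || c == '\r' || c == '\t') {
                clean.append(c);
            }
        }
        return clean.toString();
    }
}
```

You would pass the tool name and its arguments as the command list, then run the result through stripControlChars before handing the String to Lucene. The exact arguments depend on the extraction tool you have installed, so check its own documentation.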
Re[2]: i2a websearch
Weldon Sams, Jul 14, 2004
You mentioned pdf2text. Could you explain where the Runtime.getRuntime call for pdf2text comes from, or what steps I should take to index a batch of PDFs with Lucene? I'm very new to Lucene, so I'm not up to speed with everything yet. Also, do you know of any good help documents for Lucene?
Thanks
Re: i2a websearch
Matthieu Casanova, Feb 12, 2004 [replies:1]
You can also use PDFBox, a Java API for PDF. It includes a class that parses a PDF and returns a Lucene Document. It works great and it's free.
http://www.pdfbox.org/
Re[2]: i2a websearch
Kalani Ruwanpathirana, Aug 26, 2008
Yes, PDFBox works very well with Lucene; I have worked with it. This post shows how to do it: http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.html
Try PDFTextStream
Chas Emerick, Sep 7, 2004
PDFTextStream goes one step further than just extracting text from PDF files for use with Lucene: it provides a complete set of integration classes that enable a Lucene user to easily add PDF document content to Lucene indexes.
There's a full tutorial and sample code available: PDFTextStream / Lucene Integration