Lucene Section Index
How can I index PowerPoint documents?
To index PowerPoint documents, you first need to parse them and extract the text you want to index. You can use Apache Jakarta POI, as it contains a parser for PowerPoint do...more
Where does the name Lucene come from?
Lucene is Doug Cutting's wife's middle name, and her maternal grandmother's first name.
What is the difference between IndexWriter.addIndexes(IndexReader[]) and IndexWriter.addIndexes(Directory[]), besides them taking different arguments?
How can I index PDF documents?
To index PDF documents, you first need to parse them and extract the text you want to index. Here are some PDF parsers that can help you with that:
PDFBox is a Java API from Be...more
How can I index XML documents?
To index XML documents, you first need to parse them and extract the text you want to index. Here are some XML parsers that can help you with that:
See XML Demo. This contrib...more
What are all possible concurrent Lucene requests?
           query  read doc  write  delete  optimize  merge
query      Y      Y         Y      Y       Y         Y
read doc   Y      Y         Y      Y       Y         Y
write      Y ...more
Why does IndexReader's maxDoc() return an 'incorrect' number of documents sometimes?
According to the Javadoc, IndexReader's maxDoc() method "returns one greater than the largest possible document number".
In other words, the number returned by maxDoc() does not necessarily matc...more
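One way the two counts can drift apart is through deletions: in Lucene, a deleted document keeps its document number until segments are merged, so IndexReader.numDocs() can be smaller than maxDoc(). The following is a minimal sketch of that idea using a toy model, not Lucene's own classes; the class name DocNumberingSketch and its add/delete methods are hypothetical.

```java
import java.util.BitSet;

// Toy model of document numbering. This is an illustration only,
// not how Lucene stores its index.
class DocNumberingSketch {
    private int nextDocId = 0;                    // next document number to hand out
    private final BitSet deleted = new BitSet();  // numbers marked as deleted

    // Adds a document and returns its number.
    int add() {
        return nextDocId++;
    }

    // Marks a document as deleted; its number is not reused.
    void delete(int docId) {
        if (docId < 0 || docId >= nextDocId) {
            throw new IllegalArgumentException("no such document: " + docId);
        }
        deleted.set(docId);
    }

    // One greater than the largest possible document number.
    int maxDoc() {
        return nextDocId;
    }

    // Number of documents that are not marked as deleted.
    int numDocs() {
        return nextDocId - deleted.cardinality();
    }

    public static void main(String[] args) {
        DocNumberingSketch index = new DocNumberingSketch();
        for (int i = 0; i < 5; i++) {
            index.add();
        }
        index.delete(1);
        index.delete(3);
        System.out.println("maxDoc  = " + index.maxDoc());   // 5
        System.out.println("numDocs = " + index.numDocs());  // 3
    }
}
```

In this model, code that loops from 0 to maxDoc() has to skip the numbers marked as deleted rather than assume every number maps to a live document.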
Are Wildcard, Prefix, and Fuzzy queries case sensitive?
Yes, unlike other types of Lucene queries, Wildcard, Prefix, and Fuzzy queries are case sensitive.
That is because those types of queries are not passed through the Analyzer, which is the compone...more
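The effect can be sketched without Lucene itself. The example below assumes an index whose terms were lowercased at indexing time (as many analyzers do) and a prefix lookup that compares the prefix to the stored terms exactly as typed. The class PrefixCaseSketch and its methods are hypothetical stand-ins, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixCaseSketch {
    // Stand-in for an analyzer that lowercases terms at indexing time.
    static SortedSet<String> indexTerms(String... words) {
        SortedSet<String> terms = new TreeSet<>();
        for (String word : words) {
            terms.add(word.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // Prefix lookup against the stored terms; the prefix is used as given,
    // with no analysis applied to it.
    static List<String> prefixMatch(SortedSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;  // sorted order: no later term can match
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = indexTerms("Lucene", "Lucid", "Index");
        System.out.println(prefixMatch(terms, "Luc"));  // []
        System.out.println(prefixMatch(terms, "luc"));  // [lucene, lucid]
    }
}
```

Under these assumptions, lowercasing the user's input before building a Wildcard, Prefix, or Fuzzy query is a simple way to make such searches behave as if they were case insensitive.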
How can I index and search digits and other non-alphabetic characters?
The components responsible for this are various Analyzers.
The demos included in the Lucene distribution use StopAnalyzer, which filters out non-alphabetic characters.
To include non-alphabetic chara...more
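To see why the choice of tokenization matters, here is a minimal sketch that compares two toy tokenizers: one that keeps only runs of letters (roughly the behaviour described above, without the stop-word removal) and one that splits on whitespace only. The class TokenizerSketch is hypothetical and does not use Lucene's Analyzer classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenizerSketch {
    // Keeps only runs of letters, lowercased; digits and punctuation
    // act as separators and are dropped.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Splits on whitespace only, so digits and mixed tokens survive.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        String text = "room 42 building B7";
        System.out.println(letterTokens(text));      // [room, building, b]
        System.out.println(whitespaceTokens(text));  // [room, 42, building, B7]
    }
}
```

With the letter-only tokenizer, a search for 42 or B7 can never match because those tokens were never indexed. The same analysis has to be applied to the query text as to the indexed text.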
Can Lucene index PDF files?
Lucene can index anything that can be represented as a String.
One can extract text out of PDF files and feed that to Lucene.
See Lucene's Contributions Page for some PDF parsers.
Can Lucene do a "search within search", so that the second search is constrained by the results of the first query?
Yes. There are two primary options:
Use QueryFilter with the previous query as the filter.
(you can search the mailing list archives for QueryFilter and Doug Cutting's recommendations against ...more
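Conceptually, filtering a second search by the results of the first is a set intersection over document numbers. The sketch below illustrates that idea with java.util.BitSet, assuming each search's matches are available as a bit set with one bit per document number. The class SearchWithinSearchSketch is hypothetical and does not call Lucene.

```java
import java.util.BitSet;

class SearchWithinSearchSketch {
    // Returns the documents matched by the second search that were also
    // matched by the first. Neither input is modified.
    static BitSet constrain(BitSet firstResults, BitSet secondResults) {
        BitSet narrowed = (BitSet) secondResults.clone();
        narrowed.and(firstResults);
        return narrowed;
    }

    public static void main(String[] args) {
        BitSet first = new BitSet();
        for (int doc : new int[] {1, 3, 5, 7}) {
            first.set(doc);
        }
        BitSet second = new BitSet();
        for (int doc : new int[] {3, 4, 5, 6}) {
            second.set(doc);
        }
        System.out.println(constrain(first, second));  // {3, 5}
    }
}
```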
Can I use Lucene to index text in Chinese, Japanese, Korean, and other multi-byte character sets?
Yes, you can. Lucene is not limited to English or to any other language. To index text properly, you need to use an Analyzer appropriate for the language of the text you are indexing. Lucene's d...more
Can I cache search results with Lucene?
Lucene does come with a simple cache mechanism, if you use Lucene Filters.
The classes to look at are CachingWrapperFilter and QueryFilter.
Why can't I use Lucene with IBM JDK 1.3.1?
Apparently there is a bug in IBM's JIT code in JDK 1.3.1.
To work around it, disable JIT for the org.apache.lucene.store.OutputStream.writeInt method by setting the following environment variable:...more