Posted By:
Santosh_Dawara
Posted On:
Tuesday, February 11, 2003 08:19 PM
Hi,
I am using Lucene right now to index several semi-structured documents.
I recently had to implement a method 'getFrequencyVector()' to simply return a mapping of keyword -> frequency from the information already in the lucene index.
I currently maintain the Lucene index as a keyword -> (document, freq)* mapping. The best solution I could come up with is to iterate over all the keywords ( :( ), match my own document identifier against each posting, and build the vector.
Any ideas/suggestions? Is there a way to speed up the vector computation? It currently takes on the order of |k|*|d| steps per document, where |k| is the total number of keywords indexed and |d| is the average number of documents a keyword occurs in.
Ideally, I would like to have a forward index, document to the pair (keyword, frequency) for this application.
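To make the two options concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: the inverted index is modeled as an in-memory map, and the `Posting` class, the signature of `getFrequencyVector`, and the `buildForwardIndex` helper are all hypothetical names chosen for illustration. The first method is the per-document scan described above; the second makes the same pass once and produces the document -> (keyword, frequency) mapping for every document.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ForwardIndexSketch {

    /** One (document, freq) entry in a keyword's postings list (hypothetical type). */
    static final class Posting {
        final String docId;
        final int freq;

        Posting(String docId, int freq) {
            this.docId = docId;
            this.freq = freq;
        }
    }

    /** Per-document scan: walks every postings list, roughly |k|*|d| steps per call. */
    static Map<String, Integer> getFrequencyVector(
            Map<String, List<Posting>> inverted, String docId) {
        Map<String, Integer> vector = new HashMap<>();
        for (Map.Entry<String, List<Posting>> entry : inverted.entrySet()) {
            for (Posting p : entry.getValue()) {
                if (p.docId.equals(docId)) {
                    vector.put(entry.getKey(), p.freq);
                    break; // assumes a document appears at most once per keyword
                }
            }
        }
        return vector;
    }

    /** One-time inversion: a single pass that yields the vectors for all documents. */
    static Map<String, Map<String, Integer>> buildForwardIndex(
            Map<String, List<Posting>> inverted) {
        Map<String, Map<String, Integer>> forward = new HashMap<>();
        for (Map.Entry<String, List<Posting>> entry : inverted.entrySet()) {
            for (Posting p : entry.getValue()) {
                forward.computeIfAbsent(p.docId, id -> new HashMap<>())
                       .put(entry.getKey(), p.freq);
            }
        }
        return forward;
    }
}
```

With the forward map in place, fetching one document's vector becomes a single lookup, at the cost of storing the postings a second time and keeping the two structures in sync when documents are added or removed.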
Thank you in advance for your expertise and your time.
Cheers,
Santosh Dawara
Graduate Student
Rochester Instt of Tech