Quickest way to build a Document - (Keyword, Freq)* map
Posted By:   Santosh_Dawara
Posted On:   Tuesday, February 11, 2003 08:19 PM

Hi,

I am using Lucene right now to index several semi-structured documents.

I recently had to implement a method 'getFrequencyVector()' that simply returns a mapping of keyword -> frequency for a document, using the information already in the Lucene index.

I currently maintain the Lucene index on the basis of the keyword -> (document, freq)* mapping. The best solution I could come up with is to iterate over all the keywords :( , match each entry against my own document identifier, and build the vector.

Any ideas/suggestions? Is there a way to speed up the vector computation? It currently takes O(|k|*|d|) time, where |k| is the total number of keywords indexed and |d| is the average number of documents a keyword occurs in.
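To make the cost concrete, here is a minimal sketch of the scan described above. It uses plain Java collections as a stand-in for the index and does not call any Lucene API; the `Posting` class and the shape of the `index` map are assumptions made only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FrequencyVectorScan {
    /** Hypothetical (document, freq) entry in a keyword's postings list. */
    static final class Posting {
        final int docId;
        final int freq;

        Posting(int docId, int freq) {
            this.docId = docId;
            this.freq = freq;
        }
    }

    /**
     * Builds keyword -> frequency for one document by walking every keyword
     * and scanning its postings list, which is the |k|*|d| cost in question.
     */
    static Map<String, Integer> getFrequencyVector(
            Map<String, List<Posting>> index, int docId) {
        Map<String, Integer> vector = new TreeMap<>();
        for (Map.Entry<String, List<Posting>> entry : index.entrySet()) {
            for (Posting p : entry.getValue()) {
                if (p.docId == docId) {
                    vector.put(entry.getKey(), p.freq);
                    break; // assumes a document appears at most once per keyword
                }
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, List<Posting>> index = new TreeMap<>();
        index.put("lucene", Arrays.asList(new Posting(1, 3), new Posting(2, 1)));
        index.put("index", Arrays.asList(new Posting(2, 4)));
        System.out.println(getFrequencyVector(index, 2)); // prints {index=4, lucene=1}
    }
}
```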

Ideally, I would like to have a forward index, document to the pair (keyword, frequency) for this application.
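One way to picture such a forward index is a single pass that inverts the keyword -> (document, freq)* mapping once, so that later lookups per document are direct. The sketch below again uses plain Java collections rather than Lucene itself; `Posting` and `buildForwardIndex` are hypothetical names, and keeping the whole result in memory is an assumption that may not hold for a large index.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ForwardIndexBuilder {
    /** Hypothetical (document, freq) entry in a keyword's postings list. */
    static final class Posting {
        final int docId;
        final int freq;

        Posting(int docId, int freq) {
            this.docId = docId;
            this.freq = freq;
        }
    }

    /**
     * Inverts keyword -> (document, freq)* into document -> (keyword, freq)*.
     * Every posting is visited exactly once, so the frequency vectors of all
     * documents are produced for the cost of one full scan.
     */
    static Map<Integer, Map<String, Integer>> buildForwardIndex(
            Map<String, List<Posting>> inverted) {
        Map<Integer, Map<String, Integer>> forward = new TreeMap<>();
        for (Map.Entry<String, List<Posting>> entry : inverted.entrySet()) {
            for (Posting p : entry.getValue()) {
                forward.computeIfAbsent(p.docId, id -> new TreeMap<>())
                       .put(entry.getKey(), p.freq);
            }
        }
        return forward;
    }

    public static void main(String[] args) {
        Map<String, List<Posting>> inverted = new TreeMap<>();
        inverted.put("lucene", Arrays.asList(new Posting(1, 3), new Posting(2, 1)));
        inverted.put("index", Arrays.asList(new Posting(2, 4)));
        Map<Integer, Map<String, Integer>> forward = buildForwardIndex(inverted);
        System.out.println(forward.get(2)); // prints {index=4, lucene=1}
    }
}
```

The trade-off is that the forward map has to be rebuilt or updated whenever documents are added to or removed from the index.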

Thank you in advance for your expertise and your time.

Cheers,
Santosh Dawara

Graduate Student
Rochester Instt of Tech
