Per-token attribute data in index for scoring?

Posted By:   Scott_Davies
Posted On:   Saturday, May 20, 2006 01:16 AM

What's the best way to store per-token "side information" in a Lucene index so that it's available at scoring time? For example, it'd be nice if I could dump all tokens from an HTML document into one field, but have a few extra bits associated with each token to say, for example, "this token is in an H2 header, so give it a little more weight for scoring", or "this token is from hyperlink text from a page we don't really trust all that much so give it less weight for scoring", etc. (This is all assuming that I'll be changing the scoring code to pay attention to such attributes, of course; note that I'm a Lucene newbie, so I'm not even sure how easy *that* would be either...)
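
To make the question a bit more concrete, here is a rough sketch of the kind of per-token "extra bits" I have in mind, written as plain Java with no Lucene calls at all. Everything in it is hypothetical: the class name `TokenFlags`, the two flags, and the weight multipliers (1.5 and 0.5) are made up for illustration. How such a byte would actually get into the index and back out at scoring time is exactly the part I'm asking about.

```java
/** Hypothetical per-token flags packed into a single byte; not a Lucene class. */
final class TokenFlags {
    // One bit per attribute (both attributes are illustrative assumptions).
    static final int IN_H2 = 1 << 0;            // token appeared inside an <h2> header
    static final int UNTRUSTED_ANCHOR = 1 << 1; // token came from link text on a low-trust page

    private TokenFlags() {}

    /** Packs the attributes of one token into a byte at indexing time. */
    static byte pack(boolean inH2, boolean untrustedAnchor) {
        int bits = 0;
        if (inH2) bits |= IN_H2;
        if (untrustedAnchor) bits |= UNTRUSTED_ANCHOR;
        return (byte) bits;
    }

    /** Turns the packed flags into a score multiplier (the factors are invented). */
    static float weight(byte flags) {
        float w = 1.0f;
        if ((flags & IN_H2) != 0) w *= 1.5f;            // boost header tokens a little
        if ((flags & UNTRUSTED_ANCHOR) != 0) w *= 0.5f; // damp untrusted link text
        return w;
    }
}
```

The idea would be that modified scoring code multiplies each matching token's contribution by something like `TokenFlags.weight(flags)`, where `flags` is whatever was stored alongside that token occurrence.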
