Lucene Analyzer that can handle C++ vs C#

Posted By:   jack_schlein
Posted On:   Friday, December 11, 2009 01:50 PM


Can someone please point me in the right direction.

We are creating an application that needs to be able to search on C++ and get
back docs that contain C++. The StandardAnalyzer does not seem to index the
"+", so a search for "C++" brings back docs that contain C++, C, C#, etc. The
WhitespaceAnalyzer does index the "+", but if C++ is at the end of a sentence,
it indexes the term as "C++.", so a search for "C++" will not return the doc.
I have heard that a CustomAnalyzer might help; however, it seems like a custom
filter or tokenizer would actually be needed. I looked at:
- StandardAnalyzer.java
- StandardFilter.java
- StandardTokenizer.java
- StandardTokenizerImpl.java
- StandardTokenizerImpl.jflex
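One way to keep the WhitespaceAnalyzer behaviour while fixing the "C++." case
is to post-process each whitespace-separated token and strip trailing sentence
punctuation, leaving "+" and "#" alone. The sketch below is plain Java with no
Lucene dependency so it runs on its own; the class name, method names and the
exact punctuation set are assumptions for illustration. In Lucene the same
cleanup would most likely live in a custom TokenFilter placed after a
WhitespaceTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: split on whitespace (as WhitespaceAnalyzer does), then
 * strip trailing sentence punctuation so "C++." becomes "C++".
 * Names are illustrative only; this is not a Lucene API.
 */
class TrailingPunctuationSketch {

    // Assumed set of sentence punctuation; '+' and '#' are deliberately absent.
    private static final String TRAILING_PUNCT = ".,;:!?)\"'";

    // Remove any run of trailing punctuation characters from one token.
    static String stripTrailingPunctuation(String token) {
        int end = token.length();
        while (end > 0 && TRAILING_PUNCT.indexOf(token.charAt(end - 1)) >= 0) {
            end--;
        }
        return token.substring(0, end);
    }

    // Whitespace split followed by punctuation cleanup; empty results are dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String cleaned = stripTrailingPunctuation(raw);
            if (!cleaned.isEmpty()) {
                tokens.add(cleaned);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("We use C++. Also C# and C, sometimes."));
        // prints [We, use, C++, Also, C#, and, C, sometimes]
    }
}
```

Leading punctuation such as "(C++" would need the same treatment, and the
same analysis has to be applied to query text as to indexed text, otherwise
"C++" in a query still will not match.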

I would guess that the StandardTokenizer is where the changes would need to
be made to allow the "+" character, but I am unclear as to how.
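Rather than editing the JFlex grammar behind StandardTokenizerImpl, a simpler
route may be a tokenizer that decides character by character what belongs in
a term. The following sketch, again plain Java with hypothetical names, treats
letters, digits, "+" and "#" as token characters and everything else as a
separator, then lowercases each term. In Lucene this rule would typically go
into a subclass of CharTokenizer overriding isTokenChar (its exact signature
differs between Lucene versions).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a character-class tokenizer: a run of token
 * characters forms a term and anything else separates terms. Because '+'
 * and '#' are token characters, "C++", "C#" and "C" stay distinct terms,
 * while the '.' in "C++." acts as a separator.
 */
class PlusSharpTokenizerSketch {

    // Assumption: '+' and '#' are the only extra characters worth keeping.
    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '+' || c == '#';
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                // Lowercase as we go, similar in spirit to Lucene's LowerCaseFilter.
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator ends the current term.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the final term
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Experience with C++. Also C#, C and Java."));
        // prints [experience, with, c++, also, c#, c, and, java]
    }
}
```

A side effect of this rule is that expressions such as "a+b" become a single
term, which may or may not be acceptable. Also note that QueryParser treats
"+" as an operator, so a user's query for C++ would likely need escaping (for
example with QueryParser.escape) before parsing.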

Any and all help is greatly appreciated.

Going through all the documents and replacing "+" with the word "plus" is not really an option for us.
