BreakIterator bugs and Chinese, or how to detect the beginning and end of a word

Posted By:   Vasja_Sidorov
Posted On:   Thursday, August 15, 2002 02:10 PM

BreakIterator works fine with most languages, even Japanese, but it does not seem to work correctly with Chinese. Is there a way to detect the beginning and end of a word in Chinese text?
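For reference, the usual way to get word boundaries from java.text.BreakIterator is to walk the offsets returned by first() and next(), treating each consecutive pair as a [start, end) segment. The sketch below is only an illustration: the class and method names (WordBoundaries, wordSpans, hasLetterOrDigit) are hypothetical, and keeping only segments that contain a letter or digit is one common heuristic for skipping spaces and punctuation. The JDK's built-in word rules may not split Chinese into dictionary words, so passing Locale.CHINESE alone is not expected to fix the problem. ICU4J's com.ibm.icu.text.BreakIterator, which uses a dictionary for Chinese in recent releases, may be worth comparing.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBoundaries {
    /** Returns [start, end) offsets of each segment that contains a letter or digit. */
    static List<int[]> wordSpans(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<int[]> spans = new ArrayList<>();
        int start = it.first();
        // Each pair of consecutive boundaries delimits one segment.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (hasLetterOrDigit(text, start, end)) {
                spans.add(new int[] {start, end});
            }
        }
        return spans;
    }

    /** Heuristic: a segment counts as a "word" if any code point is a letter or digit. */
    static boolean hasLetterOrDigit(String s, int start, int end) {
        for (int i = start; i < end; ) {
            int cp = s.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "Hello, world";
        for (int[] span : wordSpans(text, Locale.ENGLISH)) {
            System.out.println(span[0] + ".." + span[1] + " " + text.substring(span[0], span[1]));
        }
    }
}
```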
Another question: is there an easy way, using BreakIterator, to switch from word detection to character detection for a particular character set? For example, with mixed Chinese and English text, I would like to get whole words for the English parts and single characters for the Chinese parts.
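As far as I know, java.text.BreakIterator has no single mode that switches between word and character detection by script. One possible approach is to run a word iterator first and then re-split any segment that contains Han characters with a character iterator. This is a sketch under stated assumptions: MixedSegmenter and its helper methods are hypothetical names, the Han test relies on Character.UnicodeScript (Java 7+), and a segment that mixes Latin letters and Han characters is split entirely into characters.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MixedSegmenter {
    /** Whole words for non-Han text, single characters for Han (Chinese) text. */
    static List<String> segment(String text, Locale locale) {
        List<String> out = new ArrayList<>();
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String seg = text.substring(start, end);
            if (!containsLetterOrDigit(seg)) {
                continue; // skip spaces and punctuation
            }
            if (containsHan(seg)) {
                // Switch to character detection for this segment only.
                BreakIterator chars = BreakIterator.getCharacterInstance(locale);
                chars.setText(seg);
                int cs = chars.first();
                for (int ce = chars.next(); ce != BreakIterator.DONE; cs = ce, ce = chars.next()) {
                    String ch = seg.substring(cs, ce);
                    if (containsLetterOrDigit(ch)) {
                        out.add(ch);
                    }
                }
            } else {
                out.add(seg); // keep the whole word
            }
        }
        return out;
    }

    static boolean containsHan(String s) {
        return s.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    static boolean containsLetterOrDigit(String s) {
        return s.codePoints().anyMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        // The input is "I like 中国 food", written with escapes to avoid source-encoding issues.
        System.out.println(segment("I like \u4e2d\u56fd food", Locale.ENGLISH));
    }
}
```

Because the Han segments are re-split after word detection, the result does not depend on whether the JDK's word rules return 中国 as one segment or two.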

Re: BreakIterator bugs and Chinese, or how to detect the beginning and end of a word

Posted By:   Christopher_Koenigsberg  
Posted On:   Thursday, August 15, 2002 02:27 PM

Are you assuming that one character is one "word" in Chinese? That is often true, but not always: 中国 ("China"), for example, is a single word written with two characters. I think the definition of a "word" in Chinese is somewhat subjective and depends on context.
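To show how much the result depends on the word list, here is a minimal forward maximum matching segmenter. This is a common dictionary-based technique and is not part of BreakIterator. The class name and the tiny dictionaries are hypothetical examples. With the dictionary {中国, 人}, the string 中国人 ("Chinese person") splits into 中国 + 人. Adding 中国人 to the dictionary makes it a single word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class MaxMatchSegmenter {
    /**
     * Forward maximum matching: at each position take the longest dictionary
     * entry (up to maxLen chars); if none matches, emit a single code point.
     */
    static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            String match = null;
            // Try the longest candidate first, then shrink.
            for (int j = Math.min(text.length(), i + maxLen); j > i; j--) {
                String candidate = text.substring(i, j);
                if (dict.contains(candidate)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                // Unknown text: fall back to one character (code point).
                match = text.substring(i, text.offsetByCodePoints(i, 1));
            }
            out.add(match);
            i += match.length();
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u56fd\u4eba"; // 中国人
        System.out.println(segment(text, Set.of("\u4e2d\u56fd", "\u4eba"), 4));
        System.out.println(segment(text, Set.of("\u4e2d\u56fd", "\u4eba", "\u4e2d\u56fd\u4eba"), 4));
    }
}
```

Real segmenters use much larger dictionaries and often statistical models, but the same ambiguity remains.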
