Problem working with Japanese characters

Posted By:   pradeep_nair
Posted On:   Monday, May 24, 2004 05:40 AM

I need to display a string value in a JSP. The string can be in either English or Japanese. If the string is in Japanese, only the first 19 bytes should be displayed, but if it is in English, the first 38 bytes should be displayed. I am using the "EUC-JP" character encoding. I tried the following code, which works well for English, but when the string contains more than 19 Japanese characters it is not truncated at the 19th byte.


byte rqcValue[] = tempRqc.getBytes("EUC-JP");
if (rqcValue.length > 38) {
    strRqc = new String(rqcValue, 0, 38, charSet);
}




I am told that one Japanese character takes 2 bytes and one English character takes only 1 byte. Please help me solve this problem.


Re: Problem working with Japanese characters

Posted By:   Sean_Owen  
Posted On:   Monday, May 24, 2004 07:40 AM

It doesn't quite make sense to say "display the first 19 bytes", since characters are displayed, not bytes. Bytes are simply used to encode the characters. Do you mean "display the first 19 characters"?



The number of bytes that a character requires depends on the character encoding. In most encodings for Japanese characters, yes, one character will require more than one byte, but not necessarily two. For example, in UTF-8 most Japanese characters take 3 bytes.
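To make the point about encodings concrete, here is a small sketch that counts the bytes one character occupies in a few charsets. The class and method names (`EncodedLengthDemo`, `encodedLength`) are made up for illustration, and the EUC-JP line is guarded because that charset is not guaranteed to be present on every Java runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original poster's code.
class EncodedLengthDemo {

    // Returns how many bytes the text occupies once encoded in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String latin = "A";
        String hiragana = "\u3042"; // HIRAGANA LETTER A

        System.out.println(encodedLength(latin, StandardCharsets.UTF_8));       // 1
        System.out.println(encodedLength(hiragana, StandardCharsets.UTF_8));    // 3
        System.out.println(encodedLength(hiragana, StandardCharsets.UTF_16BE)); // 2

        // ASSUMPTION: EUC-JP is an optional charset, so check before using it.
        if (Charset.isSupported("EUC-JP")) {
            System.out.println(encodedLength(hiragana, Charset.forName("EUC-JP"))); // 2
        }
    }
}
```

The same single character is 2 or 3 bytes depending on the charset, which is why a byte count is a shaky proxy for "how many characters".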



Are you referring to the fact that some Japanese characters are "wide" when displayed, and may take as much room as two Roman characters? That is a separate issue, and has nothing to do with encoding or bytes.
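If display width is what actually matters, one rough way to handle it is to count columns rather than bytes or characters. The sketch below is only an approximation under a stated assumption: it treats code points in the Hiragana, Katakana, CJK Unified Ideographs, and CJK Symbols and Punctuation blocks as two columns wide and everything else as one. Real width rules are more involved (fullwidth forms and halfwidth katakana are not handled here), and the names `DisplayColumns`, `isWide`, and `truncateToColumns` are hypothetical.

```java
// Hypothetical helper, not from the original post.
class DisplayColumns {

    // ASSUMPTION: only these four Unicode blocks are treated as double-width.
    static boolean isWide(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == Character.UnicodeBlock.HIRAGANA
            || block == Character.UnicodeBlock.KATAKANA
            || block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
            || block == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION;
    }

    // Keeps the longest prefix of text whose estimated width fits in maxColumns.
    static String truncateToColumns(String text, int maxColumns) {
        int columns = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int width = isWide(cp) ? 2 : 1;
            if (columns + width > maxColumns) {
                break; // the next character would not fit
            }
            columns += width;
            i += Character.charCount(cp); // step over surrogate pairs as a unit
        }
        return text.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(truncateToColumns("abcdef", 4));             // abcd
        System.out.println(truncateToColumns("\u3042\u3044\u3046", 4)); // first two hiragana
    }
}
```

With a budget of 38 columns this would keep up to 38 Roman characters or up to 19 wide Japanese characters, and it also copes with mixed strings.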



Why bother with bytes at all? If the String is Japanese, simply display the first 19 characters, and if not, display 38 -- is that what you really want?




String toDisplay = ...;
boolean stringIsJapanese = ...;
if (stringIsJapanese) {
    if (toDisplay.length() > 19) {
        toDisplay = toDisplay.substring(0, 19);
    }
} else {
    if (toDisplay.length() > 38) {
        toDisplay = toDisplay.substring(0, 38);
    }
}
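If a hard byte limit really is a requirement (say, a fixed-width field downstream), cutting the byte array at an arbitrary offset can split a multi-byte character in half. A possible sketch, assuming the goal is "the longest prefix that encodes to at most N bytes", is to let a `CharsetEncoder` fill a fixed-size buffer and see how many characters it consumed. The class and method names here are hypothetical, and `maxBytes` is assumed to be non-negative.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the original post.
class ByteBudgetTruncator {

    // Returns the longest prefix of text that encodes to at most maxBytes bytes,
    // never cutting through the middle of an encoded character.
    static String truncateToBytes(String text, Charset charset, int maxBytes) {
        CharsetEncoder encoder = charset.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(maxBytes);
        // The encoder stops when the next character does not fit in the output;
        // in.position() is then the number of chars that were fully encoded.
        encoder.encode(in, out, true);
        return text.substring(0, in.position());
    }

    public static void main(String[] args) {
        String kana = "\u3042\u3044\u3046"; // three hiragana, 3 bytes each in UTF-8
        System.out.println(truncateToBytes(kana, StandardCharsets.UTF_8, 7).length()); // 2

        // ASSUMPTION: EUC-JP is an optional charset, so check before using it.
        if (Charset.isSupported("EUC-JP")) {
            // 2 bytes per hiragana in EUC-JP, so a 5-byte budget keeps two of them.
            System.out.println(truncateToBytes(kana, Charset.forName("EUC-JP"), 5).length()); // 2
        }
    }
}
```

Called as `truncateToBytes(tempRqc, Charset.forName("EUC-JP"), 38)`, this would keep at most 38 bytes' worth of text whether the string is English, Japanese, or a mix of the two.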