Conversion of Java Strings between different encodings.
2 posts in topic

Posted By:   Balaji_Thummala
Posted On:   Tuesday, April 24, 2001 11:54 AM


Hi,

I want to convert a given input string to UTF-8 or SJIS.

I'm using the UCS-2 string "\u65e5\u672c\u8a9e\u6587\u5b57\u5217".

I tried building the strings using
String strUTF = new String(str.getBytes(), "UTF8");
String strSJIS = new String(str.getBytes(), "SJIS");
where str is the UCS-2 string above.

I tried to cross-check by reversing the process, converting the string back from UTF-8 to UCS-2 with
String newstr = new String(strUTF.getBytes("UTF8"));
and comparing the result with the original string str.
To my surprise, the two strings are not equal. Can anyone explain this, or tell me where I went wrong?


Re: Conversion of Java Strings between different encodings.

Posted By:   Alistair_Sheffield  
Posted On:   Thursday, May 17, 2001 09:21 AM

You should certainly NOT use the suggestions given in the answer above. They will not give you a String in any encoding other than UTF-16/UCS-2 (i.e. Unicode), which is what Java uses internally.
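
To illustrate the point: since a String is always UTF-16 internally, "a string in UTF-8" or "a string in SJIS" really means a byte array. The following is only a minimal sketch, assuming a modern JDK in which the "Shift_JIS" charset is available; the class and method names (EncodingDemo, encode, decode) are made up for this example.

```java
import java.nio.charset.Charset;

public class EncodingDemo {
    // Encode a String (UTF-16 internally) into bytes in the named charset.
    public static byte[] encode(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName));
    }

    // Decode bytes that are known to be in the named charset into a String.
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The six-character sample string from the question.
        String str = "\u65e5\u672c\u8a9e\u6587\u5b57\u5217";

        // The encoded forms are byte arrays, not Strings.
        byte[] utf8 = encode(str, "UTF-8");
        byte[] sjis = encode(str, "Shift_JIS"); // assumes this charset is installed

        System.out.println("UTF-8 bytes: " + utf8.length); // 18 (3 bytes per character)
        System.out.println("SJIS bytes:  " + sjis.length);

        // Decoding with the charset the bytes were written in restores the original.
        System.out.println(decode(utf8, "UTF-8").equals(str));     // true
        System.out.println(decode(sjis, "Shift_JIS").equals(str));
    }
}
```

If the goal is to write the text to a file or socket in UTF-8 or SJIS, it is these byte arrays (or a Writer created with the desired encoding) that should be used, not another String.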


I have posted numerous answers about this sort of thing on Sun's I18n Forum - take a look there:
http://forum.java.sun.com/list/discuss.sun.internationalization.
You could also look at this topic or this one for some recent discussion on the topic.


Alistair

Re: Conversion of Java Strings between different encodings.

Posted By:   Dave_Stone  
Posted On:   Wednesday, May 2, 2001 01:44 PM

You should have used
String strUTF = new String(str.getBytes("UTF8"), "UTF8"); and
String strSJIS = new String(str.getBytes("SJIS"), "SJIS");

Without an encoding argument, getBytes() converts the internal two-byte characters to a byte array using the system default encoding (most likely US-ASCII or ISO8859_1).

Also, you should use newstr = new String(strUTF.getBytes("UTF8"), "UTF8").
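
A rough sketch of why the original comparison failed, assuming (as guessed above) that the system default behaved like ISO-8859-1; the default is simulated here with an explicit charset so the result does not depend on the machine. The names RoundTripDemo, mismatchedRoundTrip and matchedRoundTrip are hypothetical. Note that the matched round trip only returns the same String again; it does not produce a String "in" another encoding.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Encode with a charset that cannot represent the text (standing in for a
    // Western system default), then decode the bytes as UTF-8.
    public static String mismatchedRoundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1); // unmappable chars become '?'
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode and decode with the same charset on both sides.
    public static String matchedRoundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The six-character sample string from the question.
        String str = "\u65e5\u672c\u8a9e\u6587\u5b57\u5217";

        System.out.println(mismatchedRoundTrip(str));             // ??????
        System.out.println(mismatchedRoundTrip(str).equals(str)); // false
        System.out.println(matchedRoundTrip(str).equals(str));    // true
    }
}
```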