How do I convert a String from Unicode to another encoding and vice versa?
Created May 4, 2012
Joe Sam Shirah
The basic answer is to use one of the two String constructors that take an encoding argument: String(byte[] bytes, int offset, int length, String enc) or String(byte[] bytes, String enc). These decode bytes in the named encoding into a String; getBytes(String enc) goes the other way, encoding a String into bytes. Because a String always holds Unicode internally and, generally, an encoding translation takes place when writing to most output devices, peripherals, or streams, it is difficult to show the results directly.
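As a minimal sketch of the two directions described above, the following decodes bytes with String(byte[] bytes, String enc) and re-encodes the result with getBytes(String enc). The class name TranscodeSketch and the helpers transcode and toHex are hypothetical names chosen for illustration; ISO-8859-1 and UTF-8 are used only because every Java platform is required to support them.

```java
import java.io.UnsupportedEncodingException;

class TranscodeSketch {

    // Hypothetical helper: decode with the source encoding, re-encode with the target.
    static byte[] transcode(byte[] input, String fromEnc, String toEnc) {
        try {
            String text = new String(input, fromEnc); // bytes -> Unicode String
            return text.getBytes(toEnc);              // Unicode String -> bytes
        } catch (UnsupportedEncodingException uee) {
            throw new IllegalArgumentException("Unsupported encoding", uee);
        }
    }

    // Hypothetical helper: format bytes as two-digit hex values separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so negative bytes are not sign-extended.
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, in ISO-8859-1: the accented letter is the single byte 0xE9.
        byte[] latin1 = { 0x63, 0x61, 0x66, (byte) 0xE9 };
        byte[] utf8 = transcode(latin1, "ISO-8859-1", "UTF-8");
        System.out.println(toHex(utf8)); // prints "63 61 66 c3 a9"
    }
}
```

The String in the middle is the pivot: there is no direct byte-to-byte conversion here, only a decode followed by an encode.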
I have included some code that shows what happens indirectly, via getBytes(String enc), using UTF-16 (big-endian Unicode), the platform default encoding, and UTF-8. The base String contains "Enc" plus the Japanese ideograph "go", meaning 5 (U+4E94). In all cases, on English NT 4.0, the string prints as "Enc?", with the famous question mark, but the actual byte values are shown, except in the platform default case, where 3F (= '?') displays because the default encoding cannot represent the ideograph. You can easily change the String contents and encodings to determine other outputs on your platform. See: character encoding and Supported Encodings. Note that only a few encodings are supported if you don't have the international version of the JDK/JRE.
import java.io.*;

public class EncString {
    public static void main(String[] args) {
        byte[] bRay = null;
        char quote = '"';
        int ndx;
        // "Enc" plus U+4E94, the ideograph for 5
        String sInitial = "Enc" + "\u4E94";

        try {
            bRay = sInitial.getBytes("UTF-16");
        } catch (UnsupportedEncodingException uee) {
            System.out.println("Exception: " + uee);
            return; // nothing to print without the bytes
        }
        System.out.println(quote + sInitial + quote + " String as UTF-16, "
            + "bRay length: " + bRay.length + ".");
        // Print the bytes in pairs, one UTF-16 code unit per iteration
        for (ndx = 0; ndx < bRay.length; ndx++) {
            // Mask with 0xFF so negative bytes are not sign-extended
            System.out.print(Integer.toHexString(bRay[ndx++] & 0xFF) + " ");
            System.out.print(Integer.toHexString(bRay[ndx] & 0xFF) + " ");
        }
        System.out.println(" ");

        // Used only to report the name of the platform default encoding
        OutputStreamWriter osw = new OutputStreamWriter(System.out);
        bRay = sInitial.getBytes();
        System.out.println(quote + sInitial + quote + " String as platform default - "
            + osw.getEncoding() + ", bRay length: " + bRay.length + ".");
        for (ndx = 0; ndx < bRay.length; ndx++) {
            System.out.print(Integer.toHexString(bRay[ndx] & 0xFF) + " ");
        }
        System.out.println(" ");

        try {
            // Round trip through UTF-8 bytes, then encode again for display
            sInitial = new String(sInitial.getBytes("UTF-8"), "UTF-8");
            bRay = sInitial.getBytes("UTF-8");
        } catch (UnsupportedEncodingException uee) {
            System.out.println("Exception: " + uee);
            return;
        }
        System.out.println(quote + sInitial + quote + " String as UTF-8, "
            + "bRay length: " + bRay.length + ".");
        for (ndx = 0; ndx < bRay.length; ndx++) {
            System.out.print(Integer.toHexString(bRay[ndx] & 0xFF) + " ");
        }
    }
} // End class EncString
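The 3F substitution mentioned above can be reproduced on any platform by encoding into a charset that cannot represent the ideograph. The sketch below is an illustration under stated assumptions, not part of the original answer: ReplacementSketch and hexOf are hypothetical names, and it uses the getBytes(Charset) overload (Java 6 and later) rather than getBytes(String enc), because the Javadoc for the Charset overload documents that unmappable characters are replaced with the charset's default replacement bytes, while the String overload leaves that behavior unspecified.

```java
import java.nio.charset.Charset;

class ReplacementSketch {

    // Hypothetical helper: encode text in the named charset and return the bytes as hex.
    static String hexOf(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so negative bytes are not sign-extended.
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "Enc\u4E94"; // "Enc" plus the ideograph for 5

        // US-ASCII has no mapping for U+4E94, so the encoder substitutes '?' (0x3F).
        System.out.println(hexOf(s, "US-ASCII")); // prints "45 6e 63 3f"

        // UTF-8 needs three bytes for the ideograph.
        System.out.println(hexOf(s, "UTF-8"));    // prints "45 6e 63 e4 ba 94"

        // UTF-16BE uses two bytes per character and writes no byte order mark.
        System.out.println(hexOf(s, "UTF-16BE")); // prints "00 45 00 6e 00 63 4e 94"
    }
}
```

Note that the substitution is lossy: decoding the US-ASCII bytes again yields "Enc?", not the original String.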