How do I convert a String from Unicode to another encoding and vice versa?

Joe Sam Shirah

The basic answer is to use one of the two String constructors that take an encoding argument: String(byte[] bytes, int offset, int length, String enc) or String(byte[] bytes, String enc). These decode bytes in the named encoding into a String; in the other direction, getBytes(String enc) encodes a String into bytes in the named encoding. Because a String is always held internally as Unicode and, generally, an encoding translation takes place when writing to most output devices/peripherals/streams, it is difficult to show the results directly. I have included some code that indirectly, via getBytes(String enc), attempts to show what happens using UTF-16 (big-endian Unicode, which Java writes with a leading FE FF byte order mark), the platform default encoding, and UTF-8. The base String contains "Enc" plus the Japanese ideograph "go", meaning five (U+4E94). In all cases, on English NT 4.0, the string prints as "Enc?", with the famous question mark, but the hex dump shows the actual bytes, except in the platform default case, where 3F ( = '?' ) displays. You can easily change the String contents and encodings to determine other outputs on your platform. See: character encoding and Supported Encodings. Note that only a few encodings are supported if you don't have the international version of the JDK/JRE.


import java.io.*;

public class EncString
{
   public static void main(String[] args)
   {
      byte[] bRay = null;
      char quote = '"';
      int ndx;
      String sInitial = "Enc" + "\u4E94";

      try { bRay = sInitial.getBytes("UTF-16"); }
      catch( UnsupportedEncodingException  uee )
      {
        System.out.println( "Exception: "  + uee);
      }

      System.out.println( quote + sInitial + quote + 
               " String as UTF-16, " + 
                "bRay length: " + bRay.length + "." );
      for( ndx = 0; ndx < bRay.length; ndx++ )
      {
        // Mask with 0xFF so negative bytes are not sign-extended to "ffffff..".
        System.out.print( Integer.toHexString( bRay[ ndx++ ] & 0xFF ) + " " );
        System.out.print( Integer.toHexString( bRay[ ndx ] & 0xFF ) + "   " );
      }

      System.out.println("\n");

      OutputStreamWriter osw = new OutputStreamWriter( System.out );

      bRay = sInitial.getBytes();
      System.out.println( quote + sInitial + quote + 
               " String as platform default - " + 
                 osw.getEncoding() + 
                ", bRay length: " + bRay.length + "." );
      for( ndx = 0; ndx < bRay.length; ndx++ )
      {
        System.out.print( Integer.toHexString( bRay[ ndx ] & 0xFF ) + "   " );
      }

      System.out.println("\n");

      try 
      {
        sInitial = new String( sInitial.getBytes("UTF-8"), "UTF-8");
        bRay = sInitial.getBytes("UTF-8");
      }
      catch( UnsupportedEncodingException  uee )
      {
        System.out.println( "Exception: "  + uee);
      }

      System.out.println( quote + sInitial + quote + 
               " String as UTF-8, " + 
                "bRay length: " + bRay.length + "." );

      for( ndx = 0; ndx < bRay.length; ndx++ )
      {
        System.out.print( Integer.toHexString( bRay[ ndx ] & 0xFF ) + "   " );
      }

   }
}  // End class EncString
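
The listing above only exercises getBytes(String enc). As a complement, here is a minimal sketch of the other direction, using both String constructors named in the answer to decode bytes and then re-encode them. The class name EncRoundTrip and the helpers toHex and transcode are hypothetical names chosen for this illustration, and ISO-8859-1 is used only as an example of an encoding that cannot represent the ideograph.

```java
import java.io.UnsupportedEncodingException;

public class EncRoundTrip
{
   // Hypothetical helper: two uppercase hex digits per byte, space separated.
   static String toHex(byte[] bytes)
   {
      StringBuilder sb = new StringBuilder();
      for (byte b : bytes)
      {
         if (sb.length() > 0) sb.append(' ');
         sb.append(String.format("%02X", b & 0xFF)); // mask off sign extension
      }
      return sb.toString();
   }

   // Hypothetical helper: decode with fromEnc, then re-encode with toEnc.
   static byte[] transcode(byte[] input, String fromEnc, String toEnc)
         throws UnsupportedEncodingException
   {
      String s = new String(input, fromEnc); // bytes -> Unicode String
      return s.getBytes(toEnc);              // Unicode String -> bytes
   }

   public static void main(String[] args) throws UnsupportedEncodingException
   {
      String original = "Enc\u4E94";

      byte[] utf8 = original.getBytes("UTF-8");
      System.out.println("UTF-8:      " + toHex(utf8));

      // UTF-16BE is big-endian UTF-16 without a byte order mark.
      byte[] utf16 = transcode(utf8, "UTF-8", "UTF-16BE");
      System.out.println("UTF-16BE:   " + toHex(utf16));

      // Offset/length constructor: decode only the last three bytes,
      // which hold the ideograph in UTF-8.
      String last = new String(utf8, 3, 3, "UTF-8");
      System.out.println("Ideograph recovered: " + last.equals("\u4E94"));

      // The target encoding has no mapping for the ideograph, so the
      // conversion is lossy; in practice a substitute byte is written.
      byte[] latin1 = original.getBytes("ISO-8859-1");
      System.out.println("ISO-8859-1: " + toHex(latin1));
   }
}
```

Printing hex rather than the String itself sidesteps the console translation described above. The last case is worth trying on your own platform: the Javadoc leaves the result of encoding an unmappable character unspecified, though a substitute such as 3F ( = '?' ) is what you will typically see.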
