Default Encoding Scheme

Posted By:   elisabeth_jolly
Posted On:   Wednesday, August 1, 2001 04:16 AM

I have a servlet which needs to display some content from a Chinese site that uses the GB2312 character set. On Windows 2000 English, the content displays correctly without the servlet specifying the content type. But on Windows 2000 Chinese, I have to specify the content type, like
hsResponse.setContentType("text/plain;charset=gb2312"); to display the Chinese characters.
But if I use this on Windows 2000 English, I get unknown characters shown as '?' symbols. So the servlet behaves in contradictory ways on the two versions. I noted that the default file encoding property is Cp1252 on the English version and GBK on the Chinese version.




Please help me and tell me a way to resolve the problem.


elisabeth


Re: Default Encoding Scheme

Posted By:   Thierry_Sourbier  
Posted On:   Saturday, August 4, 2001 04:15 AM

Strange behavior, but it is logical. In the first case, the system thinks it is reading and writing cp1252, so no conversion occurs and it works. In the second scenario, it is reading gb2312 and you specify gb2312 output: once again no conversion, and you are fine.



The problem occurs when there is a mismatch between what the program thinks it is reading and the output encoding. Since Chinese characters cannot be represented in cp1252, and some cp1252 characters cannot be represented in gb2312, they are replaced with question marks.
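
To see the question marks appear outside the servlet, here is a minimal sketch. The class name EncodingMismatchDemo and the helper roundTrip are made up for illustration, and it assumes your JRE ships the windows-1252 and GB2312 charsets (standard JDK builds normally do). It encodes a string to bytes in a given charset and decodes the bytes back with the same charset.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the servlet discussed above.
class EncodingMismatchDemo {

    // Encode text to bytes in the named charset, then decode the bytes again.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // (a question mark for windows-1252) for characters it cannot represent.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        byte[] encoded = text.getBytes(cs);
        return new String(encoded, cs);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so this file stays ASCII.
        String chinese = "\u4e2d\u6587";

        // cp1252 has no Chinese characters, so each one comes back as '?'.
        System.out.println(roundTrip(chinese, "windows-1252"));

        // GB2312 can represent both characters, so the text survives intact.
        // What the console shows for this line depends on the console encoding.
        System.out.println(roundTrip(chinese, "GB2312"));
    }
}
```

The first call returns "??", which is the same symptom as in the servlet: the characters were lost at the moment they were converted to an encoding that cannot hold them.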



To make sure that the system reads the input file correctly, you may use native2ascii to convert all the Chinese characters into escaped Unicode sequences (aka \uHHHH).
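
As a rough illustration of what the escaped form looks like, the sketch below does the escaping step in plain Java. It is not the native2ascii tool itself; the class name UnicodeEscaper and the method escape are hypothetical, and it assumes the text has already been decoded correctly into a Java String.

```java
// Hypothetical helper that mimics the escaping native2ascii produces.
class UnicodeEscaper {

    // Keep 7-bit ASCII characters as they are and replace every other
    // UTF-16 code unit with a backslash, the letter u, and four hex digits.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The two Chinese characters are given as escapes to keep this file ASCII.
        System.out.println(escape("GB: \u4e2d\u6587"));
    }
}
```

Because the result contains only ASCII, it reads the same whether the platform default encoding is Cp1252 or GBK, which is the point of escaping the input.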

You can also have a look at:
http://users.erols.com/eepeter/chinesecomputing/programming/java.html


Thierry Sourbier - www.i18ngurus.com