Re: Default Encoding Scheme
Posted By:
Thierry_Sourbier
Posted On:
Saturday, August 4, 2001 04:15 AM
The behavior looks strange, but it is logical. In the first case, the system thinks it is both reading and writing cp1252, so no effective conversion occurs (the bytes pass through unchanged) and it works. In the second scenario, it reads gb2312 and you specify gb2312 output; once again there is no conversion and you are fine.
The problem occurs when the encoding the program thinks it is reading does not match the output encoding. Chinese characters cannot be represented in cp1252, and some cp1252 characters cannot be represented in gb2312, so every character that has no equivalent in the output encoding is replaced with a question mark.
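To make the mismatch above concrete, here is a minimal sketch. The class and method names are made up for illustration, and it assumes your JRE ships the GB2312 and windows-1252 charsets (a full JDK normally does). It decodes gb2312 bytes as if they were cp1252, then writes them back out in each encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not taken from the original question.
class EncodingMismatchDemo {
    // Assumption: both charsets are available in the running JRE.
    static final Charset GB2312 = Charset.forName("GB2312");
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode the bytes with one charset, then re-encode with another.
    // Characters the target charset cannot represent become '?'.
    static byte[] transcode(byte[] input, Charset readAs, Charset writeAs) {
        return new String(input, readAs).getBytes(writeAs);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so that the source
        // file itself does not depend on any particular encoding.
        byte[] original = "\u4e2d\u6587".getBytes(GB2312);

        // Read as cp1252, write as cp1252: the bytes survive unchanged.
        byte[] same = transcode(original, CP1252, CP1252);
        System.out.println("round trip intact: " + Arrays.equals(original, same));

        // Read as cp1252, write as gb2312: the accented Latin letters the
        // program believes it read have no gb2312 equivalent.
        byte[] broken = transcode(original, CP1252, GB2312);
        System.out.println(new String(broken, StandardCharsets.US_ASCII));
    }
}
```

The second call should print only question marks, which is the symptom you are seeing. Telling the reader the real input encoding (for example with new InputStreamReader(in, "GB2312")) avoids the problem.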
To make sure that the system reads the input file correctly, you can use native2ascii to convert all the Chinese characters into escaped Unicode sequences (aka \uHHHH). The resulting file is plain ASCII, so it is read the same way whatever the default encoding is.
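If it helps to see what such escaped sequences look like, the following is a rough sketch of the kind of transformation native2ascii performs. It is not the tool's actual implementation, and the class and method names are invented for this example. Note also that native2ascii shipped with the JDKs of that time but was removed in JDK 9.

```java
// Hypothetical helper, only meant to illustrate the escaped form.
class UnicodeEscaper {
    // Replace every non-ASCII char with a backslash, the letter u and
    // four hex digits; ASCII chars are copied through untouched.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input string holds two Chinese characters after "msg=".
        System.out.println(escapeNonAscii("msg=\u4e2d\u6587"));
    }
}
```

The output contains only ASCII characters, which is why a file in this form is not affected by the platform's default encoding.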
You can also have a look at:
http://users.erols.com/eepeter/chinesecomputing/programming/java.html
Thierry Sourbier - www.i18ngurus.com