Friday, March 30, 2001 08:12 PM
Note: This answer was actually provided by Jonathan Asbell
Here is the answer according to Hans Bergsten from Gefion Software http://www.gefionsoftware.com
Author of JavaServer Pages (O'Reilly)
(I would also like to thank him, as he has been very generous in taking the time to help me resolve this problem.)
When a browser sends a parameter, it converts each character to bytes using the encoding for the page (e.g. UTF-8) and sends each byte that is not URL-safe as a hexadecimal escape (%XX). At the server, however, the part of the container that interprets these byte values always assumes they are ISO 8859-1 byte values. So it creates a Unicode string based on the byte values interpreted as ISO 8859-1. Since the 8859-1 assumption is made by the container and not by the platform's default encoding, the hack (read "fix") described below works independently of the platform you run it on.
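To make the mismatch concrete, here is a minimal sketch that imitates both sides with the standard java.net.URLEncoder and java.net.URLDecoder classes. The class and method names are hypothetical, and the "container" is only simulated; a real servlet container does its own parsing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: simulates the browser and the container,
// it is not part of any servlet API.
class MisdecodedParameterDemo {

    // What the browser does for a UTF-8 page: characters -> UTF-8 bytes -> %XX escapes.
    static String browserEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // What the container is described as doing: %XX escapes -> bytes -> ISO 8859-1 characters.
    static String containerDecode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // ISO-8859-1 is always available
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";              // "café", 4 characters
        String onTheWire = browserEncode(original); // é becomes the two bytes C3 A9
        String seenByServlet = containerDecode(onTheWire);

        System.out.println(onTheWire);              // prints "caf%C3%A9"
        System.out.println(seenByServlet.length()); // prints "5": each byte became one char
    }
}
```

The single character é arrives as two characters (U+00C3 and U+00A9), which is the garbled text a servlet sees when the page was UTF-8.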
In the Servlet 2.2 API, the methods that parse parameter input (i.e. getParameter() et al.) always assume that it's sent as ISO 8859-1. So they create a String that preserves the original bytes, one char per byte, but that was decoded with the wrong charset.
If you know what the charset actually is, you can get the original bytes back and decode them using the correct charset:
new String(value.getBytes("8859_1"), "utf-8")
ISO 8859-1 is the default character encoding of HTTP.
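The conversion above can be wrapped in a small helper. This is only a sketch: the class and method names are made up for illustration, it assumes the page really was UTF-8, and it uses java.nio.charset.StandardCharsets (Java 7 and later) to avoid the checked UnsupportedEncodingException. On older JVMs the string-named form shown above does the same thing.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Servlet API.
class ParameterRecoder {

    // Takes a value as returned by getParameter() (decoded as ISO 8859-1)
    // and re-decodes the same bytes as UTF-8.
    static String recode(String value) {
        if (value == null) {
            return null; // getParameter() returns null for a missing parameter
        }
        byte[] rawBytes = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"

        // Simulate the container: UTF-8 bytes wrongly read as ISO 8859-1.
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(misdecoded));         // prints "false"
        System.out.println(original.equals(recode(misdecoded))); // prints "true"
    }
}
```

In a servlet this would be called as recode(request.getParameter("name")). The round trip is lossless because ISO 8859-1 maps every byte value 0x00 to 0xFF to exactly one character, so getBytes() recovers the bytes the browser sent.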