What character set(s) does IE5 use to encode form parameters that are sent to a server?

Posted By:   Anonymous
Posted On:   Thursday, June 28, 2001 11:02 AM


I have a form with multiple text fields that I submit to a server using IE5.

When I call HttpServletRequest.getCharacterEncoding(), I get cp1252.
This leads me to believe that all characters for all fields will be 1-byte Latin-1
characters. When I send 30 "Box Drawing Characters" through IE5,
the server receives 60 bytes for 1 field (starting A8 67 A8 55 A8 5B A8 61 A8 60 A8 5F A9 B4 A9 B8).
It appears to be UTF-8 Unicode, but I am not sure.

The problem: IE5 appears to send 1-byte characters for all Latin-1 characters,
but unknown characters such as box drawing characters are sent as 2-byte characters.
For a multi-field form, IE5 appears to mix character set encodings.

Has anyone else seen this?

I tried String converted = new String(request.getParameter("field").getBytes("ISO-8859-1"), "UTF-8");
but that does not appear to work. I still get a 60-byte string.


Re: What character set(s) does IE5 use to encode form parameters that are sent to a server?

Posted By:   Mark_Rose  
Posted On:   Tuesday, August 14, 2001 05:10 PM

Browsers generally submit form data in whatever encoding the page was sent in.
(That encoding is specified in either the Content-Type HTTP header or the
HTML <meta http-equiv="Content-Type"> element.)

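The bytes you are receiving are definitely not UTF-8
encodings of Unicode
values. UTF-8 encodings
of Unicode values larger than 0x7F always start with a first byte of 0xC0 or
higher, and every following byte of the sequence lies in 0x80-0xBF. Your data
starts with 0xA8 followed by 0x67, which breaks both rules.
They are probably the encodings of the "box drawing characters" in a
double-byte Windows code page, but I'm not sure. Windows-1252 itself is
single-byte and has no box drawing characters.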
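The UTF-8 rule above can be checked mechanically. The following is a minimal sketch, assuming a modern JDK (java.nio and StandardCharsets did not exist in 2001); the class and method names are made up for illustration. It asks a strict UTF-8 decoder whether the bytes quoted in the question are well-formed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    /** Returns true only if the whole byte array is well-formed UTF-8. */
    public static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The sample bytes quoted in the question.
        byte[] received = {
            (byte) 0xA8, 0x67, (byte) 0xA8, 0x55, (byte) 0xA8, 0x5B,
            (byte) 0xA8, 0x61, (byte) 0xA8, 0x60, (byte) 0xA8, 0x5F,
            (byte) 0xA9, (byte) 0xB4, (byte) 0xA9, (byte) 0xB8
        };
        System.out.println(isValidUtf8(received)); // prints "false"

        // For comparison: a real UTF-8 encoding of U+2550 (a box drawing character).
        byte[] boxUtf8 = "\u2550".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(boxUtf8)); // prints "true"
    }
}
```

The first byte, 0xA8, is a continuation byte with no lead byte before it, so the decoder rejects the data at once. That is also why decoding the parameter as "UTF-8" could not have worked.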

Suggestions:


  1. specify the encoding of the page using the
    Content-Type header or <meta> element techniques;
  2. include on the form
    a hidden field that holds the encoding (called "charset" below);
  3. use a modified
    version of your technique to get the values:


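// Recover the raw bytes (the container decoded them as ISO-8859-1),
// then decode them with the encoding named in the hidden "charset" field.
String converted =
    new String(
        request.getParameter("field").getBytes("ISO-8859-1"),
        request.getParameter("charset")
    );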
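The conversion in suggestion 3 can be tried without a servlet container. The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical, the container's behaviour is simulated by decoding the wire bytes as ISO-8859-1, and "UTF-8" stands in for whatever the hidden "charset" field would carry.

```java
import java.io.UnsupportedEncodingException;

public class RedecodeDemo {
    /** Undoes an ISO-8859-1 decoding and decodes the same bytes with the real charset. */
    public static String redecode(String misdecoded, String charsetName)
            throws UnsupportedEncodingException {
        // ISO-8859-1 maps every byte 0x00-0xFF to one char, so this recovers the raw bytes.
        byte[] raw = misdecoded.getBytes("ISO-8859-1");
        return new String(raw, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String typed = "\u2554\u2550\u2557";  // three box drawing characters
        String charset = "UTF-8";             // assumption: value of the hidden "charset" field

        // What the browser puts on the wire for a page served in that encoding.
        byte[] onTheWire = typed.getBytes(charset);

        // Simulates getParameter() in a container that assumes ISO-8859-1.
        String fromContainer = new String(onTheWire, "ISO-8859-1");
        System.out.println(fromContainer.length());        // prints "9", one char per byte

        String recovered = redecode(fromContainer, charset);
        System.out.println(recovered.equals(typed));       // prints "true"
    }
}
```

The round trip is only lossless if the container really decoded the bytes as ISO-8859-1. If it used another charset, such as cp1252, some byte values have no mapping and may already be lost before getBytes() is called.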


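Note: The Servlet 2.3 spec makes this easier, and different, by adding request.setCharacterEncoding(),
which must be called before any parameter is read, but few containers support it yet.
The code above will still work in a Servlet 2.3 container, provided the container decodes the parameters as ISO-8859-1.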
