How to detect a file encoding?

Posted By:   quan_nguyen
Posted On:   Tuesday, November 19, 2002 09:21 PM

I'm trying to determine the encoding of a text file so that I can pass the right encoding to InputStreamReader. The file has no BOM and is in either ANSI or UTF-8 format. If I erroneously use UTF-8 to read an ANSI file, the text appears corrupted, with a lot of square boxes.

Notepad on Windows 2000 can correctly detect the file encoding even when there is no BOM. How can I do the same, preferably without having to read in the entire file first? Thanks.

Re: How to detect a file encoding?

Posted By:   eimi_nos  
Posted On:   Wednesday, November 20, 2002 05:53 AM

I do not have a Windows machine at hand right now, so I cannot confirm exactly which encoding you refer to as ANSI; it must be specific to the Windows platform.


One idea is to examine the properties of the "ANSI" encoding by calling

byte[] b = text.getBytes(); // assuming "ANSI" is the platform default encoding

and comparing the result with the byte array obtained from

byte[] b0 = text.getBytes("UTF8");

where text is any String that contains non-ASCII characters. From the differences between the two arrays you can work out your own way of determining the encoding.
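One way to act on that comparison is to check whether the raw bytes form valid UTF-8 at all. Bytes above 0x7F in an ANSI file rarely happen to line up as legal UTF-8 sequences. The sketch below is only an illustration, not a tested solution: the class name EncodingGuess and both method names are made up, and it assumes that "ANSI" means windows-1252, which may not match your system's code page. It makes no claim about how Notepad does its detection.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is an assumption for this sketch.
final class EncodingGuess {

    // Assumption: "ANSI" means the Western European Windows code page.
    static final Charset ANSI = Charset.forName("windows-1252");

    // Returns true if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // With REPORT, decode() throws instead of substituting U+FFFD.
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Heuristic: prefer UTF-8 when the bytes are valid UTF-8, else fall back to ANSI.
    static Charset guessCharset(byte[] data) {
        return isValidUtf8(data) ? StandardCharsets.UTF_8 : ANSI;
    }
}
```

The result of guessCharset can be passed to the InputStreamReader constructor that takes a Charset. A file containing only ASCII bytes is reported as UTF-8, which is harmless because both encodings agree on that range. If you test only the first few kilobytes instead of the whole file, the buffer may end in the middle of a multi-byte sequence and be rejected, so a prefix check would need to tolerate an incomplete sequence at the very end.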