jGuru
Q: Which UTF encoding provides support for multiple languages?
Topic: I18N
abhishek jain, Feb 18, 2005  [replies:2]


::PROBLEM::

I have to support multiple languages such as Chinese, Russian, French, German, Italian and many more in my software. Could you tell me which UTF encoding would be the better choice (UTF-16, UTF-8, or something else)?

These are the observations I collected after searching the net for information on UTF-16 and UTF-8:

1) UTF-16 represents a large set of characters (i.e. 2^15), whereas UTF-8 represents a small set (i.e. 2^7).

2) UTF-8 is efficient if you use a lot of ASCII, e.g. if you're an English speaker and all you use is ASCII, but it takes more bytes per character than UTF-16 for a whole lot of other scripts (and more bytes per character than a lot of current script-specific encodings).

3) UTF-8 was designed far better than UTF-16 when it comes to all aspects of interoperability. Thus it should be the preferred encoding for all transport protocols and all interface points between systems from different vendors.

4) All the UTF-16 APIs in Windows and MacOS are a huge barrier to deployment of Unicode on those platforms since all the code has to be rewritten (and most of it never is). If they had instead retro-fitted UTF-8 into the existing 8-bit APIs we'd have much better Unicode deployment.

5) UTF-8 is intended for cases where English is the dominant language, in which case it is more space efficient, or where full compatibility with 7-bit ASCII is a must.

6) The way UTF-8 was designed, old configuration files, shell scripts, and even lots of age-old software can function properly with Unicode text, even though Unicode was invented years after they came to be.

7) Windows also uses UTF-16 internally.
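
Observation 6 can be checked directly. The following is only a minimal sketch (the class and method names are made up for this illustration): it confirms that pure-ASCII text produces identical bytes in US-ASCII and UTF-8, and that every byte in the UTF-8 encoding of a non-ASCII character has its high bit set, so an ASCII-oriented tool cannot mistake part of a multi-byte character for '/' or a newline.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, written only for this illustration.
class AsciiCompat {

    /** True if the text encodes to exactly the same bytes in US-ASCII and UTF-8. */
    static boolean sameBytesAsAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    /** True if every UTF-8 byte of the text has the high bit set (is outside the ASCII range). */
    static boolean allBytesNonAscii(String text) {
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if ((b & 0x80) == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String config = "PATH=/usr/bin";   // pure ASCII, like an old config file
        String chinese = "\u4f60\u597d";   // two Chinese characters, written as escapes

        System.out.println(sameBytesAsAscii(config));   // prints true
        System.out.println(allBytesNonAscii(chinese));  // prints true
        System.out.println(sameBytesAsAscii(chinese));  // prints false
    }
}
```

The last line prints false because US-ASCII cannot represent the Chinese characters at all; Java substitutes '?' for each of them.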








Re: Which UTF encoding provides support for multiple languages?
Topic: I18N
Stephen Ostermiller PREMIUM, Feb 18, 2005
I would use UTF-8 because I speak English. ;-)
  1. Both UTF-16 and UTF-8 are byte representations of characters in the Unicode character set. Both UTF-16 and UTF-8 can represent all Unicode characters, which covers the vast majority of languages on earth.
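  2. You are correct about the efficiency of UTF-8 and UTF-16 for English vs. non-English text. UTF-8 uses one byte for each ASCII character, two bytes for most other European letters, three bytes for most Chinese characters, and at most four bytes for any character. UTF-16 uses two bytes for most characters and four bytes (a surrogate pair) for characters outside the Basic Multilingual Plane.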
  3. UTF-8 was designed to work passably well with programs that generally accept ASCII input. This means that you can use standard Unix tools like grep on UTF-8 files. However, such interoperability can come at a price: several security vulnerabilities were introduced by older programs trying to filter text from a UTF-8 file that was no longer ASCII encoded.
  4. It is certainly possible to write Unicode-aware applications for Windows and Mac these days.
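  5. UTF-8 is probably more efficient for almost all European languages written in the Latin alphabet, as they are mostly ASCII. For Cyrillic text such as Russian, UTF-8 and UTF-16 both need about two bytes per letter.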
  6. Old programs can function, but feeding Unicode text into a program that expects ASCII can introduce bugs, including security problems.
  7. Java uses UTF-16 internally. :-)
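
Points 2 and 7 are easy to measure from Java itself. The following is a rough sketch, not a benchmark: the class name, method names, and sample strings are chosen only for this illustration. It encodes a few short strings both ways and also compares String.length() (which counts UTF-16 code units) with the number of Unicode code points.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, written only for this illustration.
class EncodingSizes {

    /** Number of bytes needed to encode the text as UTF-8. */
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes needed to encode the text as UTF-16 (big-endian, no byte-order mark). */
    static int utf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Non-ASCII samples are written as escapes so the source file stays pure ASCII.
        String[][] samples = {
            {"English", "hello"},
            {"German", "gr\u00fc\u00dfe"},
            {"Russian", "\u043f\u0440\u0438\u0432\u0435\u0442"},
            {"Chinese", "\u4f60\u597d"},
            {"Emoji", "\ud83d\ude00"},   // one code point outside the Basic Multilingual Plane
        };
        for (String[] sample : samples) {
            String text = sample[1];
            System.out.printf("%-8s chars=%d codePoints=%d utf8=%d utf16=%d%n",
                    sample[0],
                    text.length(),                         // UTF-16 code units
                    text.codePointCount(0, text.length()), // Unicode characters
                    utf8Bytes(text),
                    utf16Bytes(text));
        }
    }
}
```

For these samples the English word takes 5 bytes in UTF-8 and 10 in UTF-16, the German word 7 and 10, the Russian word 12 and 12, and the two Chinese characters 6 and 4. The emoji takes 4 bytes either way, and Java reports a length of 2 for it even though it is a single code point, because a Java String is a sequence of UTF-16 code units.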




Re: Which UTF encoding provides support for multiple languages?
Topic: I18N
Peter O'Brien PREMIUM, Feb 21, 2005

If by 'support' you mean integration with other systems that use different character sets, then UTF-16 is the way to go. Because nearly every character is a single fixed-size 16-bit unit (only supplementary characters need a surrogate pair), it is faster to translate to and from than UTF-8 is.

However, if all you need is to store data in a database and display it on a front end, then UTF-8 would be better because it will usually use less space for text that is mostly ASCII. That applies both in the database and over the transport protocol.
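
For the integration case, a conversion between character sets in Java typically goes through a String as the pivot: decode the incoming bytes with the source charset, then encode with the target charset. The following is a minimal sketch under that assumption; the class and method names are made up for this illustration, and ISO-8859-1 merely stands in for whatever charset the other system uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, written only for this illustration.
class Transcode {

    /** Decodes the input using the 'from' charset, then re-encodes it using the 'to' charset. */
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from); // held as UTF-16 inside the JVM
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        // The word "cafe" with an acute accent on the e, as ISO-8859-1 bytes.
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};

        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        // The accented letter becomes the two bytes C3 A9, so this prints "63 61 66 C3 A9".
        StringBuilder hex = new StringBuilder();
        for (byte b : utf8) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

Always passing an explicit Charset, rather than relying on the platform default, keeps the result the same on every machine.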






