Which UTF encoding provides support for multiple languages?

Posted By:   Abhishek_Jain
Posted On:   Friday, February 18, 2005 12:29 AM


::PROBLEM::
I have to support multiple languages such as Chinese, Russian, French, German, Italian, and many more in my software. Could you please tell me which UTF encoding would be the better choice (UTF-16, UTF-8, or something else)?



These are the observations I collected after searching the net for information on UTF-16 and UTF-8:



1) UTF-16 represents a large set of characters (i.e. 2^15), whereas UTF-8 represents a small set (i.e. 2^7).



2) UTF-8 is efficient if you use a lot of ASCII, e.g. if you're an English speaker and all you use is ASCII, but it takes more bytes per character than UTF-16 for a whole lot of other scripts (and more bytes per character than a lot of current script-specific encodings).



3) UTF-8 was designed far better than UTF-16 when it comes to all aspects of interoperability. Thus it should be the preferred encoding for all transport protocols and all interface points between systems from different vendors.



4) All the UTF-16 APIs in Windows and MacOS are a huge barrier to deployment of Unicode on those platforms since all the code has to be rewritten (and most of it never is). If they had instead retro-fitted UTF-8 into the existing 8-bit APIs we'd have much better Unicode deployment.



5) UTF-8 is intended for cases where English is the dominant language, in which case it is more space-efficient, or where full compatibility with 7-bit ASCII is a must.



6) Because of the way UTF-8 was designed, old configuration files, shell scripts, and even a lot of age-old software can function properly with Unicode text, even though Unicode was invented years after they were written.



7) Windows also uses UTF-16 internally.
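
Several of the observations above (1, 2, 5 and 6) can be checked directly. The following is a minimal Python sketch, not a recommendation for any particular platform. The sample strings, the config line, and the helper names `encoded_sizes`, `round_trips` and `is_ascii_transparent` are all made up for illustration. One thing it makes visible is that both UTF-8 and UTF-16 can represent every Unicode code point, so the figures quoted in observation 1 appear to describe what a single code unit can hold rather than the reach of the whole encoding.

```python
# Sample strings below are illustrative assumptions, not taken from the post.
SAMPLES = {
    "English": "Hello, world",
    "French": "déjà vu",
    "German": "Größe",
    "Russian": "Привет",
    "Chinese": "你好世界",
    "Emoji (outside the BMP)": "\U0001F600",
}


def encoded_sizes(text):
    """Return (code points, UTF-8 byte count, UTF-16 byte count without BOM)."""
    return len(text), len(text.encode("utf-8")), len(text.encode("utf-16-le"))


def round_trips(text, encoding):
    """True if encoding and then decoding gives back the original text."""
    return text.encode(encoding).decode(encoding) == text


def is_ascii_transparent(text, encoding):
    """True if pure-ASCII text encodes to exactly the same bytes as in ASCII."""
    return text.encode(encoding) == text.encode("ascii")


if __name__ == "__main__":
    for language, text in SAMPLES.items():
        points, utf8_len, utf16_len = encoded_sizes(text)
        lossless = round_trips(text, "utf-8") and round_trips(text, "utf-16-le")
        print(f"{language:<24} code points={points:>2} "
              f"utf-8={utf8_len:>2} bytes  utf-16={utf16_len:>2} bytes  "
              f"lossless in both={lossless}")

    config_line = "path=/usr/local/bin"  # hypothetical legacy config entry
    print("UTF-8 keeps ASCII bytes unchanged: ",
          is_ascii_transparent(config_line, "utf-8"))
    print("UTF-16 keeps ASCII bytes unchanged:",
          is_ascii_transparent(config_line, "utf-16-le"))
```

With these samples, the English string takes 12 bytes in UTF-8 and 24 in UTF-16, the Russian string takes 12 bytes in both, the Chinese string takes 12 bytes in UTF-8 and 8 in UTF-16, and the emoji takes 4 bytes in both (UTF-16 needs a surrogate pair for it). Every sample round-trips without loss in both encodings, and only UTF-8 leaves the ASCII config line byte-for-byte unchanged, which is the property observation 6 relies on.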


