Character Sets and Character Encodings

The first and most fundamental requirement of localization is to define the character set, or alphabet, of the language. This includes the basic characters, digits, punctuation marks, currency symbols, and special symbols.


Once the character set has been agreed at the linguistic level, the next step is to assign a unique number, or code, to each character. A 7-bit encoding provides 128 codes (0 to 127), and an 8-bit encoding provides 256 codes (0 to 255). However, 256 codes are barely enough to hold the characters of a single language. Consequently, a separate code page is defined for each language or group of related languages.
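
The effect of having separate code pages can be sketched in a few lines of Python. This is only an illustration: the helper name decode_byte and the three code pages (Latin-1, Windows-1251, and ISO 8859-7) are choices made for this example, not part of the text above. It shows that one 8-bit code stands for a different character under each code page, while codes in the 7-bit range agree across all three.

```python
import unicodedata

# Number of distinct codes available with 7 and 8 bits.
CODES_7BIT = 2 ** 7   # 128 codes, numbered 0 to 127
CODES_8BIT = 2 ** 8   # 256 codes, numbered 0 to 255


def decode_byte(value: int, code_page: str) -> str:
    """Interpret one byte value (0 to 255) under the given code page.

    Hypothetical helper written for this example.
    """
    return bytes([value]).decode(code_page)


# Code pages picked for illustration only: Western European, Cyrillic, Greek.
CODE_PAGES = ("latin-1", "cp1251", "iso8859_7")

if __name__ == "__main__":
    for code_page in CODE_PAGES:
        # 0xE4 lies in the upper half (128 to 255), where code pages differ.
        upper = decode_byte(0xE4, code_page)
        # 0x41 lies in the 7-bit range, which these code pages share.
        lower = decode_byte(0x41, code_page)
        print(f"{code_page}: 0xE4 is {unicodedata.name(upper)}, "
              f"0x41 is {unicodedata.name(lower)}")
```

Because the byte alone does not say which code page it belongs to, text stored this way can only be read correctly when the code page is known in advance.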