(mysql.info) charset-unicode
10.7 Unicode Support
====================
MySQL 5.0 supports two character sets for storing Unicode data:
* `ucs2', the UCS-2 Unicode character set.
* `utf8', the UTF-8 encoding of the Unicode character set.
In UCS-2, every character is represented by a fixed-length two-byte
Unicode code, stored with the most significant byte first. For example,
`LATIN CAPITAL LETTER A' has the code `0x0041' and is stored as the
two-byte sequence `0x00 0x41'. `CYRILLIC SMALL LETTER YERU' (Unicode
`0x044B') is stored as the two-byte sequence `0x04 0x4B'. For Unicode
characters and their codes, refer to the Unicode Home Page
(http://www.unicode.org/).
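The byte layout described above can be sketched in Python. This
illustrates the encoding rule only and is not MySQL code; `ucs2_encode'
is a hypothetical helper. For characters up to U+FFFF its output matches
Python's built-in `utf-16-be' codec.

```python
def ucs2_encode(text: str) -> bytes:
    """Encode text as UCS-2 with the most significant byte first."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        # UCS-2 is a fixed two-byte code: it cannot represent code points
        # above U+FFFF, and U+D800..U+DFFF are surrogates, not characters.
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} is not representable in UCS-2")
        out += bytes([cp >> 8, cp & 0xFF])  # high byte first, then low byte
    return bytes(out)


print(ucs2_encode("A").hex(" "))       # LATIN CAPITAL LETTER A
print(ucs2_encode("\u044B").hex(" "))  # CYRILLIC SMALL LETTER YERU
```

The two calls print `00 41' and `04 4b', the same sequences given in
the text.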
Currently, UCS-2 cannot be used as a client character set, which means
that `SET NAMES 'ucs2'' does not work.
UTF-8 (Unicode Transformation Format, 8-bit) is an alternative way to
store Unicode data. MySQL implements it according to RFC 3629. UTF-8 is
a variable-length encoding: different Unicode characters are encoded as
byte sequences of different lengths:
* Basic Latin letters, digits, and punctuation marks use one byte.
* Most letters of European and Middle Eastern scripts fit into a
two-byte sequence: extended Latin letters (with tilde, macron, acute,
grave, and other accents), Cyrillic, Greek, Armenian, Hebrew, Arabic,
Syriac, and others.
* Korean, Chinese, and Japanese ideographs use three-byte sequences.
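The length rule in the list above can be made concrete with a short
Python sketch. `utf8_length' is a hypothetical helper that maps a code
point to its UTF-8 sequence length using the ranges defined in RFC 3629;
the loop cross-checks it against Python's own UTF-8 encoder.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 (RFC 3629) uses for a single character."""
    cp = ord(ch)
    if cp < 0x80:
        return 1  # U+0000..U+007F: Basic Latin (ASCII)
    if cp < 0x800:
        return 2  # U+0080..U+07FF: accented Latin, Greek, Cyrillic, Hebrew, Arabic, ...
    if cp < 0x10000:
        return 3  # U+0800..U+FFFF: rest of the BMP, including CJK ideographs
    return 4      # U+10000..U+10FFFF: supplementary planes


# Latin A, e-acute, Cyrillic yeru, Hebrew alef, CJK ideograph "middle"
for ch in ["A", "\u00E9", "\u044B", "\u05D0", "\u4E2D"]:
    assert utf8_length(ch) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```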
RFC 3629 describes encoding sequences that take from one to four bytes.
Currently, MySQL support for UTF-8 does not include four-byte
sequences. (An older standard for UTF-8 encoding is given by RFC 2279,
which describes UTF-8 sequences that take from one to six bytes. RFC
3629 renders RFC 2279 obsolete; for this reason, sequences with five
and six bytes are no longer used.)
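Because `utf8' in MySQL 5.0 stops at three bytes per character, only
characters up to U+FFFF (the Basic Multilingual Plane) can be stored.
The following Python sketch uses a hypothetical `fits_mysql_utf8'
helper to check a string against that limit on the client side; it
illustrates the rule and is not a MySQL API.

```python
def fits_mysql_utf8(text: str) -> bool:
    """Return True if every character needs at most three UTF-8 bytes.

    Code points above U+FFFF need a four-byte sequence, which the
    MySQL 5.0 utf8 character set does not support.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


print(fits_mysql_utf8("caf\u00E9 \u4E2D\u6587"))  # True: 1-, 2-, and 3-byte characters
print(fits_mysql_utf8("\U0001F600"))              # False: needs a four-byte sequence
```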
*Tip*: To save space with UTF-8, use `VARCHAR' instead of `CHAR'.
Otherwise, MySQL must reserve three bytes for each character in a `CHAR
CHARACTER SET utf8' column because that is the maximum possible length.
For example, MySQL must reserve 30 bytes for a `CHAR(10) CHARACTER SET
utf8' column.
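The arithmetic behind this tip can be sketched in Python. The helpers
are hypothetical: they compare the fixed reservation for a `CHAR'
column with the number of bytes the data itself needs, which is roughly
what a `VARCHAR' column stores. A `VARCHAR' value also carries a small
length prefix, which this sketch ignores.

```python
MAX_UTF8_BYTES_PER_CHAR = 3  # MySQL 5.0 utf8: sequences of one to three bytes


def char_column_bytes(length: int) -> int:
    """Bytes reserved for a CHAR(length) CHARACTER SET utf8 value."""
    return length * MAX_UTF8_BYTES_PER_CHAR


def utf8_data_bytes(value: str) -> int:
    """Bytes the value itself occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


print(char_column_bytes(10))     # 30, matching the CHAR(10) example
print(utf8_data_bytes("hello"))  # 5
```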