I am planning to use MySQL version 4.1.22 on a Windows NT server. Most of my tables are InnoDB with numerical field. There are also some varchar (Variable Character Field) and longtext fields where English alpha-numerical text and most of the special characters on keyboard can be entered. During installation of MySQL, I decided on UTF-8 (8-bit UCS/Unicode Transformation Format) as my character set. But is this the ideal charset for my database? What should be the connection collation? the collation for varchar fields in the table? Is it utf8_general_ci? Also, I was wondering if I will be able to use tables with latin1_swedish_ci in the same database. Will using UTF-8 make the index larger in physical size (kBs)?
UTF-8 is ideal for most applications where any special characters must be supported. Set utf8_general_ci for both the connection and the field collations, unless you have a compelling reason to do otherwise.
MySQL will allow a user to choose the character set and collation down to the field level. You could have UTF-8 and latin fields in the same table if you wanted.
Since UTF-8 is a variable length character set, your index will only be larger than a similar Latin index if there are multi-byte characters used. If only single byte characters are stored in the table, than the index will be the same size.
The MySQL manual contains lots of information on the different character sets and collations available and how to convert between them.
Dig Deeper on Linux servers
Related Q&A from Scott Noyes
A MySQL expert describes two ways that the multi-master support can be used. Continue Reading
An expert describes where to find information on detection deadlock algorithms in MySQL. Continue Reading
MySQL expert Scott Noyes explains why using mysqli over mysql_functions allows you to take advantage of MySQL's newer features and PHP scripts. Continue Reading