From its inception, speed has been a strong point of the MySQL database server. In fact, its developers have long been cautious to add new features at the expense of performance even when faced by withering pressure from detractors. Yet over time MySQL's features caught up with its blazing speed, and today its used to power some of the highest traffic websites in the world, Yahoo! Finance, Craiglist, and TicketMaster among them. Accordingly, for most applications chances are MySQL is going to perform to your expectations.
However, with Web traffic exploding, and the complexity of applications increasing all the time, you should strive to optimize your database from its very inception, and continue to review and refine its structure and query activity over its lifetime. You can do so by following a process I refer to as NICM, or preferably, "Neglecting to Index Causes Misery" (those of you new to database optimization will soon get the joke). This process consists of five steps, including Normalize, Index, Cache, and Monitor. In this inaugural installment of my new MySQL series for Developer.com, I'll introduce these five steps, highlighting not only how they will ensure your database is running at full tilt, but also how they will actually help you to more effectively manage your data.
The relational database is aptly named because it promotes the strategy of managing data through well-defined relations. By creating and enforcing relations, its possible to greatly reduce the possibility inconsistencies could creep into the data. This strategy is known as database normalization, of which there are several well-defined states, also known as forms.
To illustrate both the concept and importance of normalization, consider a scenario in which you're tasked with creating a corporate human resources application capable of managing employee information. This application would allow the HR manager to easily insert, update, view and delete employee data. Each employee record would contain the usual information, including name, phone number, position, salary, and supervisor.
Note that in the opening sentence I stated that relational databases promote the strategy of managing data through well-defined relations—you're certainly not constrained to do so. Therefore there's nothing to prevent you from creating a single table that looks like this:
CREATE TABLE employees ( name VARCHAR(50) NOT NULL, telephone CHAR(10) NOT NULL, position VARCHAR(20) NOT NULL, salary DECIMAL(8,2) NOT NULL, supervisor VARCHAR(50) NOT NULL )
Yet several problems with this approach should be evident. Because the data types used to define these columns are all indicative of an input method requiring the manager to manually enter each value. For example, the manager might type the following values into the web form when adding a new employee:
"Jason Gilmore", "614-999-9999", "Software Developer", "50000.00", "John Thompson"
But what if over time the manager begins to enter the "Software Developer" title as "Sft. Dev.", and occasionally forgets the periods, using just "Sft Dev"? These inconsistencies eliminate the possibility of using simple SELECT queries to retrieve the roster of software developers. The simple answer to the problem is to normalize the position data, creating a new table intended to contain the master list of all possible positions:
CREATE TABLE positions ( id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(20) NOT NULL )Then you revise the employees table so that it refers to a position's primary key rather than the name (in the revised table, the _fk postfix is simply a naming convention to remind the database administrator that this column points to a foreign table key):
CREATE TABLE employees ( name VARCHAR(50) NOT NULL, telephone CHAR(10) NOT NULL, position_fk TINYINT UNSIGNED NOT NULL, salary DECIMAL(8,2) NOT NULL, supervisor VARCHAR(50) NOT NULL )
Now you can retrieve the positions from the positions table and use them to populate a drop-down list. This forces the manager to choose from a constrained set rather than haphazardly type in the titles.
A similar normalization improvement can be made to the supervisor column, although this is a case where an additional table isn't necessary. All you need to do is modify the employees table anew so it looks like this:
CREATE TABLE employees ( id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(50) NOT NULL, telephone CHAR(10) NOT NULL, position_fk TINYINT UNSIGNED NOT NULL, salary DECIMAL(8,2) NOT NULL, supervisor_pk SMALLINT UNSIGNED )
Can you figure out the origin of the data used to populate the supervisor_pk column? That's right, the very same employees table!
This is just a taste of how database normalization can greatly reduce the data inconsistencies that can arise over time. For a great summary of the concept and the various forms I alluded to earlier in the section, see the Wikipedia article on this topic.
I also suggest downloading MySQL's free MySQL Query Browser application, which can help you to visualize the data and its various relationships in a manner much more efficient than if you were using the command-line client.