The well-developed MySQL platform allows anyone to play with a full-featured database just by following a few links and clicking through an install wizard. You can download it at home for free, and have access to the same high end features that many large corporations use every day. This ease of access has resulted in a group of self-taught MySQL users. They are a mixed blessing: They can go on to create interesting and new database-driven applications that provide a great value to the larger community. However, this same auto-didacticism can result in poorly created database schemas and even worse implementations. These are examined by the next group of budding database programmers, thus compounding the issue.
I find myself within such a group. While the majority of my MySQL experience has been at my places of employment, the knowledge gained was in great part due to reading articles, help documentation, and piecing together both SQL general techniques as well as MySQL’s particular implementations as I went along. As a result, I commonly discover improvements to my previous approaches, revealing my older code slow and malformed as time goes along.
One of the most helpful such developments was an understanding (at least partially) of the use of indexes on MySQL tables. The experience of adding an index to aid a query can be a rather amazing experience. A query that took four hours can be reduced to a few minutes or less with the proper addition of a needed index or two.
First, just to be clear: An index on a column is simply as the name implies: an ordered copy of the values in a column (or part of them). They are held in memory, to allow for rapid access. Without an index, searching for a small set of values in a column is like having to search through most of a phonebook just to find one number. What a waste!
In deciding what to index, simply look at the columns that are examined in your common queries’ where clauses, as well as columns used to join tables together. Anytime you need to compare one thing to another, consider whether you can benefit from adding an index. Hopefully your schema has been designed properly enough so that at least some, if not most, of these are the primary keys of the tables you are accessing. But often one or two additional columns emerge per table that commonly need to be used in determining the desired results. Also keep in mind that you don’t want to add too many indexes, as they all take up memory. That same memory that might be better used serving as a cache for some other function.
Once you have selected the columns you want to index, you just use
1 | ALTER TABLE foo ADD INDEX idx_mycolumn (mycolumn); |
right? You could, but you can also do better. It’s best to have the shortest possible index, to save lookup time. By short, I mean you only include part of the value per row in the indexed column in the index. By possible, I mean of a sufficient length to distinguish the majority of the column’s values. Why look through 20 pages of a book’s index when almost all of the things you want to find could have been given in 5?
To find that optimal length for your columns, first find their average length, via something like:
1 | SELECT avg(length(company_name)) FROM my_db.my_table; |
If you have a fairly large number of rows, this might take a while. If it’s too long, you can get approximate values using:
1 2 | SELECT avg(length(company_name)) FROM (SELECT company_name FROM my_db.my_table LIMIT 10000) AS foo; |
This will take a lot less time, but should provide similar values. To put this value in perspective against the actual range of lengths, try some partial lengths. This should reveal where the majority of the uniqueness lies. It’s easy to do via:
1 2 3 4 | SELECT count(DISTINCT company_name), count(DISTINCT LEFT(company_name, 40)), count(DISTINCT LEFT(company_name, 20)) FROM my_db.my_table; |
You might just find that while indexing the full length of the field covers every distinct value, indexing just the first 20 characters retains 90% of the uniqueness! Consider if the field to be indexed is a varchar(255) type. Reducing the index to the first twenty characters takes the length from 255 bytes to 20 per row. Quite a lot less memory used.
Once you have found the optimal length, you can set it for the indexed column. Simply add the number of bytes from the left, just past the column name:
1 | ALTER TABLE my_db.my_table ADD INDEX idx_company_name(company_name (20)); |
One last note on indexes: When you have to perform updates to a large percentage of rows in a table with indexes, as well as when you insert a large number into, or delete a large number from such tables, the indexes can slow you down. For each change to an indexed value, the index has to be updated as well! To get around this, you can try turning keys off just before the change, and enabling them after. Then the index update only happens once, at the end. To try this, use:
1 2 3 | ALTER TABLE my_db.my_table DISABLE KEYS; # Do your updates/deletes/inserts ALTER TABLE my_db.my_table ENABLE KEYS; |
This can often greatly speed up your mass changes, without losing the benefits of indexes in normal reading. An exception to this: If you need to join values in the changed table to other tables to perform the updates, removing the index might do more harm than good. Try them both and compare.
Possibly Related (no promises):
Related posts brought to you by Yet Another Related Posts Plugin.







