Go to the first, previous, next, last section, table of contents.
Optimization is a complex task because ultimately it requires
understanding of the whole system. While it may be possible to perform some
local optimizations with small knowledge of your system or application,
the more optimal you want your system to become the more you will have
to know about it.
This chapter tries to explain and give some examples of different
ways to optimize MySQL. Remember, however, that there are
always some (increasingly harder) additional ways to make the system
even faster.
The most important factor in making a system fast is the basic
design. You also need to know what kinds of things your system will be
doing, and what your bottlenecks are.
The most common bottlenecks are:
- Disk seeks.
It takes time for the disk to find a piece of data. With modern disks,
the mean time for this is usually lower than 10ms, so we can in
theory do about 100 seeks a second. This time improves slowly with new
disks and is very hard to optimize for a single table. The way to
optimize seek time is to spread the data on more than one disk.
- Disk reading/writing.
When the disk is at the correct position we need to read the data. With
modern disks, one disk delivers at least 10-20MB/s throughput. This
is easier to optimize than seeks because you can read in parallel from
multiple disks.
- CPU cycles.
When we have the data in main memory (or if it was already
there) we need to process it to get to our result. Having small
tables compared to the memory is the most common limiting
factor. But then, with small tables speed is usually not the problem.
- Memory bandwidth.
When the CPU needs more data than can fit in the CPU cache the main
memory bandwidth becomes a bottleneck. This is an uncommon bottleneck
for most systems, but one to be aware of.
When using the MyISAM storage engine, MySQL uses extremely fast table
locking that allows multiple readers or a single writer. The biggest problem
with this storage engine occurs when you have a mix of a steady stream
of updates and slow selects on the same table. If this is a problem with
some tables, you can use another table type for these. See section 15 MySQL Storage Engines and Table Types.
MySQL can work with both transactional and non-transactional tables.
To be able to work smoothly with non-transactional tables (which can't
roll back if something goes wrong), MySQL has the following rules:
-
All columns have default values.
-
If you insert a ``wrong'' value in a column, such as a too-large
numerical value into a numerical column, MySQL will set the column to the ``best possible value''
instead of giving an error.
For numerical values, this is 0, the smallest
possible values or the largest possible value. For strings, this is
either the empty string or the longest possible string that can be in
the column.
-
All calculated expressions returns a value that can be used instead of
signaling an error condition. For example 1/0 returns
NULL
The implication of these rules is that you should not use MySQL to check
column content. Instead, you should check values in the application before
storing them in the database.
For more information about this, see section 1.8.6 How MySQL Deals with Constraints and section 14.1.4 INSERT Syntax.
Because all SQL servers implement different parts of SQL, it takes work to
write portable SQL applications. It is very easy to achieve portability
for very simple selects and inserts, but becomes more difficiult the more
capabilities you require. If you want an application that is fast with
many database systems, it becomes even harder!
To make a complex application portable, you need to choose a number of
SQL servers that it should work with.
You can use the MySQL crash-me program
to find functions,
types, and limits you can use with a selection of database
servers. crash-me tests far from everything possible, but it
is still comprehensive with about 450 things tested.
An example of the type of information crash-me can provide is that
you shouldn't have column names longer than 18 characters
if you want to be able to use Informix or DB2.
For information, visit
http://www.mysql.com/information/crash-me.php.
Both the MySQL benchmarks and crash-me programs are very database
independent. By taking a look at how we have written them, you can get
a feeling for what you have to do to make your own applications database
independent. The programs can be found in the `sql-bench' directory
in the MySQL source distribution. They are written in Perl and use the DBI
database interface. Use of DBI in itself solves part of the portability
problem because it provides database-independent access methods.
See http://www.mysql.com/information/benchmarks.html for the results
from the benchmarks.
As you can see in the results, all database systems have some weak points. That
is, they have different design compromises that lead to different
behavior.
If you strive for database independence, you need to get a good feeling
for each SQL server's bottlenecks. MySQL is very fast in
retrieving and updating records, but will have a problem in mixing slow
readers and writers on the same table. Oracle, on the other hand, has a big
problem when you try to access rows that you have recently updated
(until they are flushed to disk). Transactional databases in general are
not very good at generating summary tables from log tables, as in this
case row locking is almost useless.
To make your application really database-independent, you need to define
an easily extendable interface through which you manipulate your data. As
C++ is available on most systems, it makes sense to use a C++ class-based
interface to the databases.
If you use some feature that is specific to a given database system (such
as the REPLACE statement, which is specific to MySQL), you should
implement the same feature for other SQL servers by coding an alternative
method. Although the alternative may be slower, it will allow the other
servers to perform the same tasks.
With MySQL, you can use the /*! */ syntax to add MySQL-specific
keywords to a query. The code inside /**/ will be treated as a
comment (and ignored) by most other SQL servers.
If high performance is more important than exactness, as in some
Web applications, it is possibile to create an application layer that
caches all results to give you even higher performance. By letting
old results ``expire'' after a while, you can keep the cache reasonably
fresh. This provides a method to handle high load spikes, in which case
you can dynamically increase the cache and set the expiration timeout higher
until things get back to normal.
In this case, the table creation information should contain information
of the initial size of the cache and how often the table should normally
be refreshed.
An alternative to implementing an application cache is to use the MySQL query
cache. By enabling the query cache, the server handles the details of
determining whether a query result can be reused. This simplifies your
application.
See section 5.10 The MySQL Query Cache.
During MySQL initial development, the features of MySQL
were made to fit our largest customer. They handle data warehousing for a
couple of the biggest retailers in Sweden.
From all stores, we get weekly summaries of all bonus card transactions,
and we are expected to provide useful information for the store owners
to help them find how their advertisement campaigns are affecting their
own customers.
The volume of data is quite huge (about 7 million summary transactions
per month), and we have data for 4-10 years that we need to present to
the users. We got weekly requests from our customers, who want to get
``instant'' access to new reports from this data.
We solved this by storing all information per month in compressed
'transaction' tables. We have a set of simple macros (script) that
generates summary tables grouped by different criteria (product group,
customer id, store ...) from the transactional tables. The reports are
Web pages that are dynamically generated by a small Perl script that
parses a Web page, executes the SQL statements in it, and inserts the
results. We would have used PHP or mod_perl instead, but they were
not available at that time.
For graphical data, we wrote a simple tool in C that can process SQL
query results and produce GIF images based on those results. This is also
dynamically executed from the Perl script that parses the Web pages.
In most cases, a new report can be done simply by copying an existing
script and modifying the SQL query in it. In some cases, we will need to
add more fields to an existing summary table or generate a new one, but
this is also quite simple, as we keep all transactions tables on disk.
(Currently we have at least 50GB of transactions tables and 200GB of other
customer data.)
We also let our customers access the summary tables directly with ODBC
so that the advanced users can experiment with the data themselves.
We haven't had any problems handling this with quite modest Sun Ultra
SPARCstation hardware (2x200 Mhz). We recently upgraded one of our servers
to a 2 CPU 400 Mhz UltraSPARC, and we are now planning to start handling
transactions on the product level, which would mean a ten-fold increase
of data. We think we can keep up with this by just adding more disk to
our systems.
We are also experimenting with Intel-Linux to be able to get more CPU
power cheaper. Now that we have the binary portable database format (new
in Version 3.23), we will start to use this for some parts of the application.
Our initial feelings are that Linux will perform much better on
low-to-medium load and Solaris will perform better when you start to get a
high load because of extreme disk IO, but we don't yet have anything
conclusive about this. After some discussion with a Linux kernel
developer, this might be a side effect of Linux allocating so many resources
to the batch job that the interactive performance gets very low. This
makes the machine feel very slow and unresponsive while big batches are
going. Hopefully this will be better handled in future Linux Kernels.
This section should contain a technical description of the MySQL
benchmark suite (and crash-me), but that description is not
written yet. Currently, you can get a good idea of the benchmarks by
looking at the code and results in the `sql-bench' directory in any
MySQL source distribution.
This benchmark suite is meant to tell any user what operations a given
SQL implementation performs well or poorly.
Note that this benchmark is single-threaded, so it measures the minimum
time for the operations performed. We plan to add multi-threaded tests to
the benchmark suite in the future.
The following tables show some comparative benchmark results for several
database servers when accessed through ODBC on a Windows NT 4.0 machine.
| Reading 2000000 rows by index | Seconds | Seconds
|
| mysql | 367 | 249
|
| mysql_odbc | 464 |
|
| db2_odbc | 1206 |
|
| informix_odbc | 121126 |
|
| ms-sql_odbc | 1634 |
|
| oracle_odbc | 20800 |
|
| solid_odbc | 877 |
|
| sybase_odbc | 17614 |
|
| Inserting 350768 rows | Seconds | Seconds
|
| mysql | 381 | 206
|
| mysql_odbc | 619 |
|
| db2_odbc | 3460 |
|
| informix_odbc | 2692 |
|
| ms-sql_odbc | 4012 |
|
| oracle_odbc | 11291 |
|
| solid_odbc | 1801 |
|
| sybase_odbc | 4802 |
|
For the preceding tests, MySQL was run with an index cache size of 8MB.
We have gathered some more benchmark results at
http://www.mysql.com/information/benchmarks.html.
Note that Oracle is not included because they asked to be removed. All
Oracle benchmarks have to be passed by Oracle! We believe that makes
Oracle benchmarks very biased because the above benchmarks are
supposed to show what a standard installation can do for a single
client.
To use the benchmark suite, the following requirements must be satisified:
-
The benchmark suite is provided with MySQL source distributions, so you must
have a source distribution. You can either download a released distribution
from http://www.mysql.com/downloads/, or use the current development
source tree (see section 2.3.3 Installing from the Development Source Tree).
-
The benchmark scripts are written in Perl and use the Perl DBI module to
access database servers, so DBI must be installed. You will also need
the server-specific DBD drivers for each of the servers you want to test.
For example, to test MySQL, PostgreSQL, and DB2, you must have the
DBD::mysql, DBD::Pg, and DBD::DB2 modules installed.
See section 20.6 MySQL Perl API.
The benchmark suite is located in the `sql-bench' directory of MySQL
source distributions.
To run the benchmark tests, change location into that directory and execute
the run-all-tests script:
shell> cd sql-bench
shell> perl run-all-tests --server=server_name
server_name is one of supported servers. To get a list of
all options and supported servers, invoke perl run-all-tests --help.
The crash-me script also is located in the `sql-bench' directory.
crash-me tries to determine what features a database supports and
what its capabilities and limitations are by actually running
queries. For example, it determines:
-
What column types are supported
-
How many indexes are supported
-
What functions are supported
-
How big a query can be
-
How big a
VARCHAR column can be
We can find the results from crash-me for many different database
servers at http://www.mysql.com/information/crash-me.php.
You should definitely benchmark your application and database to find
out where the bottlenecks are. By fixing a bottleneck (or by replacing it
with a ``dummy module'') you can then easily identify the next
bottleneck. Even if the overall performance for your
application currently is acceptable, you should at least make a plan for each
bottleneck, and decide how to solve it if someday you really need the
extra performance.
For an example of portable benchmark programs, look at the MySQL
benchmark suite. See section 7.1.4 The MySQL Benchmark Suite. You
can take any program from this suite and modify it for your needs. By doing
this, you can try different solutions to your problem and test which is really
fastest for you.
Another free benchmark suite is the Open Source Database Benchmark,
available at http://osdb.sourceforge.net/.
It is very common for a problem to occur only when the system is very
heavily loaded. We have had many customers who contact us when they
have a (tested) system in production and have encountered load problems. In
most cases, performance problems turn out to be due to issues of basic
database design (for example, table scans are not good at high load) or
problems with the operating system or libraries. Most of the time, these
problems would be a lot easier to fix if the systems were not
already in production.
To avoid problems like this, you should put some effort into benchmarking
your whole application under the worst possible load! You can use
Super Smack for this. It is available at
http://jeremy.zawodny.com/mysql/super-smack/.
As the name suggests, it can bring your system to its knees if you ask it,
so make sure to use it only on your development systems.
First, one factor that affects all statements: The more complex your permission
setup is, the more overhead you will have.
Using simpler permissions when you issue GRANT statements enables
MySQL to reduce permission-checking overhead when clients execute
statements. For example, if you don't grant any table-level or column-level
privileges, the server need not ever check the contents of the
tables_priv and columns_priv tables. Similarly, if you place
no resource limits on any accounts, the server does not have to perform
resource counting. If you have a very high query volume, it may be worth
the time to use a simplified grant structure to reduce permission-checking
overhead.
If your problem is with some specific MySQL expression or function, you can
use the BENCHMARK() function from the mysql client program
to perform a timing test:
mysql> SELECT BENCHMARK(1000000,1+1);
+------------------------+
| BENCHMARK(1000000,1+1) |
+------------------------+
| 0 |
+------------------------+
1 row in set (0.32 sec)
This result was obtained on a Pentium II 400MHz system. It shows that
MySQL can execute 1,000,000 simple addition expressions in 0.32 seconds
on that system.
All MySQL functions should be very optimized, but there may be
some exceptions. BENCHMARK(loop_count,expression) is a
great tool to find out if this is a problem with your query.
EXPLAIN tbl_name
or EXPLAIN SELECT select_options
The EXPLAIN statement can be used either as a synonym for
DESCRIBE or as a way to obtain information about how MySQL will execute
a SELECT statement:
-
The
EXPLAIN tbl_name syntax is synonymous with DESCRIBE tbl_name
or
SHOW COLUMNS FROM tbl_name.
-
When you precede a
SELECT statement with the keyword EXPLAIN,
MySQL explains how it would process the SELECT, providing
information about how tables are joined and in which order.
This section provides information about the second use of EXPLAIN.
With the help of EXPLAIN, you can see when you must add indexes
to tables to get a faster SELECT that uses indexes to find the
records.
You should frequently run ANALYZE TABLE to update table statistics
such as cardinality of keys which can affect the choices the optimizer
makes. See section 14.5.2.1 ANALYZE TABLE Syntax.
You can also see whether the optimizer joins the tables in an optimal
order. To force the optimizer to use a specific join order for a
SELECT statement, add a STRAIGHT_JOIN clause.
For single-table joins, EXPLAIN returns a row of information for each
table used in the SELECT statement. The tables are listed in the output
in the order that MySQL would read them while processing the query.
MySQL resolves all joins using a single-sweep
multi-join method. This means that MySQL reads a row from the first
table, then finds a matching row in the second table, then in the third table
and so on. When all tables are processed, it outputs the selected columns and
backtracks through the table list until a table is found for which there are
more matching rows. The next row is read from this table and the process
continues with the next table.
In MySQL version 4.1, the EXPLAIN output format was changed to work
better with constructs such as UNION statements, subqueries, and
derived tables. Most notable is the addition of two new columns: id
and select_type. You will not see these columns when using servers
older than MySQL 4.1.
Each output row from EXPLAIN provides information about one table, and
each row consists of the following columns:
id
-
SELECT identifier, the sequential number of this SELECT
within the query.
select_type
-
The type of
SELECT clause, which can be any of the following:
SIMPLE
-
Simple
SELECT (not using UNION or subqueries)
PRIMARY
-
Outermost
SELECT
UNION
-
Second and further
SELECT statements in a UNION
DEPENDENT UNION
-
Second and further
SELECT statements in a UNION, dependent on outer
subquery
SUBQUERY
-
First
SELECT in subquery
DEPENDENT SUBQUERY
-
First
SELECT in subquery, dependent on outer subquery
DERIVED
-
Derived table
SELECT (subquery in FROM clause)
table
-
The table to which the row of output refers.
type
-
The join type. The different join types are listed here, ordered from the
best type to the worst:
system
-
The table has only one row (= system table). This is a special case of
the
const join type.
const
-
The table has at most one matching row, which will be read at the start
of the query. Because there is only one row, values from the column in
this row can be regarded as constants by the rest of the
optimizer.
const tables are very fast as they are read only once!
const is used when you compare all parts of a
PRIMARY KEY or UNIQUE index with constants:
SELECT * FROM const_table WHERE primary_key=1;
SELECT * FROM const_table
WHERE primary_key_part1=1 AND primary_key_part2=2;
eq_ref
-
One row will be read from this table for each combination of rows from
the previous tables. Other than the
const types, this is the best
possible join type. It is used when all parts of an index are used by
the join and the index is a PRIMARY KEY or UNIQUE index.
eq_ref can be used for indexed columns that are compared using the
= operator. The compared item may be a constant or an expression
that uses columns from tables that are read before this table.
In the following examples, ref_table will be able to use eq_ref:
SELECT * FROM ref_table,other_table
WHERE ref_table.key_column=other_table.column;
SELECT * FROM ref_table,other_table
WHERE ref_table.key_column_part1=other_table.column
AND ref_table.key_column_part2=1;
ref
-
All rows with matching index values will be read from this table for
each combination of rows from the previous tables.
ref is used
if the join uses only a leftmost prefix of the key, or if the key is not
a PRIMARY KEY or UNIQUE index (in other words, if the join
cannot select a single row based on the key value). If the key that is
used matches only a few rows, this is a good join type.
ref can be used for indexed columns that are compared using the =
operator.
In the following examples, ref_table will be able to use ref:
SELECT * FROM ref_table WHERE key_column=expr;
SELECT * FROM ref_table,other_table
WHERE ref_table.key_column=other_table.column;
SELECT * FROM ref_table,other_table
WHERE ref_table.key_column_part1=other_table.column
AND ref_table.key_column_part2=1;
ref_or_null
-
This join type is
like
ref, but with the addition that MySQL will do an extra search for
rows that contain NULL values.
In the following example, ref_table will be able to use
ref_or_null:
SELECT * FROM ref_table
WHERE key_column=expr OR key_column IS NULL;
This join type optimization is new for MySQL 4.1.1 and is mostly used when
resolving subqueries.
See section 7.2.6 How MySQL Optimizes IS NULL.
index_merge
-
This join type indicates that the
Merge Index
optimization is used. For more information, see
section 7.2.5 How MySQL Optimizes OR Clauses.
In this case, the key column contains a list of
used indexes, and key_len contains a list of the longest key parts
for the used indexes.
unique_subquery
-
This type replaces
ref for some IN subqueries of the following
form:
value IN (SELECT primary_key
FROM single_table WHERE some_exp)
unique_subquery is just an index lookup function that replaces the
subquery completely for better efficiency.
index_subquery
-
This join type is similar to
unique_subquery. It replaces IN subqueries, but
it works for non-unique indexes in subqueries of the following form:
value IN (SELECT key_field
FROM single_table WHERE some_exp)
range
-
Only rows that are in a given range will be retrieved, using an index to
select the rows. The
key column indicates which index is used.
The key_len contains the longest key part that was used.
The ref column will be NULL for this type.
range can be used for when an key column is compared to a
constant using any of the =, <>, >, >=, <,
<=, IS NULL, <=>, BETWEEN, or IN operators:
SELECT * FROM range_table
WHERE key_column = 10;
SELECT * FROM range_table
WHERE key_column BETWEEN 10 and 20;
SELECT * FROM range_table
WHERE key_column IN (10,20,30);
SELECT * FROM range_table
WHERE key_part1= 10 and key_part2 IN (10,20,30);
index
-
This join type is the same as
ALL, except that only the index tree
is scanned. This is usually faster than ALL, because the index
file is usually smaller than the datafile.
MySQL can use this join type when the query uses only columns that are
part of a single index.
ALL
-
A full table scan will be done for each combination of rows from the
previous tables. This is normally not good if the table is the first
table not marked
const, and usually very bad in all other
cases. Normally, you can avoid ALL by adding indexes that allow row
retrieval from the table based on constant values or column values from
earlier tables.
possible_keys
-
The
possible_keys column indicates which indexes MySQL could use to
find the rows in this table. Note that this column is totally independent of
the order of the tables as displayed in the output from EXPLAIN. That
means that some of the keys in possible_keys may not be usable in
practice with the generated table order.
If this column is NULL, there are no relevant indexes. In this case,
you may be able to improve the performance of your query by examining
the WHERE clause to see whether it refers to some column or columns
that would be suitable for indexing. If so, create an appropriate index
and check the query with EXPLAIN again.
See section 14.2.2 ALTER TABLE Syntax.
To see what indexes a table has, use SHOW INDEX FROM tbl_name.
key
-
The
key column indicates the key (index) that MySQL actually decided
to use. The key is NULL if no index was chosen. To force MySQL
to use or ignore an index listed in the possible_keys column, use
FORCE INDEX, USE INDEX, or IGNORE INDEX in your query.
See section 14.1.7 SELECT Syntax.
Running ANALYZE TABLE or myisamchk --analyze on the table
will help the optimizer choose better indexes.
See section 14.5.2.1 ANALYZE TABLE Syntax and
section 5.6.2.1 myisamchk Invocation Syntax.
key_len
-
The
key_len column indicates the length of the key that
MySQL decided to use. The length is NULL if the
key is NULL. Note that this tells us how many parts of a
multi-part key MySQL will actually use.
ref
-
The
ref column shows which columns or constants are used with the
key to select rows from the table.
rows
-
The
rows column indicates the number of rows MySQL
believes it must examine to execute the query.
Extra
-
This column contains additional information about how MySQL will
resolve the query. Here is an explanation of the different text
strings that can be found in this column:
Distinct
-
MySQL will stop searching for more rows for the current row
combination after it has found the first matching row.
Not exists
-
MySQL was able to do a
LEFT JOIN optimization on the
query and will not examine more rows in this table for the previous row
combination after it finds one row that matches the LEFT JOIN criteria.
Here is an example of the type of query that can be optimized this way:
SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id
WHERE t2.id IS NULL;
Assume that t2.id is defined with NOT NULL. In this case,
MySQL will scan t1 and look up the rows in t2 using the values
of t1.id. If MySQL finds a matching row in t2, it knows that
t2.id can never be NULL, and will not scan through the rest
of the rows in t2 that have the same id value. In other
words, for each row in t1, MySQL only needs to do a single lookup
in t2, regardless of how many rows actually match in t2.
range checked for each record (index map: #)
-
MySQL found no good index to use. Instead, for each row combination in
the preceding tables, it will do a check to determine which index to use
(if any), and use it to retrieve the rows from the table. This is not
very fast, but is faster than performing a join with no index at all.
Using filesort
-
MySQL will need to do an extra pass to find out how to retrieve
the rows in sorted order. The sort is done by going through all rows
according to the
join type and storing the sort key and pointer to
the row for all rows that match the WHERE clause. The keys then are
sorted and the rows are retrieved in sorted order.
Using index
-
The column information is retrieved from the table using only
information in the index tree without having to do an additional seek to
read the actual row. This strategy can be used when the query uses only
columns that are part of a single index.
Using temporary
-
To resolve the query, MySQL will need to create a temporary table to hold
the result. This typically happens if the query contains
GROUP BY
and ORDER BY clauses that list columns differently.
Using where
-
A
WHERE clause will be used to restrict which rows will be matched
against the next table or sent to the client. Unless you specifically intend
to fetch or examine all rows from the table, you may have something wrong
in your query if you don't have this information and the table join type
is ALL or index.
If you want to make your queries as fast as possible, you should look out for
Using filesort and Using temporary.
You can get a good indication of how good a join is by multiplying all values
in the rows column of the EXPLAIN output. This should tell you
roughly how many rows MySQL must examine to execute the query. This
number is also used when you restrict queries with the max_join_size
variable.
See section 7.5.2 Tuning Server Parameters.
The following example shows how a JOIN can be optimized progressively
using the information provided by EXPLAIN.
Suppose that you have the SELECT statement shown here and you plan to
examine it using EXPLAIN:
EXPLAIN SELECT tt.TicketNumber, tt.TimeIn,
tt.ProjectReference, tt.EstimatedShipDate,
tt.ActualShipDate, tt.ClientID,
tt.ServiceCodes, tt.RepetitiveID,
tt.CurrentProcess, tt.CurrentDPPerson,
tt.RecordVolume, tt.DPPrinted, et.COUNTRY,
et_1.COUNTRY, do.CUSTNAME
FROM tt, et, et AS et_1, do
WHERE tt.SubmitTime IS NULL
AND tt.ActualPC = et.EMPLOYID
AND tt.AssignedPC = et_1.EMPLOYID
AND tt.ClientID = do.CUSTNMBR;
For this example, make the following assumptions:
-
The columns being compared have been declared as follows:
| Table | Column | Column Type
|
tt | ActualPC | CHAR(10)
|
tt | AssignedPC | CHAR(10)
|
tt | ClientID | CHAR(10)
|
et | EMPLOYID | CHAR(15)
|
do | CUSTNMBR | CHAR(15)
|
-
The tables have the indexes shown here:
| Table | Index
|
tt | ActualPC
|
tt | AssignedPC
|
tt | ClientID
|
et | EMPLOYID (primary key)
|
do | CUSTNMBR (primary key)
|
-
The
tt.ActualPC values aren't evenly distributed.
Initially, before any optimizations have been performed, the EXPLAIN
statement produces the following information:
table type possible_keys key key_len ref rows Extra
et ALL PRIMARY NULL NULL NULL 74
do ALL PRIMARY NULL NULL NULL 2135
et_1 ALL PRIMARY NULL NULL NULL 74
tt ALL AssignedPC, NULL NULL NULL 3872
ClientID,
ActualPC
range checked for each record (key map: 35)
Because type is ALL for each table, this output indicates
that MySQL is generating a Cartesian product of all the tables; that is,
every combination of rows. This will take quite a long time, because the
product of the number of rows in each table must be examined. For the case
at hand, this product is 74 * 2135 * 74 * 3872 = 45,268,558,720 rows.
If the tables were bigger, you can only imagine how long it would take.
One problem here is that MySQL can't (yet) use indexes on columns
efficiently if they are declared differently. In this context,
VARCHAR and CHAR are the same unless they are declared as
different lengths. Because tt.ActualPC is declared as CHAR(10)
and et.EMPLOYID is declared as CHAR(15), there is a length
mismatch.
To fix this disparity between column lengths, use ALTER TABLE to
lengthen ActualPC from 10 characters to 15 characters:
mysql> ALTER TABLE tt MODIFY ActualPC VARCHAR(15);
Now tt.ActualPC and et.EMPLOYID are both VARCHAR(15).
Executing the EXPLAIN statement again produces this result:
table type possible_keys key key_len ref rows Extra
tt ALL AssignedPC, NULL NULL NULL 3872 Using
ClientID, where
ActualPC
do ALL PRIMARY NULL NULL NULL 2135
range checked for each record (key map: 1)
et_1 ALL PRIMARY NULL NULL NULL 74
range checked for each record (key map: 1)
et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1
This is not perfect, but is much better: The product of the rows
values is now less by a factor of 74. This version is executed in a couple
of seconds.
A second alteration can be made to eliminate the column length mismatches
for the tt.AssignedPC = et_1.EMPLOYID and tt.ClientID =
do.CUSTNMBR comparisons:
mysql> ALTER TABLE tt MODIFY AssignedPC VARCHAR(15),
-> MODIFY ClientID VARCHAR(15);
Now EXPLAIN produces the output shown here:
table type possible_keys key key_len ref rows Extra
et ALL PRIMARY NULL NULL NULL 74
tt ref AssignedPC, ActualPC 15 et.EMPLOYID 52 Using
ClientID, where
ActualPC
et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1
do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1
This is almost as good as it can get.
The remaining problem is that, by default, MySQL assumes that values
in the tt.ActualPC column are evenly distributed, and that isn't the
case for the tt table. Fortunately, it is easy to tell MySQL
about this:
mysql> ANALYZE TABLE tt;
Now the join is perfect, and EXPLAIN produces this result:
table type possible_keys key key_len ref rows Extra
tt ALL AssignedPC NULL NULL NULL 3872 Using
ClientID, where
ActualPC
et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1
et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1
do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1
Note that the rows column in the output from EXPLAIN is an
educated guess from the MySQL join optimizer. You should check whether the
numbers are even close to the truth. If not, you may get better performance
by using STRAIGHT_JOIN in your SELECT statement and trying
to list the tables in a different order in the FROM clause.
In most cases, you can estimate the performance by counting disk seeks.
For small tables, you can usually find a row in one disk seek (as the
index is probably cached). For bigger tables, you can estimate that
(using B-tree indexes) you will need this many seeks to find a row:
log(row_count) /
log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) +
1
In MySQL, an index block is usually 1024 bytes and the data
pointer is usually 4 bytes. A 500,000 row table with an
index length of 3 bytes (medium integer) gives you:
log(500,000)/log(1024/3*2/(3+4)) + 1 = 4 seeks.
This index would require storage of about 500,000 * 7 * 3/2 = 5.2M
(assuming a typical index buffer fill ration of 2/3), so
you will probably have much of the index in memory and you will probably
need only one or two calls to read data to find the row.
For writes, however, you will need four seek requests (as above) to find
where to place the new index and normally two seeks to update the index
and write the row.
Note that the preceding discussion doesn't mean that your application
will slowly degenerate by log N! As long as everything is cached by the
OS or SQL server, things will go only marginally slower while the table
gets bigger. After the data gets too big to be cached, things will start
to go much slower until your applications is only bound by disk-seeks
(which increase by log N). To avoid this, increase the index cache as the
data grows. See section 7.5.2 Tuning Server Parameters.
In general, when you want to make a slow SELECT ... WHERE query
faster, the first thing to check is whether you can add an index.
All references between different tables should usually be done with
indexes. You can use the EXPLAIN statement to determine which
indexes are used for a SELECT.
See section 7.4.5 How MySQL Uses Indexes and
section 7.2.1 EXPLAIN Syntax (Get Information About a SELECT).
Some general tips for speeding up queries:
-
To help MySQL optimize queries better, use
ANALYZE TABLE or run myisamchk
--analyze on a table after it has been loaded with data. This
updates a value for each index part that indicates the average number of
rows that have the same value. (For unique indexes, this is always 1.)
MySQL will use this to decide which index to
choose when you join two tables based on a non-constant expression.
You can check the result from the table analysis by using SHOW
INDEX FROM tbl_name and examining the Cardinality value.
For MyISAM tables, myisamchk --description --verbose also
shows index distribution information.
-
To sort an index and data according to an index, use
myisamchk
--sort-index --sort-records=1 (if you want to sort on index 1). This is
a good way to make queries faster if you have a unique index from which
you want to read all records in order according to the index. Note that
it may take a long time the first time you sort a large table this way.
The WHERE optimizations are put in the SELECT part here because
they are mostly used with SELECT, but the same optimizations apply for
WHERE in DELETE and UPDATE statements.
Note that this section is incomplete. MySQL does many
optimizations, and we have not had time to document them all.
Some of the optimizations performed by MySQL are listed here:
-
Removal of unnecessary parentheses:
((a AND b) AND c OR (((a AND b) AND (c AND d))))
-> (a AND b AND c) OR (a AND b AND c AND d)
-
Constant folding:
(a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5
-
Constant condition removal (needed because of constant folding):
(B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6
-
Constant expressions used by indexes are evaluated only once.
-
COUNT(*) on a single table without a WHERE is retrieved
directly from the table information for MyISAM and HEAP tables.
This is also done for any NOT NULL expression when used with only one
table.
-
Early detection of invalid constant expressions. MySQL quickly
detects that some
SELECT statements are impossible and returns no rows.
-
HAVING is merged with WHERE if you don't use GROUP BY
or group functions (COUNT(), MIN()...).
-
For each table in a join, a simpler
WHERE is constructed to get a fast
WHERE evaluation for the table and also to skip records as
soon as possible.
-
All constant tables are read first, before any other tables in the query.
A constant table is any of the following:
-
An empty table or a table with one row.
-
A table that is used with a
WHERE clause on a PRIMARY KEY
or a UNIQUE index, where all index parts are compared to constant
expressions and are defined as NOT NULL.
All the following tables are used as constant tables:
mysql> SELECT * FROM t WHERE primary_key=1;
mysql> SELECT * FROM t1,t2
-> WHERE t1.primary_key=1 AND t2.primary_key=t1.id;
-
The best join combination for joining the tables is found by trying all
possibilities. If all columns in
ORDER BY and in GROUP
BY come from the same table, then this table is preferred first when
joining.
-
If there is an
ORDER BY clause and a different GROUP BY
clause, or if the ORDER BY or GROUP BY contains columns
from tables other than the first table in the join queue, a temporary
table is created.
-
If you use
SQL_SMALL_RESULT, MySQL will use an in-memory
temporary table.
-
Each table index is queried, and the best index that spans fewer than 30% of
the rows is used. If no such index can be found, a quick table scan is used.
-
In some cases, MySQL can read rows from the index without even
consulting the datafile. If all columns used from the index are numeric,
only the index tree is used to resolve the query.
-
Before each record is output, those that do not match the
HAVING clause
are skipped.
Some examples of queries that are very fast:
mysql> SELECT COUNT(*) FROM tbl_name;
mysql> SELECT MIN(key_part1),MAX(key_part1) FROM tbl_name;
mysql> SELECT MAX(key_part2) FROM tbl_name
-> WHERE key_part_1=constant;
mysql> SELECT ... FROM tbl_name
-> ORDER BY key_part1,key_part2,... LIMIT 10;
mysql> SELECT ... FROM tbl_name
-> ORDER BY key_part1 DESC,key_part2 DESC,... LIMIT 10;
The following queries are resolved using only the index tree, assuming
the indexed columns are numeric:
mysql> SELECT key_part1,key_part2 FROM tbl_name WHERE key_part1=val;
mysql> SELECT COUNT(*) FROM tbl_name
-> WHERE key_part1=val1 AND key_part2=val2;
mysql> SELECT key_part2 FROM tbl_name GROUP BY key_part1;
The following queries use indexing to retrieve the rows in sorted
order without a separate sorting pass:
mysql> SELECT ... FROM tbl_name
-> ORDER BY key_part1,key_part2,... ;
mysql> SELECT ... FROM tbl_name
-> ORDER BY key_part1 DESC,key_part2 DESC,... ;
The Merge Index method is used to retrieve rows with several
ref, ref_or_null, or range scans and merge the
results into one.
This method is employed when the table condition is a disjunction of
conditions for which ref, ref_or_null, or range
could be used with different keys.
In EXPLAIN output, this method appears as index_merge in the
type column. In this case, the key column contains a list of
used indexes, and key_len contains a list of the longest key parts
for the used indexes.
Examples:
SELECT * FROM table WHERE key_col1 = 10 OR key_col2 = 20;
SELECT * FROM table
WHERE (key_col1 = 10 OR key_col2 = 20) AND nonkey_col=30;
SELECT * FROM t1,t2
WHERE (t1.key1 IN (1,2) OR t1.key2 LIKE 'value%')
AND t2.key1=t1.some_col
SELECT * FROM t1,t2
WHERE t1.key1=1
AND (t2.key1=t1.some_col OR t2.key2=t1.some_col2)
This ``join'' type optimization is new in MySQL 5.0.0, and represents
a significant change in behavior with regard to indexes, because the
old rule was that the server is only ever able to use at most
one index for each referenced table.
MySQL can do the same optimization on col_name IS NULL that it can do
with col_name = constant_value. For example, MySQL can use
indexes and ranges to search for NULL with IS NULL.
SELECT * FROM tbl_name WHERE key_col IS NULL;
SELECT * FROM tbl_name WHERE key_col <=> NULL;
SELECT * FROM tbl_name
WHERE key_col=const1 OR key_col=const2 OR key_col IS NULL;
If a WHERE clause includes a col_name IS NULL condition for a
column that is declared as NOT NULL, that expression will be
optimized away. This optimization does not occur in cases when the column
might produce NULL anyway; for example, if it comes from a table on
the right side of a LEFT JOIN.
MySQL 4.1.1 can additionally optimize the combination col_name =
expr AND col_name IS NULL, a form that is common in resolved subqueries.
EXPLAIN will show ref_or_null when this
optimization is used.
This optimization can handle one IS NULL for any key part.
Some examples of queries that are optimized, assuming that there is an index
on t2 (a,b):
SELECT * FROM t1 WHERE t1.a=expr OR t1.a IS NULL;
SELECT * FROM t1,t2 WHERE t1.a=t2.a OR t2.a IS NULL;
SELECT * FROM t1,t2
WHERE (t1.a=t2.a OR t2.a IS NULL) AND t2.b=t1.b;
SELECT * FROM t1,t2
WHERE t1.a=t2.a AND (t2.b=t1.b OR t2.b IS NULL);
SELECT * FROM t1,t2
WHERE (t1.a=t2.a AND t2.a IS NULL AND ...)
OR (t1.a=t2.a AND t2.a IS NULL AND ...);
ref_or_null works by first doing a read on the reference key,
and after that a separate search for rows with a NULL key value.
Note that the optimization can only handle one IS NULL level.
SELECT * FROM t1,t2
WHERE (t1.a=t2.a AND t2.a IS NULL)
OR (t1.b=t2.b AND t2.b IS NULL);
In the preceding case, MySQL will only use key lookups on the part
(t1.a=t2.a AND t2.a IS NULL) and not be able to use the key part on
b.
DISTINCT combined with ORDER BY will
need a temporary table in many cases.
Note that because DISTINCT may use GROUP BY, you should be
aware of how MySQL works with in fields in ORDER BY or HAVING
that are not part of the selected fields. See section 13.9.3 GROUP BY with Hidden Fields.
When combining LIMIT row_count with DISTINCT, MySQL will stop
as soon as it finds row_count unique rows.
If you don't use columns from all tables named in a query, MySQL will stop
the scanning of the not used tables as soon as it has found the first match.
SELECT DISTINCT t1.a FROM t1,t2 where t1.a=t2.a;
In this case, assuming t1 is used before t2 (check with
EXPLAIN), then MySQL will stop reading from t2 (for that
particular row in t1) when the first row in t2 is found.
A LEFT JOIN B join_condition in MySQL is implemented as follows:
-
The table
B is set to be dependent on table A and all tables
on which A is dependent.
-
The table
A is set to be dependent on all tables (except B)
that are used in the LEFT JOIN condition.
-
The
LEFT JOIN condition is used to decide how we should retrieve rows
from table B. (In other words, any condition in the WHERE clause
is not used).
-
All standard join optimizations are done, with the exception that a table is
always read after all tables it is dependent on. If there is a circular
dependence, MySQL issues an error.
-
All standard
WHERE optimizations are done.
-
If there is a row in
A that matches the WHERE clause, but there
is no row in B that matches the ON condition,
an extra B row is generated with all columns set to NULL.
-
If you use
LEFT JOIN to find rows that don't exist in some
table and you have the following test: col_name IS NULL in the
WHERE part, where col_name is a column that is declared as
NOT NULL, then MySQL will stop searching for more rows
(for a particular key combination) after it has found one row that
matches the LEFT JOIN condition.
RIGHT JOIN is implemented analogously to LEFT JOIN.
The join optimizer calculates the order in which tables should be joined.
The table read order forced by LEFT JOIN and STRAIGHT JOIN
helps the join optimizer do its work much more quickly, because there are
fewer table permutations to check.
Note that this means that if you do a query of the following type,
MySQL will do a full scan on b as the LEFT JOIN will force
it to be read before d:
SELECT * FROM a,b LEFT JOIN c ON (c.key=a.key) LEFT JOIN d (d.key=a.key)
WHERE b.key=d.key
The fix in this case is to change the query to:
SELECT * FROM b,a LEFT JOIN c ON (c.key=a.key) LEFT JOIN d (d.key=a.key)
WHERE b.key=d.key
Starting from 4.0.14, MySQL does the following LEFT JOIN optimization:
If the WHERE condition is always false for the generated
NULL row, the LEFT JOIN is changed to a normal join.
For example, the WHERE clause would be
false in the following query
if t2.column would be NULL:
SELECT * FROM t1 LEFT JOIN t2 ON (column) WHERE t2.column2=5;
Therefore, it's safe to convert the query to a normal join:
SELECT * FROM t1,t2 WHERE t2.column2=5 AND t1.column=t2.column;
This can be made faster as MySQL can now use table t2 before
table t1 if this would result in a better query plan. To force a
specific table order, use STRAIGHT JOIN.
In some cases, MySQL can uses index to satisfy an ORDER BY or
GROUP BY request without doing any extra sorting.
The index can also be used even if the ORDER BY doesn't match the
index exactly, as long as all the unused index parts and all the extra
are ORDER BY columns are constants in the WHERE
clause. The following queries will use the index to resolve the
ORDER BY / GROUP BY part:
SELECT * FROM t1 ORDER BY key_part1,key_part2,...
SELECT * FROM t1 WHERE key_part1=constant ORDER BY key_part2
SELECT * FROM t1 WHERE key_part1=constant GROUP BY key_part2
SELECT * FROM t1 ORDER BY key_part1 DESC,key_part2 DESC
SELECT * FROM t1
WHERE key_part1=1 ORDER BY key_part1 DESC,key_part2 DESC
In some cases, MySQL can not use indexes to resolve the ORDER
BY (note that MySQL will still use indexes to find the rows that
match the WHERE clause):
-
You use
ORDER BY on different keys:
SELECT * FROM t1 ORDER BY key1,key2;
-
You use
ORDER BY on non-consecutive key parts:
SELECT * FROM t1 WHERE key2=constant ORDER BY key_part2;
-
You mix
ASC and DESC:
SELECT * FROM t1 ORDER BY key_part1 DESC,key_part2 ASC;
-
The key used to fetch the rows is not the same as the one used in
the
ORDER BY:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1
-
You are joining many tables and the columns in the
ORDER
BY are not all from the first not-constant table that is used to
retrieve rows. (This is the first table in the EXPLAIN output that
doesn't have a const join type.)
-
You have different
ORDER BY and GROUP BY expressions.
-
The type of table index used doesn't store rows in order. For example, this
is true for a
HASH index in a HEAP table.
In those cases where MySQL must sort the result, it uses the following
algorithm:
-
Read all rows according to key or by table scanning.
Rows that don't match the
WHERE clause are skipped.
-
Store the sort key value in a buffer. The size of the buffer is the value of
sort_buffer_size.
-
When the buffer gets full, run a qsort on it and store the result in a
temporary file. Save a pointer to the sorted block. (If all rows fit into
the sort buffer, no temporary file is created.)
-
Repeat the preceding steps until all rows have been read.
-
Do a multi-merge of up to
MERGEBUFF (7) regions to one block in
another temporary file. Repeat until all blocks from the first file
are in the second file.
-
Repeat the following until there are fewer than
MERGEBUFF2 (15)
blocks left.
-
On the last multi-merge, only the pointer to the row (the last part of
the sort key) is written to a result file.
-
Read the rows in sorted order by using the row pointers in the result file.
To optimize this, we read in a big block of row pointers, sort them, and use
them to read the rows in sorted order into a row buffer The size of the
buffer is the value of (
read_rnd_buffer_size. The code for this
step is in the `sql/records.cc' source file.
With EXPLAIN SELECT ... ORDER BY, you can check whether MySQL can use
indexes to resolve the query. It cannot if you see Using filesort in
the Extra column.
See section 7.2.1 EXPLAIN Syntax (Get Information About a SELECT).
If you want to increase ORDER BY speed, first see whether you can get
MySQL to use indexes instead of using an extra sorting phase. If this is not
possible, you can try the following strategies:
-
Increase the size of the
sort_buffer_size variable.
-
Increase the size of the
read_rnd_buffer_size variable.
-
Change
tmpdir to point to a dedicated disk with lots of empty space.
If you use MySQL 4.1 or later, this option accepts several paths that are
used in round-robin fashion. Paths should be separated by colon characters
(`:') on Unix and semicolon characters (`;') on Windows, NetWare,
and OS/2. You can use this feature to spread load across several
directories. Note: The paths should be for directories in
filesystems that are on different physical disks, not different
partitions of the same disk.
By default, MySQL sorts all GROUP BY x,y,... queries as if you
specified ORDER BY x,y,... in the query as well. If you include the
ORDER BY clause explicitly that contains the same column list, MySQL
optimizes it away without any speed penalty, though the sorting still
occurs. If a query includes GROUP BY but you want to avoid the
overhead of sorting the result, you can supress sorting by specifying
ORDER BY NULL:
INSERT INTO foo
SELECT a,COUNT(*) FROM bar GROUP BY a ORDER BY NULL;
In some cases, MySQL will handle the query differently when you are
using LIMIT row_count and not using HAVING:
-
If you are selecting only a few rows with
LIMIT, MySQL
uses indexes in some cases when it normally would prefer to do a
full table scan.
-
If you use
LIMIT row_count with ORDER BY, MySQL ends the
sorting as soon as it has found the first row_count lines, rather
than sorting the whole table.
-
When combining
LIMIT row_count with DISTINCT, MySQL stops
as soon as it finds row_count unique rows.
-
In some cases, a
GROUP BY can be resolved by reading the key in order
(or doing a sort on the key) and then calculating summaries until the
key value changes. In this case, LIMIT row_count will not calculate any
unnecessary GROUP BY values.
-
As soon as MySQL has sent the first
# rows to the client, it
aborts the query unless you are using SQL_CALC_FOUND_ROWS.
-
LIMIT 0 always quickly returns an empty set. This is useful
to check the query or to get the column types of the result columns.
-
When the server uses temporary tables to resolve the query, the
LIMIT row_count is used to calculate how much space is required.
The time to insert a record is determined by the following factors,
where the numbers indicate approximately proportions:
-
Connecting: (3)
-
Sending query to server: (2)
-
Parsing query: (2)
-
Inserting record: (1 x size of record)
-
Inserting indexes: (1 x number of indexes)
-
Closing: (1)
This does not take into consideration the initial overhead to open tables,
which is done once for each concurrently running query.
The size of the table slows down the insertion of indexes by log N
(assuming B-tree indexes).
You can use the following methods to speed up inserts:
-
If you are inserting many rows from the same client at the same time, use
INSERT statements with multiple VALUES lists to insert several
rows at a time. This is much faster (many times faster in some cases) than
using separate single-row INSERT statements. If you are adding data
to non-empty table, you may tune up the bulk_insert_buffer_size
variable to make it even faster.
See section 5.2.3 Server System Variables.
-
If you are inserting a lot of rows from different clients, you can get
higher speed by using the
INSERT DELAYED statement. See section 14.1.4 INSERT Syntax.
-
Note that with
MyISAM tables you can insert rows at the same time
SELECT statements are running if there are no deleted rows in the
tables.
-
When loading a table from a text file, use
LOAD DATA INFILE. This
is usually 20 times faster than using a lot of INSERT statements.
See section 14.1.5 LOAD DATA INFILE Syntax.
-
It is possible with some extra work to make
LOAD DATA INFILE run even
faster when the table has many indexes. Use the following procedure:
-
Optionally create the table with
CREATE TABLE.
-
Execute a
FLUSH TABLES statement or a mysqladmin flush-tables
command.
-
Use
myisamchk --keys-used=0 -rq /path/to/db/tbl_name. This will
remove all use of all indexes for the table.
-
Insert data into the table with
LOAD DATA INFILE. This will not
update any indexes and will therefore be very fast.
-
If you are going to only read the table in the future, use
myisampack
to make it smaller. See section 15.1.3.3 Compressed Table Characteristics.
-
Re-create the indexes with
myisamchk -r -q
/path/to/db/tbl_name. This will create the index tree in memory before
writing it to disk, which is much faster because it avoids lots of disk
seeks. The resulting index tree is also perfectly balanced.
-
Execute a
FLUSH TABLES statement or a mysqladmin flush-tables
command.
Note that LOAD DATA INFILE also performs the preceding optimization
if you insert into an empty table; the main difference is that you can let
myisamchk allocate much more temporary memory for the index creation
that you might want the server to allocate for index re-creation when it
executes the LOAD DATA INFILE statement.
As of MySQL 4.0, you can also use
ALTER TABLE tbl_name DISABLE KEYS instead of
myisamchk --keys-used=0 -rq /path/to/db/tbl_name and
ALTER TABLE tbl_name ENABLE KEYS instead of
myisamchk -r -q /path/to/db/tbl_name. This way you can also skip
FLUSH TABLES steps.
-
You can speed up multiple-statement
INSERT operations that are done
with multiple statements by locking your tables:
mysql> LOCK TABLES a WRITE;
mysql> INSERT INTO a VALUES (1,23),(2,34),(4,33);
mysql> INSERT INTO a VALUES (8,26),(6,29);
mysql> UNLOCK TABLES;
A performance benefit occurs because the index buffer is flushed to disk only
once, after all INSERT statements have completed. Normally there would
be as many index buffer flushes as there are different INSERT
statements. Locking is not needed if you can insert all rows with a single
statement.
For transactional tables, you should use BEGIN/COMMIT instead of
LOCK TABLES to get a speedup.
Locking also lowers the total time of multiple-connection tests, but the
maximum wait time for individual threads might go up because they wait for
locks. For example:
Thread 1 does 1000 inserts
Threads 2, 3, and 4 do 1 insert
Thread 5 does 1000 inserts
If you don't use locking, 2, 3, and 4 will finish before 1 and 5. If you
use locking, 2, 3, and 4 probably will not finish before 1 or 5, but the
total time should be about 40% faster.
INSERT, UPDATE, and DELETE operations are very
fast in MySQL, but you will obtain better overall performance by
adding locks around everything that does more than about 5 inserts or
updates in a row. If you do very many inserts in a row, you could do a
LOCK TABLES followed by an UNLOCK TABLES once in a while
(about each 1000 rows) to allow other threads access to the table. This
would still result in a nice performance gain.
INSERT is still much slower for loading data than LOAD DATA
INFILE, even when using the strategies just outlined.
-
To get some more speed for both
LOAD DATA INFILE and
INSERT, enlarge the key buffer. See section 7.5.2 Tuning Server Parameters.
Update queries are optimized as a SELECT query with the additional
overhead of a write. The speed of the write is dependent on the size of
the data that is being updated and the number of indexes that are
updated. Indexes that are not changed will not be updated.
Also, another way to get fast updates is to delay updates and then do
many updates in a row later. Doing many updates in a row is much quicker
than doing one at a time if you lock the table.
Note that for a MyISAM table that uses dynamic record format,
updating a record to a longer total length may split the record. If you do
this often, it is very important to use OPTIMIZE TABLE occasionally.
See section 14.5.2.5 OPTIMIZE TABLE Syntax.
The time to delete individual records is exactly proportional to the number
of indexes. To delete records more quickly, you can increase the size of the
key cache.
See section 7.5.2 Tuning Server Parameters.
If you want to delete all rows in the table, use TRUNCATE TABLE
tbl_name rather than DELETE FROM tbl_name.
See section 14.1.9 TRUNCATE Syntax.
This section lists a number of miscellaneous tips for improving query
processing speeed:
You can find a discussion about different locking methods in the appendix.
See section D.4 Locking methods.
Except for InnoDB and BDB storage engines, All locking in
MySQL is deadlock-free for storage engines that use table-level locking.
This include the MyISAM, MEMORY (HEAP), and ISAM
engines. Deadlock avoidance is managed by always requesting all needed locks
at once at the beginning of a query and always locking the tables in the
same order.
InnoDB uses row locks and BDB uses page locks. For the
InnoDB and BDB storage engines, deadlock is possible. This is
because InnoDB automatically acquires row locks and BDB
acquires page locks during the processing of SQL statements, not at the
start of the transaction.
The locking method MySQL uses for WRITE locks works as follows:
-
If there are no locks on the table, put a write lock on it.
-
Otherwise, put the lock request in the write lock queue.
The locking method MySQL uses for READ locks works as follows:
-
If there are no write locks on the table, put a read lock on it.
-
Otherwise, put the lock request in the read lock queue.
When a lock is released, the lock is made available to the threads
in the write lock queue, then to the threads in the read lock queue.
This means that if you have many updates for a table, SELECT
statements will wait until there are no more updates.
To work around this for the case where you want to do many INSERT and
SELECT operations on a table, you can insert rows in a temporary
table and update the real table with the records from the temporary table
once in a while.
This can be done with the following code:
mysql> LOCK TABLES real_table WRITE, insert_table WRITE;
mysql> INSERT INTO real_table SELECT * FROM insert_table;
mysql> TRUNCATE TABLE insert_table;
mysql> UNLOCK TABLES;
You can use the LOW_PRIORITY options with INSERT,
UPDATE or DELETE or HIGH_PRIORITY with
SELECT if you want to prioritize retrieval in some specific
cases. You can also start mysqld with --low-priority-updates
to get the same behavior.
Using SQL_BUFFER_RESULT can also help to make the duration of table
locks shorter.
See section 14.1.7 SELECT Syntax.
You could also change the locking code in `mysys/thr_lock.c' to use a
single queue. In this case, write locks and read locks would have the same
priority, which might help some applications.
The table locking code in MySQL is deadlock free.
To achieve a very high lock speed, MySQL uses table locking (instead of
page, row, or column locking) for all storage engines except InnoDB
and BDB. For large tables, table locking is much better than row
locking for most applications, but there are some pitfalls.
For InnoDB and BDB tables, MySQL only uses table
locking if you explicitly lock the table with LOCK TABLES.
For these table types we recommend you to not use
LOCK TABLES at all, because InnoDB uses automatic
row level locking and BDB uses page level locking to
ensure transaction isolation.
As of MySQL Version 3.23.7 (3.23.25 for Windows), you can insert rows into a
MyISAM table at the same time other threads are reading from it.
Note that currently this works at the time the insert is made only if there
are no holes resulting from rows having been deleted from the middle of the
table. When all holes have been filled with new data, concurrent inserts
are re-enabled automatically.
Table locking enables many threads to read from a table at the same
time, but if a thread wants to write to a table, it must first get
exclusive access. During the update, all other threads that want to
access this particular table must wait until the update is done.
Table updates normally are considered to be more important than table
retrievals, so they are given higher priority. This should ensure that
updates to a table are not 'starved' even if there is heavy SELECT
activity for the table. You can change this behavior by using
LOW_PRIORITY with update statements or HIGH_PRIORITY with
SELECT statements.)
Starting from MySQL Version 3.23.7, you can use the
max_write_lock_count variable to force MySQL to temporarily elevate
the priority of all SELECT statements that are waiting for a table,
after a specific number of inserts to the table occur.
Table locking causes problems in cases such as when
a thread is waiting because the disk is full and free space needs to become
available before the thread can proceed. In this case, all
threads that want to access the problem table will also be put in a waiting
state until more disk space is made available.
Table locking is also disadvantageous under the following scenario:
-
A client issues a
SELECT that takes a long time to run.
-
Another client then issues an
UPDATE on a used table. This client
will wait until the SELECT is finished.
-
Another client issues another
SELECT statement on the same table. As
UPDATE has higher priority than SELECT, this SELECT
will wait for the UPDATE to finish. It will also wait for the first
SELECT to finish!
The following list describes some ways to avoid or reduce contention caused by
table locking:
-
Try to get the
SELECT statements to run faster. You might have to
create some summary tables to do this.
-
Start
mysqld with --low-priority-updates. This will give
all statements that update (modify) a table lower priority than SELECT
statements. In this case, the second SELECT statement in the previous
scenario would execute before the INSERT statement, and would not need
to wait for the first SELECT to finish.
-
You can give a specific
INSERT, UPDATE, or DELETE
statement lower priority with the LOW_PRIORITY attribute.
-
Start
mysqld with a low value for max_write_lock_count to allow
READ locks after a certain number of WRITE locks.
-
You can specify that all updates issued by a specific thread should be done with
low priority by using the SQL statement:
SET LOW_PRIORITY_UPDATES=1.
See section 14.5.3.1 SET Syntax.
-
You can specify that a specific
SELECT is very important with the
HIGH_PRIORITY attribute. See section 14.1.7 SELECT Syntax.
-
If you have problems with
INSERT combined with SELECT, switch
to use the MyISAM tables, which support concurrent SELECT and
INSERT statements.
-
If you mainly mix
INSERT and SELECT statements, the
DELAYED attribute to INSERT will probably solve your problems.
See section 14.1.4 INSERT Syntax.
-
If you have problems with mixed
SELECT and DELETE statements,
the LIMIT option to DELETE may help.
See section 14.1.1 DELETE Syntax.
MySQL keeps row data and index data in separate files. Many (almost all)
other databases mix row and index data in the same file. We believe that the
MySQL choice is better for a very wide range of modern systems.
Another way to store the row data is to keep the information for each
column in a separate area (examples are SDBM and Focus). This will cause a
performance hit for every query that accesses more than one column. Because
this degenerates so quickly when more than one column is accessed,
we believe that this model is not good for general purpose databases.
The more common case is that the index and data are stored together
(as in Oracle/Sybase et al). In this case, you will find the row
information at the leaf page of the index. The good thing with this
layout is that it, in many cases, depending on how well the index is
cached, saves a disk read. The bad things with this layout are:
-
Table scanning is much slower because you have to read through the indexes
to get at the data.
-
You can't use only the index table to retrieve data for a query.
-
You lose a lot of space, as you must duplicate indexes from the nodes
(as you can't store the row in the nodes).
-
Deletes will degenerate the table over time (as indexes in nodes are
usually not updated on delete).
-
It's harder to cache only the index data.
One of the most basic optimizations is to design your tables to take as
little space on the disk as possible. This can give huge improvements
because disk reads are faster, and smaller tables normally require less main
memory while their contents are being actively processed during query
execution. Indexing also is a smaller resource burden if done on smaller
columns.
MySQL supports a lot of different table types and row formats. For each
table, you can decide which storage/index method to use. Choosing the
right table format for your application may give you a big performance gain.
See section 15 MySQL Storage Engines and Table Types.
You can get better performance on a table and minimize storage space
using the techniques listed here:
-
Use the most efficient (smallest) datatypes possible. MySQL has
many specialized types that save disk space and memory.
-
Use the smaller integer types if possible to get smaller tables. For
example,
MEDIUMINT is often better than INT.
-
Declare columns to be
NOT NULL if possible. It makes everything
faster and you save one bit per column. If you really need
NULL in your application, you should definitely use it. Just avoid
having it on all columns by default.
-
For
MyISAM tables, if you don't have any variable-length columns
(VARCHAR, TEXT, or BLOB columns), a fixed-size record
format is used. This is faster but unfortunately may waste some space.
See section 15.1.3 MyISAM Table Storage Formats.
-
The primary index of a table should be as short as possible. This makes
identification of one row easy and efficient.
-
Create only the indexes that you really need. Indexes are good for
retrieval but bad when you need to store things fast. If you mostly
access a table by searching on a combination of columns, make an index
on them. The first index part should be the most used column. If you are
always using many columns, you should use the column with more
duplicates first to get better compression of the index.
-
If it's very likely that a column has a unique prefix on the first number of
characters, it's better to only index this prefix. MySQL supports an index
on the leftmost part of a character column. Shorter indexes are faster not
only because they take less disk space, but also because they will give you
more hits in the index cache and thus fewer disk seeks.
See section 7.5.2 Tuning Server Parameters.
-
In some circumstances, it can be beneficial to split into two a table that is
scanned very often. This is especially true if it is a dynamic
format table and it is possible to use a smaller static format table that
can be used to find the relevant rows when scanning the table.
All MySQL column types can be indexed. Use of indexes on the
relevant columns is the best way to improve the performance of SELECT
operations.
The maximum number of indexes per table and the maximum index length is
defined per storage engine. See section 15 MySQL Storage Engines and Table Types. All storage engines support at
least 16 indexes per table and a total index length of at least 256 bytes.
Most storage engines have higher limits.
For CHAR and VARCHAR columns, you can index a prefix of a
column. This is much faster and requires less disk space than indexing the
whole column. The syntax to use in the CREATE TABLE statement to
index a column prefix looks like this:
INDEX index_name (col_name(length))
The example here creates an index for the first 10 characters of the
name column:
mysql> CREATE TABLE test (
-> name CHAR(200) NOT NULL,
-> INDEX index_name (name(10)));
BLOB and TEXT columns can be indexed as well, but for these
types an index prefix is mandatory, not optional. The prefix may be up to
255 bytes long.
As of MySQL Version 3.23.23, you can also create FULLTEXT indexes.
They are used for full-text search. Only the MyISAM table type
supports FULLTEXT indexes and only for CHAR, VARCHAR,
and TEXT columns. Indexing always happens over the entire column and
partial (prefix) indexing is not supported. See section 13.6 Full-text Search Functions for
details.
As of MySQL Version 4.1.0, you can create indexes on spatial column types.
Currently, spatial types are supported only by the MyISAM storage
engine. Spatial indexes use R-trees.
The MEMORY (HEAP) storage engine supports hash indexes.
As of MySQL Version 4.1.0, the engine also supports B-tree indexes.
MySQL can create indexes on multiple columns. An index may consist of up to
15 columns. For certain column types, you can index a prefix of the column
(see section 7.4.3 Column Indexes).
A multiple-column index can be considered a sorted array containing values
that are created by concatenating the values of the indexed columns.
MySQL uses multiple-column indexes in such a way that queries are
fast when you specify a known quantity for the first column of the index in a
WHERE clause, even if you don't specify values for the other columns.
Suppose that a table has the following specification:
mysql> CREATE TABLE test (
-> id INT NOT NULL,
-> last_name CHAR(30) NOT NULL,
-> first_name CHAR(30) NOT NULL,
-> PRIMARY KEY (id),
-> INDEX name (last_name,first_name));
The name index is an index over last_name and
first_name. The index can used for queries that specify
values in a known range for last_name, or for both last_name
and first_name.
Therefore, the name index will be used in the following queries:
mysql> SELECT * FROM test WHERE last_name='Widenius';
mysql> SELECT * FROM test
-> WHERE last_name='Widenius' AND first_name='Michael';
mysql> SELECT * FROM test
-> WHERE last_name='Widenius'
-> AND (first_name='Michael' OR first_name='Monty');
mysql> SELECT * FROM test
-> WHERE last_name='Widenius'
-> AND first_name >='M' AND first_name < 'N';
However, the name index will not be used in the following
queries:
mysql> SELECT * FROM test WHERE first_name='Michael';
mysql> SELECT * FROM test
-> WHERE last_name='Widenius' OR first_name='Michael';
The manner in which MySQL uses indexes to improve query performance is
discussed further in the next section.
Indexes are used to find rows with specific column values
fast. Without an index, MySQL has to start with the first record
and then read through the whole table to find the relevant
rows. The bigger the table, the more this costs. If the table has an index
for the columns in question, MySQL can quickly determine the position to
seek to in the middle of the datafile without having to look at all the
data. If a table has 1000 rows, this is at least 100 times faster than
reading sequentially. Note that if you need to access almost all 1000
rows, it is faster to read sequentially, because that minimizes disk seeks.
Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and
FULLTEXT) are stored in B-trees. Exceptions are that indexes on
spatial column types use R-trees, and MEMORY (HEAP) tables
support hash indexes.
Strings are automatically prefix-
and end-space compressed. See section 14.2.4 CREATE INDEX Syntax.
In general, indexes are used as described in the following discussion.
Characteristics specific to hash indexes (as used in MEMORY tables)
are described at the end of this section.
-
To quickly find the rows that match a
WHERE clause.
-
To eliminate rows from consideration. If there is a choice between multiple
indexes, MySQL normally uses the index that finds the smallest number of
rows.
-
To retrieve rows from other tables when performing joins.
-
To find the
MIN() or MAX() value for a specific indexed column
key_col. This is optimized by a preprocessor that checks if you are
using WHERE key_part_# = constant on all key parts that occur before
key_col in the index. In this case, MySQL will do a single key
lookup for each MIN() or MAX() expression and replace it
with a constant. If all expressions are replaced with constants, the
query will return at once:
SELECT MIN(key_part2),MAX(key_part2)
FROM tbl_name where key_part1=10;
-
To sort or group a table if the sorting or grouping is done on a leftmost
prefix of a usable key (for example,
ORDER BY key_part_1,key_part_2
). If all key parts are followed by DESC, the key is read in
reverse order.
See section 7.2.9 How MySQL Optimizes ORDER BY.
-
In some cases, a query can be optimized to retrieve values without
consulting the data rows. If a query uses only columns from a table that are
numeric and that form a leftmost prefix for some key, the selected values
may be retrieved from the index tree for greater speed:
SELECT key_part3 FROM tbl_name WHERE key_part1=1
Suppose that you issue the following SELECT statement:
mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
If a multiple-column index exists on col1 and col2, the
appropriate rows can be fetched directly. If separate single-column
indexes exist on col1 and col2, the optimizer tries to
find the most restrictive index by deciding which index will find fewer
rows and using that index to fetch the rows.
If the table has a multiple-column index, any leftmost prefix of the
index can be used by the optimizer to find rows. For example, if you
have a three-column index on (col1, col2, col3), you have indexed
search capabilities on (col1), (col1, col2), and
(col1, col2, col3).
MySQL can't use a partial index if the columns don't form a
leftmost prefix of the index. Suppose that you have the SELECT
statements shown here:
mysql> SELECT * FROM tbl_name WHERE col1=val1;
mysql> SELECT * FROM tbl_name WHERE col2=val2;
mysql> SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
If an index exists on (col1, col2, col3), only the first of the preceding
queries uses the index. The second and third queries do involve
indexed columns, but (col2) and (col2, col3) are not
leftmost prefixes of (col1, col2, col3).
An index is used for columns that you compare with the =, >,
>=, <, <=, or BETWEEN operators.
MySQL also uses indexes for LIKE comparisons if the argument
to LIKE is a constant string that doesn't start with a wildcard
character. For example, the following SELECT statements use indexes:
mysql> SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
mysql> SELECT * FROM tbl_name WHERE key_col LIKE 'Pat%_ck%';
In the first statement, only rows with 'Patrick' <= key_col <
'Patricl' are considered. In the second statement, only rows with
'Pat' <= key_col < 'Pau' are considered.
The following SELECT statements will not use indexes:
mysql> SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
mysql> SELECT * FROM tbl_name WHERE key_col LIKE other_col;
In the first statement, the LIKE value begins with a wildcard
character. In the second statement, the LIKE value is not a
constant.
MySQL 4.0 and up performs an additional LIKE optimization. If you use
... LIKE '%string%' and string is longer than 3 characters,
MySQL will use the Turbo Boyer-Moore algorithm to initialize the
pattern for the string and then use this pattern to perform the search
quicker.
Searching using col_name IS NULL will use indexes if col_name
is indexed.
Any index that doesn't span all AND levels in the WHERE clause
is not used to optimize the query. In other words, to be able to use an
index, a prefix of the index must be used in every AND group.
The following WHERE clauses use indexes:
... WHERE index_part1=1 AND index_part2=2 AND other_column=3
/* index = 1 OR index = 2 */
... WHERE index=1 OR A=10 AND index=2
/* optimized like "index_part1='hello'" */
... WHERE index_part1='hello' AND index_part_3=5
/* Can use index on index1 but not on index2 or index 3 */
... WHERE index1=1 AND index2=2 OR index1=3 AND index3=3;
These WHERE clauses do not use indexes:
/* index_part_1 is not used */
... WHERE index_part2=1 AND index_part3=2
/* Index is not used in both AND parts */
... WHERE index=1 OR A=10
/* No index spans all rows */
... WHERE index_part1=1 OR index_part2=10
Sometimes MySQL will not use an index, even if one is available. One way
this occurs is when the optimizer estimates that using the index would
require MySQL to access more than 30% of the rows in the table. (In this
case, a table scan is probably much faster, because it will require many fewer
seeks.) However, if such a query uses LIMIT to only retrieve part of
the rows, MySQL will use an index anyway, because it can much more quickly
find the few rows to return in the result.
Hash indexes have somewhat different characteristics than those just
discussed:
-
They are used only for
= or <=> comparisons (but are
VERY fast).
-
The optimizer cannot use a hash index to speed up
ORDER BY
operations. (The index cannot be used to search for the next entry in order.)
-
MySQL cannot determine approximately how many rows there
are between two values (this is used by the range optimizer to decide which
index to use). This may affect some queries if you change a
MyISAM
table to a MEMORY table.
-
Only whole keys can be used to search for a row. (With a B-tree index,
any prefix of the key can be used to find rows.)
To minimize disk I/O, the MyISAM storage engine employs a strategy that
is used by many database management systems. It exploits a cache
mechanism to keep the most frequently accessed table blocks in memory:
-
For index blocks, a special structure called the key cache (key buffer)
is maintained. The structure contains a number of block buffers where the
most-used index blocks are placed.
-
For data blocks, MySQL uses no special cache. Instead it relies on the
native operating system filesystem cache.
This section first describes the basic operation of the MyISAM key
cache. Then it discusses changes made in MySQL 4.1 that improve key cache
performance and that enable y |