| 
 |  | 
Table of Contents
MyISAM Storage EngineInnoDB Storage EngineInnoDB OverviewInnoDB Contact InformationInnoDB ConfigurationInnoDB Startup Options and System VariablesInnoDB TablespaceInnoDB TablesInnoDB Data and Log FilesInnoDB DatabaseInnoDB Database to Another MachineInnoDB Transaction Model and LockingInnoDB Performance Tuning TipsInnoDB Table and Index StructuresInnoDB File Space Management and Disk I/OInnoDB Error HandlingInnoDB TablesInnoDB TroubleshootingMERGE Storage EngineMEMORY (HEAP) Storage EngineBDB (BerkeleyDB) Storage EngineEXAMPLE Storage EngineFEDERATED Storage EngineARCHIVE Storage EngineCSV Storage EngineBLACKHOLE Storage EngineMySQL supports several storage engines that act as handlers for different table types. MySQL storage engines include both those that handle transaction-safe tables and those that handle non-transaction-safe tables:
        MyISAM manages non-transactional tables. It
        provides high-speed storage and retrieval, as well as fulltext
        searching capabilities. MyISAM is supported
        in all MySQL configurations, and is the default storage engine
        unless you have configured MySQL to use a different one by
        default.
      
        The MEMORY storage engine provides in-memory
        tables. The MERGE storage engine allows a
        collection of identical MyISAM tables to be
        handled as a single table. Like MyISAM, the
        MEMORY and MERGE storage
        engines handle non-transactional tables, and both are also
        included in MySQL by default.
      
        Note: The
        MEMORY storage engine formerly was known as
        the HEAP engine.
      
        The InnoDB and BDB storage
        engines provide transaction-safe tables. BDB
        is included in MySQL-Max binary distributions on those operating
        systems that support it. InnoDB is also
        included by default in all MySQL 5.0 binary
        distributions. In source distributions, you can enable or
        disable either engine by configuring MySQL as you like.
      
        The EXAMPLE storage engine is a
        “stub” engine that does nothing. You can create
        tables with this engine, but no data can be stored in them or
        retrieved from them. The purpose of this engine is to serve as
        an example in the MySQL source code that illustrates how to
        begin writing new storage engines. As such, it is primarily of
        interest to developers.
      
        NDB Cluster is the storage engine used by
        MySQL Cluster to implement tables that are partitioned over many
        computers. It is available in MySQL-Max 5.0 binary
        distributions. This storage engine is currently supported on
        Linux, Solaris, and Mac OS X only. We intend to add support for
        this engine on other platforms, including Windows, in future
        MySQL releases.
      
        The ARCHIVE storage engine is used for
        storing large amounts of data without indexes with a very small
        footprint.
      
        The CSV storage engine stores data in text
        files using comma-separated values format.
      
        The BLACKHOLE storage engine accepts but does
        not store data and retrievals always return an empty set.
      
        The FEDERATED storage engine was added in
        MySQL 5.0.3. This engine stores data in a remote database.
        Currently, it works with MySQL only, using the MySQL C Client
        API. In future releases, we intend to enable it to connect to
        other data sources using other drivers or client connection
        methods.
      
    This chapter describes each of the MySQL storage engines except for
    NDB Cluster, which is covered in
    Chapter 15, MySQL Cluster.
  
    When you create a new table, you can specify which storage engine to
    use by adding an ENGINE or
    TYPE table option to the CREATE
    TABLE statement:
  
CREATE TABLE t (i INT) ENGINE = INNODB; CREATE TABLE t (i INT) TYPE = MEMORY;
    The older term TYPE is supported as a synonym for
    ENGINE for backward compatibility, but
    ENGINE is the preferred term and
    TYPE is deprecated.
  
    If you omit the ENGINE or TYPE
    option, the default storage engine is used. Normally, this is
    MyISAM, but you can change it by using the
    --default-storage-engine or
    --default-table-type server startup option, or by
    setting the storage_engine or
    table_type system variable.
  
    When MySQL is installed on Windows using the MySQL Configuration
    Wizard, the InnoDB storage engine can be selected
    as the default instead of MyISAM. See
    Section 2.3.4.6, “The Database Usage Dialog”.
  
    To convert a table from one storage engine to another, use an
    ALTER TABLE statement that indicates the new
    engine:
  
ALTER TABLE t ENGINE = MYISAM; ALTER TABLE t TYPE = BDB;
    See Section 13.1.5, “CREATE TABLE Syntax”, and
    Section 13.1.2, “ALTER TABLE Syntax”.
  
    If you try to use a storage engine that is not compiled in or that
    is compiled in but deactivated, MySQL instead creates a table using
    the default storage engine, usually MyISAM. This
    behavior is convenient when you want to copy tables between MySQL
    servers that support different storage engines. (For example, in a
    replication setup, perhaps your master server supports transactional
    storage engines for increased safety, but the slave servers use only
    non-transactional storage engines for greater speed.)
  
This automatic substitution of the default storage engine for unavailable engines can be confusing for new MySQL users. A warning is generated whenever a storage engine is automatically changed.
    For new tables, MySQL always creates an .frm
    file to hold the table and column definitions. The table's index and
    data may be stored in one or more other files, depending on the
    storage engine. The server creates the .frm
    file above the storage engine level. Individual storage engines
    create any additional files required for the tables that they
    manage.
  
A database may contain tables of different types. That is, tables need not all be created with the same storage engine.
Transaction-safe tables (TSTs) have several advantages over non-transaction-safe tables (NTSTs):
They are safer. Even if MySQL crashes or you get hardware problems, you can get your data back, either by automatic recovery or from a backup plus the transaction log.
        You can combine many statements and accept them all at the same
        time with the COMMIT statement (if autocommit
        is disabled).
      
        You can execute ROLLBACK to ignore your
        changes (if autocommit is disabled).
      
If an update fails, all of your changes are reverted. (With non-transaction-safe tables, all changes that have taken place are permanent.)
Transaction-safe storage engines can provide better concurrency for tables that get many updates concurrently with reads.
    You can combine transaction-safe and non-transaction-safe tables in
    the same statements to get the best of both worlds. However,
    although MySQL supports several transaction-safe storage engines,
    for best results, you should not mix different storage engines
    within a transaction with autocommit disabled. For example, if you
    do this, changes to non-transaction-safe tables still are committed
    immediately and cannot be rolled back. For information about this
    and other problems that can occur in transactions that use mixed
    storage engines, see Section 13.4.1, “START TRANSACTION, COMMIT, and ROLLBACK Syntax”.
  
Non-transaction-safe tables have several advantages of their own, all of which occur because there is no transaction overhead:
Much faster
Lower disk space requirements
Less memory required to perform updates
      MyISAM is the default storage engine. It is
      based on the older ISAM code but has many
      useful extensions. (Note that MySQL 5.0 does
      not support ISAM.)
    
      Each MyISAM table is stored on disk in three
      files. The files have names that begin with the table name and
      have an extension to indicate the file type. An
      .frm file stores the table format. The data
      file has an .MYD (MYData)
      extension. The index file has an .MYI
      (MYIndex) extension.
    
      To specify explicitly that you want a MyISAM
      table, indicate that with an ENGINE table
      option:
    
CREATE TABLE t (i INT) ENGINE = MYISAM;
      The older term TYPE is supported as a synonym
      for ENGINE for backward compatibility, but
      ENGINE is the preferred term and
      TYPE is deprecated.
    
      Normally, it is unnecesary to use ENGINE to
      specify the MyISAM storage engine.
      MyISAM is the default engine unless the default
      has been changed. To ensure that MyISAM is used
      in situations where the default might have been changed, include
      the ENGINE option explicitly.
    
      You can check or repair MyISAM tables with the
      mysqlcheck client or
      myisamchk utility. You can also compress
      MyISAM tables with
      myisampack to take up much less space. See
      Section 8.9, “mysqlcheck — A Table Maintenance and Repair Program”, Section 5.10.4.1, “Using myisamchk for Crash Recovery”,
      and Section 8.4, “myisampack — Generate Compressed, Read-Only MyISAM Tables”.
    
      MyISAM tables have the following
      characteristics:
    
All data values are stored with the low byte first. This makes the data machine and operating system independent. The only requirements for binary portability are that the machine uses two's-complement signed integers and IEEE floating-point format. These requirements are widely used among mainstream machines. Binary compatibility might not be applicable to embedded systems, which sometimes have peculiar processors.
There is no significant speed penalty for storing data low byte first; the bytes in a table row normally are unaligned and it takes little more processing to read an unaligned byte in order than in reverse order. Also, the code in the server that fetches column values is not time critical compared to other code.
All numeric key values are stored with the high byte first to allow better index compression.
Large files (up to 63-bit file length) are supported on filesystems and operating systems that support large files.
Dynamic-sized rows are much less fragmented when mixing deletes with updates and inserts. This is done by automatically combining adjacent deleted blocks and by extending blocks if the next block is deleted.
          The maximum number of indexes per MyISAM
          table is 64. This can be changed by recompiling. Beginning
          with MySQL 5.0.18, you can configure the build by invoking
          configure with the
          --with-max-indexes=
          option, where NN is the maximum
          number of indexes to permit per MyISAM
          table. N must be less thann or
          equal to 128. Before MySQL 5.0.18, you must change the source.
        
The maximum number of columns per index is 16.
The maximum key length is 1000 bytes. This can also be changed by changing the source and recompiling. For the case of a key longer than 250 bytes, a larger key block size than the default of 1024 bytes is used.
          When rows are inserted in sorted order (as when you are using
          an AUTO_INCREMENT column), the index tree
          is split so that the high node only contains one key. This
          improves space utilization in the index tree.
        
          Internal handling of one AUTO_INCREMENT
          column per table is supported. MyISAM
          automatically updates this column for
          INSERT and UPDATE
          operations. This makes AUTO_INCREMENT
          columns faster (at least 10%). Values at the top of the
          sequence are not reused after being deleted. (When an
          AUTO_INCREMENT column is defined as the
          last column of a multiple-column index, reuse of values
          deleted from the top of a sequence does occur.) The
          AUTO_INCREMENT value can be reset with
          ALTER TABLE or
          myisamchk.
        
Dynamic-sized rows are much less fragmented when mixing deletes with updates and inserts. This is done by automatically combining adjacent deleted blocks and by extending blocks if the next block is deleted.
          If a table has no free blocks in the middle of the data file,
          you can INSERT new rows into it at the same
          time that other threads are reading from the table. (These are
          known as concurrent inserts.) A free block can occur as a
          result of deleting rows or an update of a dynamic length row
          with more data than its current contents. When all free blocks
          are used up (filled in), future inserts become concurrent
          again. See Section 7.3.3, “Concurrent Inserts”.
        
          You can put the data file and index file on different
          directories to get more speed with the DATA
          DIRECTORY and INDEX DIRECTORY
          table options to CREATE TABLE. See
          Section 13.1.5, “CREATE TABLE Syntax”.
        
          BLOB and TEXT columns
          can be indexed.
        
          NULL values are allowed in indexed columns.
          This takes 0–1 bytes per key.
        
Each character column can have a different character set. See Chapter 10, Character Set Support.
          There is a flag in the MyISAM index file
          that indicates whether the table was closed correctly. If
          mysqld is started with the
          --myisam-recover option,
          MyISAM tables are automatically checked
          when opened, and are repaired if the table wasn't closed
          properly.
        
          myisamchk marks tables as checked if you
          run it with the --update-state option.
          myisamchk --fast checks only those tables
          that don't have this mark.
        
myisamchk --analyze stores statistics for portions of keys, as well as for entire keys.
          myisampack can pack BLOB
          and VARCHAR columns.
        
      MyISAM also supports the following features:
    
          Support for a true VARCHAR type; a
          VARCHAR column starts with a length stored
          in one or two bytes.
        
          Tables with VARCHAR columns may have fixed
          or dynamic row length.
        
          The sum of the lengths of the VARCHAR and
          CHAR columns in a table may be up to 64KB.
        
          A hashed computed index can be used for
          UNIQUE. This allows you to have
          UNIQUE on any combination of columns in a
          table. (However, you cannot search on a
          UNIQUE computed index.)
        
Additional resources
          A forum dedicated to the MyISAM storage
          engine is available at
          http://forums.mysql.com/list.php?21.
        
        The following options to mysqld can be used
        to change the behavior of MyISAM tables. For
        additional information, see Section 5.2.1, “mysqld Command Options”.
      
            Set the mode for automatic recovery of crashed
            MyISAM tables.
          
            Don't flush key buffers between writes for any
            MyISAM table.
          
            Note: If you do this, you
            should not access MyISAM tables from
            another program (such as from another MySQL server or with
            myisamchk) when the tables are in use.
            Doing so risks index corruption. Using
            --external-locking does not eliminate this
            risk.
          
        The following system variables affect the behavior of
        MyISAM tables. For additional information,
        see Section 5.2.2, “Server System Variables”.
      
            bulk_insert_buffer_size
          
The size of the tree cache used in bulk insert optimization. Note: This is a limit per thread!
            myisam_max_extra_sort_file_size
          
Used to help MySQL to decide when to use the slow but safe key cache index creation method. Note: This parameter was given in bytes before MySQL 5.0.6, when it was removed.
            myisam_max_sort_file_size
          
Don't use the fast sort index method to create an index if the temporary file would become larger than this. Note: This parameter is given in bytes.
            myisam_sort_buffer_size
          
Set the size of the buffer used when recovering tables.
        Automatic recovery is activated if you start
        mysqld with the
        --myisam-recover option. In this case, when the
        server opens a MyISAM table, it checks
        whether the table is marked as crashed or whether the open count
        variable for the table is not 0 and you are running the server
        with external locking disabled. If either of these conditions is
        true, the following happens:
      
The server checks the table for errors.
If the server finds an error, it tries to do a fast table repair (with sorting and without re-creating the data file).
If the repair fails because of an error in the data file (for example, a duplicate-key error), the server tries again, this time re-creating the data file.
If the repair still fails, the server tries once more with the old repair option method (write row by row without sorting). This method should be able to repair any type of error and has low disk space requirements.
        If the recovery wouldn't be able to recover all rows from
        previously completed statementas and you didn't specify
        FORCE in the value of the
        --myisam-recover option, automatic repair
        aborts with an error message in the error log:
      
Error: Couldn't repair table: test.g00pages
        If you specify FORCE, a warning like this is
        written instead:
      
Warning: Found 344 of 354 rows when repairing ./test/g00pages
        Note that if the automatic recovery value includes
        BACKUP, the recovery process creates files
        with names of the form
        tbl_name-datetime.BAK
        MyISAM tables use B-tree indexes. You can
        roughly calculate the size for the index file as
        (key_length+4)/0.67, summed over all keys.
        This is for the worst case when all keys are inserted in sorted
        order and the table doesn't have any compressed keys.
      
        String indexes are space compressed. If the first index part is
        a string, it is also prefix compressed. Space compression makes
        the index file smaller than the worst-case figure if a string
        column has a lot of trailing space or is a
        VARCHAR column that is not always used to the
        full length. Prefix compression is used on keys that start with
        a string. Prefix compression helps if there are many strings
        with an identical prefix.
      
        In MyISAM tables, you can also prefix
        compress numbers by specifying the
        PACK_KEYS=1 table option when you create the
        table. Numbers are stored with the high byte first, so this
        helps when you have many integer keys that have an identical
        prefix.
      
        MyISAM supports three different storage
        formats. Two of them, fixed and dynamic format, are chosen
        automatically depending on the type of columns you are using.
        The third, compressed format, can be created only with the
        myisampack utility.
      
        When you use CREATE TABLE or ALTER
        TABLE for a table that has no BLOB
        or TEXT columns, you can force the table
        format to FIXED or DYNAMIC
        with the ROW_FORMAT table option. This causes
        CHAR and VARCHAR columns
        to become CHAR for FIXED
        format, or VARCHAR for
        DYNAMIC format.
      
        You can decompress tables by specifying
        ROW_FORMAT=DEFAULT with ALTER
        TABLE.
      
        See Section 13.1.5, “CREATE TABLE Syntax”, for information about
        ROW_FORMAT.
      
          Static format is the default for MyISAM
          tables. It is used when the table contains no variable-length
          columns (VARCHAR,
          VARBINARY, BLOB, or
          TEXT). Each row is stored using a fixed
          number of bytes.
        
          Of the three MyISAM storage formats, static
          format is the simplest and most secure (least subject to
          corruption). It is also the fastest of the on-disk formats due
          to the ease with which rows in the data file can be found on
          disk: To look up a row based on a row number in the index,
          multiply the row number by the row length to calculate the row
          position. Also, when scanning a table, it is very easy to read
          a constant number of rows with each disk read operation.
        
          The security is evidenced if your computer crashes while the
          MySQL server is writing to a fixed-format
          MyISAM file. In this case,
          myisamchk can easily determine where each
          row starts and ends, so it can usually reclaim all rows except
          the partially written one. Note that MyISAM
          table indexes can always be reconstructed based on the data
          rows.
        
Static-format tables have these characteristics:
              CHAR columns are space-padded to the
              column width. This is also true for
              NUMERIC and DECIMAL
              columns created before MySQL 5.0.3.
              BINARY columns are space-padded to the
              column width before MySQL 5.0.15. As of 5.0.15,
              BINARY columns are padded with
              0x00 bytes.
            
Very quick.
Easy to cache.
Easy to reconstruct after a crash, because rows are located in fixed positions.
              Reorganization is unnecessary unless you delete a huge
              number of rows and want to return free disk space to the
              operating system. To do this, use OPTIMIZE
              TABLE or myisamchk -r.
            
Usually require more disk space than dynamic-format tables.
          Dynamic storage format is used if a MyISAM
          table contains any variable-length columns
          (VARCHAR, VARBINARY,
          BLOB, or TEXT), or if
          the table was created with the
          ROW_FORMAT=DYNAMIC table option.
        
Dynamic format is a little more complex than static format because each row has a header that indicates how long it is. A row can become fragmented (stored in non-contiguous pieces) when it is made longer as a result of an update.
          You can use OPTIMIZE TABLE or
          myisamchk -r to defragment a table. If you
          have fixed-length columns that you access or change frequently
          in a table that also contains some variable-length columns, it
          might be a good idea to move the variable-length columns to
          other tables just to avoid fragmentation.
        
Dynamic-format tables have these characteristics:
All string columns are dynamic except those with a length less than four.
              Each row is preceded by a bitmap that indicates which
              columns contain the empty string (for string columns) or
              zero (for numeric columns). Note that this does not
              include columns that contain NULL
              values. If a string column has a length of zero after
              trailing space removal, or a numeric column has a value of
              zero, it is marked in the bitmap and not saved to disk.
              Non-empty strings are saved as a length byte plus the
              string contents.
            
Much less disk space usually is required than for fixed-length tables.
              Each row uses only as much space as is required. However,
              if a row becomes larger, it is split into as many pieces
              as are required, resulting in row fragmentation. For
              example, if you update a row with information that extends
              the row length, the row becomes fragmented. In this case,
              you may have to run OPTIMIZE TABLE or
              myisamchk -r from time to time to
              improve performance. Use myisamchk -ei
              to obtain table statistics.
            
More difficult than static-format tables to reconstruct after a crash, because rows may be fragmented into many pieces and links (fragments) may be missing.
The expected row length for dynamic-sized rows is calculated using the following expression:
3 + (number of columns+ 7) / 8 + (number of char columns) + (packed size of numeric columns) + (length of strings) + (number of NULL columns+ 7) / 8
              There is a penalty of 6 bytes for each link. A dynamic row
              is linked whenever an update causes an enlargement of the
              row. Each new link is at least 20 bytes, so the next
              enlargement probably goes in the same link. If not,
              another link is created. You can find the number of links
              using myisamchk -ed. All links may be
              removed with OPTIMIZE TABLE or
              myisamchk -r.
            
Compressed storage format is a read-only format that is generated with the myisampack tool. Compressed tables can be uncompressed with myisamchk.
Compressed tables have the following characteristics:
Compressed tables take very little disk space. This minimizes disk usage, which is helpful when using slow disks (such as CD-ROMs).
Each row is compressed separately, so there is very little access overhead. The header for a row takes up one to three bytes depending on the biggest row in the table. Each column is compressed differently. There is usually a different Huffman tree for each column. Some of the compression types are:
Suffix space compression.
Prefix space compression.
Numbers with a value of zero are stored using one bit.
                  If values in an integer column have a small range, the
                  column is stored using the smallest possible type. For
                  example, a BIGINT column (eight
                  bytes) can be stored as a TINYINT
                  column (one byte) if all its values are in the range
                  from -128 to
                  127.
                
                  If a column has only a small set of possible values,
                  the data type is converted to ENUM.
                
A column may use any combination of the preceding compression types.
Can be used for fixed-length or dynamic-length rows.
The file format that MySQL uses to store data has been extensively tested, but there are always circumstances that may cause database tables to become corrupted. The following discussion describes how this can happen and how to handle it.
          Even though the MyISAM table format is very
          reliable (all changes to a table made by an SQL statement are
          written before the statement returns), you can still get
          corrupted tables if any of the following events occur:
        
The mysqld process is killed in the middle of a write.
An unexpected computer shutdown occurs (for example, the computer is turned off).
Hardware failures.
You are using an external program (such as myisamchk) to modify a table that is being modified by the server at the same time.
              A software bug in the MySQL or MyISAM
              code.
            
Typical symptoms of a corrupt table are:
You get the following error while selecting data from the table:
Incorrect key file for table: '...'. Try to repair it
Queries don't find rows in the table or return incomplete results.
          You can check the health of a MyISAM table
          using the CHECK TABLE statement, and repair
          a corrupted MyISAM table with
          REPAIR TABLE. When
          mysqld is not running, you can also check
          or repair a table with the myisamchk
          command. See Section 13.5.2.3, “CHECK TABLE Syntax”,
          Section 13.5.2.6, “REPAIR TABLE Syntax”, and
          Section 8.2, “myisamchk — MyISAM Table-Maintenance Utility”.
        
          If your tables become corrupted frequently, you should try to
          determine why this is happening. The most important thing to
          know is whether the table became corrupted as a result of a
          server crash. You can verify this easily by looking for a
          recent restarted mysqld message in the
          error log. If there is such a message, it is likely that table
          corruption is a result of the server dying. Otherwise,
          corruption may have occurred during normal operation. This is
          a bug. You should try to create a reproducible test case that
          demonstrates the problem. See Section A.4.2, “What to Do If MySQL Keeps Crashing”, and
          Section E.1.6, “Making a Test Case If You Experience Table Corruption”.
        
          Each MyISAM index file
          (.MYI file) has a counter in the header
          that can be used to check whether a table has been closed
          properly. If you get the following warning from CHECK
          TABLE or myisamchk, it means that
          this counter has gone out of sync:
        
clients are using or haven't closed the table properly
This warning doesn't necessarily mean that the table is corrupted, but you should at least check the table.
The counter works as follows:
The first time a table is updated in MySQL, a counter in the header of the index files is incremented.
The counter is not changed during further updates.
              When the last instance of a table is closed (because a
              FLUSH TABLES operation was performed or
              because there is no room in the table cache), the counter
              is decremented if the table has been updated at any point.
            
When you repair the table or check the table and it is found to be okay, the counter is reset to zero.
To avoid problems with interaction with other processes that might check the table, the counter is not decremented on close if it was zero.
In other words, the counter can become incorrect only under these conditions:
              A MyISAM table is copied without first
              issuing LOCK TABLES and FLUSH
              TABLES.
            
MySQL has crashed between an update and the final close. (Note that the table may still be okay, because MySQL always issues writes for everything between each statement.)
A table was modified by myisamchk --recover or myisamchk --update-state at the same time that it was in use by mysqld.
              Multiple mysqld servers are using the
              table and one server performed a REPAIR
              TABLE or CHECK TABLE on the
              table while it was in use by another server. In this
              setup, it is safe to use CHECK TABLE,
              although you might get the warning from other servers.
              However, REPAIR TABLE should be avoided
              because when one server replaces the data file with a new
              one, this is not known to the other servers.
            
In general, it is a bad idea to share a data directory among multiple servers. See Section 5.13, “Running Multiple MySQL Servers on the Same Machine”, for additional discussion.
InnoDB OverviewInnoDB Contact InformationInnoDB ConfigurationInnoDB Startup Options and System VariablesInnoDB TablespaceInnoDB TablesInnoDB Data and Log FilesInnoDB DatabaseInnoDB Database to Another MachineInnoDB Transaction Model and LockingInnoDB Performance Tuning TipsInnoDB Table and Index StructuresInnoDB File Space Management and Disk I/OInnoDB Error HandlingInnoDB TablesInnoDB Troubleshooting
      InnoDB provides MySQL with a transaction-safe
      (ACID compliant) storage engine that has
      commit, rollback, and crash recovery capabilities.
      InnoDB does locking on the row level and also
      provides an Oracle-style consistent non-locking read in
      SELECT statements. These features increase
      multi-user concurrency and performance. There is no need for lock
      escalation in InnoDB because row-level locks
      fit in very little space. InnoDB also supports
      FOREIGN KEY constraints. You can freely mix
      InnoDB tables with tables from other MySQL
      storage engines, even within the same statement.
    
      InnoDB has been designed for maximum
      performance when processing large data volumes. Its CPU efficiency
      is probably not matched by any other disk-based relational
      database engine.
    
      Fully integrated with MySQL Server, the InnoDB
      storage engine maintains its own buffer pool for caching data and
      indexes in main memory. InnoDB stores its
      tables and indexes in a tablespace, which may consist of several
      files (or raw disk partitions). This is different from, for
      example, MyISAM tables where each table is
      stored using separate files. InnoDB tables can
      be of any size even on operating systems where file size is
      limited to 2GB.
    
      InnoDB is included in binary distributions by
      default. The Windows Essentials installer makes
      InnoDB the MySQL default storage engine on
      Windows.
    
      InnoDB is used in production at numerous large
      database sites requiring high performance. The famous Internet
      news site Slashdot.org runs on InnoDB. Mytrix,
      Inc. stores over 1TB of data in InnoDB, and
      another site handles an average load of 800 inserts/updates per
      second in InnoDB.
    
      InnoDB is published under the same GNU GPL
      License Version 2 (of June 1991) as MySQL. For more information on
      MySQL licensing, see
      http://www.mysql.com/company/legal/licensing/.
    
Additional resources
          A forum dedicated to the InnoDB storage
          engine is available at
          http://forums.mysql.com/list.php?22.
        
      Contact information for Innobase Oy, producer of the
      InnoDB engine:
    
Web site: http://www.innodb.com/
Email: <sales@innodb.com>
Phone: +358-9-6969 3250 (office)
       +358-40-5617367 (mobile)
Innobase Oy Inc.
World Trade Center Helsinki
Aleksanterinkatu 17
P.O.Box 800
00101 Helsinki
Finland
      The InnoDB storage engine is enabled by
      default. If you don't want to use InnoDB
      tables, you can add the skip-innodb option to
      your MySQL option file.
    
      Note: InnoDB
      provides MySQL with a transaction-safe (ACID
      compliant) storage engine that has commit, rollback, and crash
      recovery capabilities. However, it cannot do
      so if the underlying operating system or hardware does
      not work as advertised. Many operating systems or disk subsystems
      may delay or reorder write operations to improve performance. On
      some operating systems, the very system call that should wait
      until all unwritten data for a file has been flushed —
      fsync() — might actually return before
      the data has been flushed to stable storage. Because of this, an
      operating system crash or a power outage may destroy recently
      committed data, or in the worst case, even corrupt the database
      because of write operations having been reordered. If data
      integrity is important to you, you should perform some
      “pull-the-plug” tests before using anything in
      production. On Mac OS X 10.3 and up, InnoDB
      uses a special fcntl() file flush method. Under
      Linux, it is advisable to disable the
      write-back cache.
    
      On ATAPI hard disks, a command such hdparm -W0
      /dev/hda may work to disable the write-back cache.
      Beware that some drives or disk controllers
      may be unable to disable the write-back cache.
    
      Two important disk-based resources managed by the
      InnoDB storage engine are its tablespace data
      files and its log files.
    
      Note: If you specify no
      InnoDB configuration options, MySQL creates an
      auto-extending 10MB data file named ibdata1
      and two 5MB log files named ib_logfile0 and
      ib_logfile1 in the MySQL data directory. To
      get good performance, you should explicitly provide
      InnoDB parameters as discussed in the following
      examples. Naturally, you should edit the settings to suit your
      hardware and requirements.
    
      The examples shown here are representative. See
      Section 14.2.4, “InnoDB Startup Options and System Variables” for additional information
      about InnoDB-related configuration parameters.
    
      To set up the InnoDB tablespace files, use the
      innodb_data_file_path option in the
      [mysqld] section of the
      my.cnf option file. On Windows, you can use
      my.ini instead. The value of
      innodb_data_file_path should be a list of one
      or more data file specifications. If you name more than one data
      file, separate them by semicolon
      (‘;’) characters:
    
innodb_data_file_path=datafile_spec1[;datafile_spec2]...
For example, a setting that explicitly creates a tablespace having the same characteristics as the default is as follows:
[mysqld] innodb_data_file_path=ibdata1:10M:autoextend
      This setting configures a single 10MB data file named
      ibdata1 that is auto-extending. No location
      for the file is given, so by default, InnoDB
      creates it in the MySQL data directory.
    
      Sizes are specified using M or
      G suffix letters to indicate units of MB or GB.
    
      A tablespace containing a fixed-size 50MB data file named
      ibdata1 and a 50MB auto-extending file named
      ibdata2 in the data directory can be
      configured like this:
    
[mysqld] innodb_data_file_path=ibdata1:50M;ibdata2:50M:autoextend
The full syntax for a data file specification includes the filename, its size, and several optional attributes:
file_name:file_size[:autoextend[:max:max_file_size]]
      The autoextend attribute and those following
      can be used only for the last data file in the
      innodb_data_file_path line.
    
      If you specify the autoextend option for the
      last data file, InnoDB extends the data file if
      it runs out of free space in the tablespace. The increment is 8MB
      at a time by default. It can be modified by changing the
      innodb_autoextend_increment system variable.
    
      If the disk becomes full, you might want to add another data file
      on another disk. Instructions for reconfiguring an existing
      tablespace are given in Section 14.2.7, “Adding and Removing InnoDB Data and Log Files”.
    
      InnoDB is not aware of the filesystem maximum
      file size, so be cautious on filesystems where the maximum file
      size is a small value such as 2GB. To specify a maximum size for
      an auto-extending data file, use the max
      attribute. The following configuration allows
      ibdata1 to grow up to a limit of 500MB:
    
[mysqld] innodb_data_file_path=ibdata1:10M:autoextend:max:500M
      InnoDB creates tablespace files in the MySQL
      data directory by default. To specify a location explicitly, use
      the innodb_data_home_dir option. For example,
      to use two files named ibdata1 and
      ibdata2 but create them in the
      /ibdata directory, configure
      InnoDB like this:
    
[mysqld] innodb_data_home_dir = /ibdata innodb_data_file_path=ibdata1:50M;ibdata2:50M:autoextend
      Note: InnoDB
      does not create directories, so make sure that the
      /ibdata directory exists before you start the
      server. This is also true of any log file directories that you
      configure. Use the Unix or DOS mkdir command to
      create any necessary directories.
    
      InnoDB forms the directory path for each data
      file by textually concatenating the value of
      innodb_data_home_dir to the data file name,
      adding a pathname separator (slash or backslash) between values if
      necessary. If the innodb_data_home_dir option
      is not mentioned in my.cnf at all, the
      default value is the “dot” directory
      ./, which means the MySQL data directory.
      (The MySQL server changes its current working directory to its
      data directory when it begins executing.)
    
      If you specify innodb_data_home_dir as an empty
      string, you can specify absolute paths for the data files listed
      in the innodb_data_file_path value. The
      following example is equivalent to the preceding one:
    
[mysqld] innodb_data_home_dir = innodb_data_file_path=/ibdata/ibdata1:50M;/ibdata/ibdata2:50M:autoextend
      A simple my.cnf
      example. Suppose that you have a computer with 128MB
      RAM and one hard disk. The following example shows possible
      configuration parameters in my.cnf or
      my.ini for InnoDB,
      including the autoextend attribute. The example
      suits most users, both on Unix and Windows, who do not want to
      distribute InnoDB data files and log files onto
      several disks. It creates an auto-extending data file
      ibdata1 and two InnoDB log
      files ib_logfile0 and
      ib_logfile1 in the MySQL data directory.
      Also, the small archived InnoDB log file
      ib_arch_log_0000000000 that
      InnoDB creates automatically ends up in the
      data directory.
    
[mysqld] # You can write your other MySQL server options here # ... # Data files must be able to hold your data and indexes. # Make sure that you have enough free disk space. innodb_data_file_path = ibdata1:10M:autoextend # # Set buffer pool size to 50-80% of your computer's memory innodb_buffer_pool_size=70M innodb_additional_mem_pool_size=10M # # Set the log file size to about 25% of the buffer pool size innodb_log_file_size=20M innodb_log_buffer_size=8M # innodb_flush_log_at_trx_commit=1
Make sure that the MySQL server has the proper access rights to create files in the data directory. More generally, the server must have access rights in any directory where it needs to create data files or log files.
Note that data files must be less than 2GB in some filesystems. The combined size of the log files must be less than 4GB. The combined size of data files must be at least 10MB.
      When you create an InnoDB tablespace for the
      first time, it is best that you start the MySQL server from the
      command prompt. InnoDB then prints the
      information about the database creation to the screen, so you can
      see what is happening. For example, on Windows, if
      mysqld is located in C:\Program
      Files\MySQL\MySQL Server 5.0\bin, you can
      start it like this:
    
C:\> "C:\Program Files\MySQL\MySQL Server 5.0\bin\mysqld" --console
      If you do not send server output to the screen, check the server's
      error log to see what InnoDB prints during the
      startup process.
    
      See Section 14.2.5, “Creating the InnoDB Tablespace”, for an example of what the
      information displayed by InnoDB should look
      like.
    
      You can place InnoDB options in the
      [mysqld] group of any option file that your
      server reads when it starts. The locations for option files are
      described in Section 4.3.2, “Using Option Files”.
    
      If you installed MySQL on Windows using the installation and
      configuration wizards, the option file will be the
      my.ini file located in your MySQL
      installation directory. See
      Section 2.3.4.14, “The Location of the my.ini File”.
    
      If your PC uses a boot loader where the C:
      drive is not the boot drive, your only option is to use the
      my.ini file in your Windows directory
      (typically C:\WINDOWS or
      C:\WINNT). You can use the
      SET command at the command prompt in a console
      window to print the value of WINDIR:
    
C:\> SET WINDIR
windir=C:\WINDOWS
      If you want to make sure that mysqld reads
      options only from a specific file, you can use the
      --defaults-file option as the first option on the
      command line when starting the server:
    
mysqld --defaults-file=your_path_to_my_cnf
      An advanced my.cnf
      example. Suppose that you have a Linux computer with
      2GB RAM and three 60GB hard disks at directory paths
      /, /dr2 and
      /dr3. The following example shows possible
      configuration parameters in my.cnf for
      InnoDB.
    
[mysqld] # You can write your other MySQL server options here # ... innodb_data_home_dir = # # Data files must be able to hold your data and indexes innodb_data_file_path = /ibdata/ibdata1:2000M;/dr2/ibdata/ibdata2:2000M:autoextend # # Set buffer pool size to 50-80% of your computer's memory, # but make sure on Linux x86 total memory usage is < 2GB innodb_buffer_pool_size=1G innodb_additional_mem_pool_size=20M innodb_log_group_home_dir = /dr3/iblogs # innodb_log_files_in_group = 2 # # Set the log file size to about 25% of the buffer pool size innodb_log_file_size=250M innodb_log_buffer_size=8M # innodb_flush_log_at_trx_commit=1 innodb_lock_wait_timeout=50 # # Uncomment the next lines if you want to use them #innodb_thread_concurrency=5
      In some cases, database performance improves the if all data is
      not placed on the same physical disk. Putting log files on a
      different disk from data is very often beneficial for performance.
      The example illustrates how to do this. It places the two data
      files on different disks and places the log files on the third
      disk. InnoDB fills the tablespace beginning
      with the first data file. You can also use raw disk partitions
      (raw devices) as InnoDB data files, which may
      speed up I/O. See Section 14.2.3.2, “Using Raw Devices for the Shared Tablespace”.
    
      Warning: On 32-bit GNU/Linux x86,
      you must be careful not to set memory usage too high.
      glibc may allow the process heap to grow over
      thread stacks, which crashes your server. It is a risk if the
      value of the following expression is close to or exceeds 2GB:
    
innodb_buffer_pool_size + key_buffer_size + max_connections*(sort_buffer_size+read_buffer_size+binlog_cache_size) + max_connections*2MB
      Each thread uses a stack (often 2MB, but only 256KB in MySQL AB
      binaries) and in the worst case also uses
      sort_buffer_size + read_buffer_size additional
      memory.
    
      By compiling MySQL yourself, you can use up to 64GB of physical
      memory in 32-bit Windows. See the description for
      innodb_buffer_pool_awe_mem_mb in
      Section 14.2.4, “InnoDB Startup Options and System Variables”.
    
How to tune other mysqld server parameters? The following values are typical and suit most users:
[mysqld]
skip-external-locking
max_connections=200
read_buffer_size=1M
sort_buffer_size=1M
#
# Set key_buffer to 5 - 50% of your RAM depending on how much
# you use MyISAM tables, but keep key_buffer_size + InnoDB
# buffer pool size < 80% of your RAM
key_buffer_size=value
        You can store each InnoDB table and its
        indexes in its own file. This feature is called “multiple
        tablespaces” because in effect each table has its own
        tablespace.
      
        Using multiple tablespaces can be beneficial to users who want
        to move specific tables to separate physical disks or who wish
        to restore backups of single tables quickly without interrupting
        the use of the remaining InnoDB tables.
      
        You can enable multiple tablespaces by adding this line to the
        [mysqld] section of
        my.cnf:
      
[mysqld] innodb_file_per_table
        After restarting the server, InnoDB stores
        each newly created table into its own file
        tbl_name.ibdMyISAM storage engine does, but
        MyISAM divides the table into a data file
        tbl_name.MYDtbl_name.MYIInnoDB, the data and the indexes are
        stored together in the .ibd file. The
        tbl_name.frm
        If you remove the innodb_file_per_table line
        from my.cnf and restart the server,
        InnoDB creates tables inside the shared
        tablespace files again.
      
        innodb_file_per_table affects only table
        creation, not access to existing tables. If you start the server
        with this option, new tables are created using
        .ibd files, but you can still access tables
        that exist in the shared tablespace. If you remove the option
        and restart the server, new tables are created in the shared
        tablespace, but you can still access any tables that were
        created using multiple tablespaces.
      
        InnoDB always needs the shared tablespace
        because it puts its internal data dictionary and undo logs
        there. The .ibd files are not sufficient
        for InnoDB to operate.
      
        Note: You cannot freely move
        .ibd files between database directories as
        you can with MyISAM table files. This is
        because the table definition that is stored in the
        InnoDB shared tablespace includes the
        database name, and because InnoDB must
        preserve the consistency of transaction IDs and log sequence
        numbers.
      
        To move an .ibd file and the associated
        table from one database to another, use a RENAME
        TABLE statement:
      
RENAME TABLEdb1.tbl_nameTOdb2.tbl_name;
        If you have a “clean” backup of an
        .ibd file, you can restore it to the MySQL
        installation from which it originated as follows:
      
            Issue this ALTER TABLE statement:
          
ALTER TABLE tbl_name DISCARD TABLESPACE;
            Caution: This statement
            deletes the current .ibd file.
          
            Put the backup .ibd file back in the
            proper database directory.
          
            Issue this ALTER TABLE statement:
          
ALTER TABLE tbl_name IMPORT TABLESPACE;
        In this context, a “clean”
        .ibd file backup means:
      
            There are no uncommitted modifications by transactions in
            the .ibd file.
          
            There are no unmerged insert buffer entries in the
            .ibd file.
          
            Purge has removed all delete-marked index records from the
            .ibd file.
          
            mysqld has flushed all modified pages of
            the .ibd file from the buffer pool to
            the file.
          
        You can make a clean backup .ibd file using
        the following method:
      
Stop all activity from the mysqld server and commit all transactions.
            Wait until SHOW ENGINE INNODB STATUS
            shows that there are no active transactions in the database,
            and the main thread status of InnoDB is
            Waiting for server activity. Then you can
            make a copy of the .ibd file.
          
        Another method for making a clean copy of an
        .ibd file is to use the commercial
        InnoDB Hot Backup tool:
      
            Use InnoDB Hot Backup to back up the
            InnoDB installation.
          
            Start a second mysqld server on the
            backup and let it clean up the .ibd
            files in the backup.
          
You can use raw disk partitions as data files in the shared tablespace. By using a raw disk, you can perform non-buffered I/O on Windows and on some Unix systems without filesystem overhead, which may improve performance.
        When you create a new data file, you must put the keyword
        newraw immediately after the data file size
        in innodb_data_file_path. The partition must
        be at least as large as the size that you specify. Note that 1MB
        in InnoDB is 1024 × 1024 bytes, whereas
        1MB in disk specifications usually means 1,000,000 bytes.
      
[mysqld] innodb_data_home_dir= innodb_data_file_path=/dev/hdd1:3Gnewraw;/dev/hdd2:2Gnewraw
        The next time you start the server, InnoDB
        notices the newraw keyword and initializes
        the new partition. However, do not create or change any
        InnoDB tables yet. Otherwise, when you next
        restart the server, InnoDB reinitializes the
        partition and your changes are lost. (As a safety measure
        InnoDB prevents users from modifying data
        when any partition with newraw is specified.)
      
        After InnoDB has initialized the new
        partition, stop the server, change newraw in
        the data file specification to raw:
      
[mysqld] innodb_data_home_dir= innodb_data_file_path=/dev/hdd1:5Graw;/dev/hdd2:2Graw
        Then restart the server and InnoDB allows
        changes to be made.
      
On Windows, you can allocate a disk partition as a data file like this:
[mysqld] innodb_data_home_dir= innodb_data_file_path=//./D::10Gnewraw
        The //./ corresponds to the Windows syntax
        of \\.\ for accessing physical drives.
      
When you use raw disk partitions, be sure that they have permissions that allow read and write access by the account used for running the MySQL server.
      This section describes the InnoDB-related
      command options and system variables. System variables that are
      true or false can be enabled at server startup by naming them, or
      disabled by using a skip- prefix. For example,
      to enable or disable InnoDB checksums, you can
      use --innodb_checksums or
      --skip-innodb_checksums on the command line, or
      innodb_checksums or
      skip-innodb_checksums in an option file. System
      variables that take a numeric value can be specified as
      --
      on the command line or as
      var_name=valuevar_name=value
      InnoDB command options:
    
          Enables the InnoDB storage engine, if the
          server was compiled with InnoDB support.
          Use --skip-innodb to disable
          InnoDB.
        
          Causes InnoDB to create a file named
          <datadir>/innodb_status.<pid>InnoDB
          periodically writes the output of SHOW ENGINE INNODB
          STATUS to this file.
        
      InnoDB system variables:
    
          innodb_additional_mem_pool_size
        
          The size in bytes of a memory pool InnoDB
          uses to store data dictionary information and other internal
          data structures. The more tables you have in your application,
          the more memory you need to allocate here. If
          InnoDB runs out of memory in this pool, it
          starts to allocate memory from the operating system and writes
          warning messages to the MySQL error log. The default value is
          1MB.
        
          innodb_autoextend_increment
        
The increment size (in MB) for extending the size of an auto-extending tablespace when it becomes full. The default value is 8.
          innodb_buffer_pool_awe_mem_mb
        
          The size of the buffer pool (in MB), if it is placed in the
          AWE memory. This is relevant only in 32-bit Windows. If your
          32-bit Windows operating system supports more than 4GB memory,
          using so-called “Address Windowing Extensions,”
          you can allocate the InnoDB buffer pool
          into the AWE physical memory using this variable. The maximum
          possible value for this variable is 63000. If it is greater
          than 0, innodb_buffer_pool_size is the
          window in the 32-bit address space of
          mysqld where InnoDB maps
          that AWE memory. A good value for
          innodb_buffer_pool_size is 500MB.
        
          To take advantage of AWE memory, you will need to recompile
          MySQL yourself. The current project settings needed for doing
          this can be found in the
          innobase/os/os0proj.c source file.
        
          innodb_buffer_pool_size
        
          The size in bytes of the memory buffer
          InnoDB uses to cache data and indexes of
          its tables. The larger you set this value, the less disk I/O
          is needed to access data in tables. On a dedicated database
          server, you may set this to up to 80% of the machine physical
          memory size. However, do not set it too large because
          competition for physical memory might cause paging in the
          operating system.
        
          innodb_checksums
        
          InnoDB can use checksum validation on all
          pages read from the disk to ensure extra fault tolerance
          against broken hardware or data files. This validation is
          enabled by default. However, under some rare circumstances
          (such as when running benchmarks) this extra safety feature is
          unneeded and can be disabled with
          --skip-innodb_checksums. This variable was
          added in MySQL 5.0.3.
        
          innodb_commit_concurrency
        
The number of threads that can commit at the same time. A value of 0 disables concurrency control. This variable was added in MySQL 5.0.12.
          innodb_concurrency_tickets
        
          The number of threads that can enter InnoDB
          concurrently is determined by the
          innodb_thread_concurrency variable. A
          thread is placed in a queue when it tries to enter
          InnoDB if the number of threads has already
          reached the concurrency limit. When a thread is allowed to
          enter InnoDB, it is given a number of
          “free tickets” equal to the value of
          innodb_concurrency_tickets, and the thread
          can enter and leave InnoDB freely until it
          has used up its tickets. After that point, the thread again
          becomes subject to the concurrency check (and possible
          queuing) the next time it tries to enter
          InnoDB. This variable was added in MySQL
          5.0.3.
        
          innodb_data_file_path
        
          The paths to individual data files and their sizes. The full
          directory path to each data file is formed by concatenating
          innodb_data_home_dir to each path specified
          here. The file sizes are specified in MB or GB (1024MB) by
          appending M or G to the
          size value. The sum of the sizes of the files must be at least
          10MB. If you do not specify
          innodb_data_file_path, the default behavior
          is to create a single 10MB auto-extending data file named
          ibdata1. The size limit of individual
          files is determined by your operating system. You can set the
          file size to more than 4GB on those operating systems that
          support big files. You can also use raw disk partitions as
          data files. See Section 14.2.3.2, “Using Raw Devices for the Shared Tablespace”.
        
          innodb_data_home_dir
        
          The common part of the directory path for all
          InnoDB data files. If you do not set this
          value, the default is the MySQL data directory. You can
          specify the value as an empty string, in which case you can
          use absolute file paths in
          innodb_data_file_path.
        
          innodb_doublewrite
        
          By default, InnoDB stores all data twice,
          first to the doublewrite buffer, and then to the actual data
          files. This variable is enabled by default. It can be turned
          off with --skip-innodb_doublewrite for
          benchmarks or cases when top performance is needed rather than
          concern for data integrity or possible failures. This variable
          was added in MySQL 5.0.3.
        
          innodb_fast_shutdown
        
          If you set this variable to 0, InnoDB does
          a full purge and an insert buffer merge before a shutdown.
          These operations can take minutes, or even hours in extreme
          cases. If you set this variable to 1,
          InnoDB skips these operations at shutdown.
          The default value is 1. If you set it to 2,
          InnoDB will just flush its logs and then
          shut down cold, as if MySQL had crashed; no committed
          transaction will be lost, but crash recovery will be done at
          the next startup. The value of 2 can be used as of MySQL
          5.0.5, except that it cannot be used on NetWare.
        
          innodb_file_io_threads
        
          The number of file I/O threads in InnoDB.
          Normally, this should be left at the default value of 4, but
          disk I/O on Windows may benefit from a larger number. On Unix,
          increasing the number has no effect; InnoDB
          always uses the default value.
        
          innodb_file_per_table
        
          If this variable is enabled, InnoDB creates
          each new table using its own .ibd file
          for storing data and indexes, rather than in the shared
          tablespace. The default is to create tables in the shared
          tablespace. See Section 14.2.3.1, “Using Per-Table Tablespaces”.
        
          innodb_flush_log_at_trx_commit
        
          When innodb_flush_log_at_trx_commit is set
          to 0, the log buffer is written out to the log file once per
          second and the flush to disk operation is performed on the log
          file, but nothing is done at a transaction commit. When this
          value is 1 (the default), the log buffer is written out to the
          log file at each transaction commit and the flush to disk
          operation is performed on the log file. When set to 2, the log
          buffer is written out to the file at each commit, but the
          flush to disk operation is not performed on it. However, the
          flushing on the log file takes place once per second also when
          the value is 2. Note that the once-per-second flushing is not
          100% guaranteed to happen every second, due to process
          scheduling issues.
        
          The default value of this variable is 1, which is the value
          that is required for ACID compliance. You can achieve better
          performance by setting the value different from 1, but then
          you can lose at most one second worth of transactions in a
          crash. If you set the value to 0, then any
          mysqld process crash can erase the last
          second of transactions. If you set the value to 2, then only
          an operating system crash or a power outage can erase the last
          second of transactions. However, InnoDB's
          crash recovery is not affected and thus crash recovery does
          work regardless of the value. Note that many operating systems
          and some disk hardware fool the flush-to-disk operation. They
          may tell mysqld that the flush has taken
          place, even though it has not. Then the durability of
          transactions is not guaranteed even with the setting 1, and in
          the worst case a power outage can even corrupt the
          InnoDB database. Using a battery-backed
          disk cache in the SCSI disk controller or in the disk itself
          speeds up file flushes, and makes the operation safer. You can
          also try using the Unix command hdparm to
          disable the caching of disk writes in hardware caches, or use
          some other command specific to the hardware vendor.
        
          innodb_flush_method
        
          If set to fdatasync (the default),
          InnoDB uses fsync() to
          flush both the data and log files. If set to
          O_DSYNC, InnoDB uses
          O_SYNC to open and flush the log files, but
          uses fsync() to flush the data files. If
          O_DIRECT is specified (available on some
          GNU/Linux versions), InnoDB uses
          O_DIRECT to open the data files, and uses
          fsync() to flush both the data and log
          files. Note that InnoDB uses
          fsync() instead of
          fdatasync(), and it does not use
          O_DSYNC by default because there have been
          problems with it on many varieties of Unix. This variable is
          relevant only for Unix. On Windows, the flush method is always
          async_unbuffered and cannot be changed.
        
          innodb_force_recovery
        
          The crash recovery mode. Warning: This variable should be set
          greater than 0 only in an emergency situation when you want to
          dump your tables from a corrupt database! Possible values are
          from 1 to 6. The meanings of these values are described in
          Section 14.2.8.1, “Forcing InnoDB Recovery”. As a safety measure,
          InnoDB prevents any changes to its data
          when this variable is greater than 0.
        
          innodb_lock_wait_timeout
        
          The timeout in seconds an InnoDB
          transaction may wait for a lock before being rolled back.
          InnoDB automatically detects transaction
          deadlocks in its own lock table and rolls back the
          transaction. InnoDB notices locks set using
          the LOCK TABLES statement. The default is
          50 seconds.
        
          Note: For the greatest possible durability and consistency in
          a replication setup using InnoDB with
          transactions, you should use
          innodb_flush_log_at_trx_commit=1,
          sync_binlog=1, and, before MySQL 5.0.3,
          innodb_safe_binlog in your master server
          my.cnf file.
          (innodb_safe_binlog is not needed from
          5.0.3 on.)
        
          innodb_locks_unsafe_for_binlog
        
          This variable controls next-key locking in
          InnoDB searches and index scans. By
          default, this variable is 0 (disabled), which means that
          next-key locking is enabled.
        
          Normally, InnoDB uses an algorithm called
          next-key locking.
          InnoDB performs row-level locking in such a
          way that when it searches or scans a table index, it sets
          shared or exclusive locks on any index records it encounters.
          Thus, the row-level locks are actually index record locks. The
          locks that InnoDB sets on index records
          also affect the “gap” preceding that index
          record. If a user has a shared or exclusive lock on record
          R in an index, another user cannot insert
          a new index record immediately before R
          in the order of the index. Enabling this variable causes
          InnoDB not to use next-key locking in
          searches or index scans. Next-key locking is still used to
          ensure foreign key constraints and duplicate key checking.
          Note that enabling this variable may cause phantom problems:
          Suppose that you want to read and lock all children from the
          child table with an identifier value larger
          than 100, with the intention of updating some column in the
          selected rows later:
        
SELECT * FROM child WHERE id > 100 FOR UPDATE;
          Suppose that there is an index on the id
          column. The query scans that index starting from the first
          record where id is greater than 100. If the
          locks set on the index records do not lock out inserts made in
          the gaps, another client can insert a new row into the table.
          If you execute the same SELECT within the
          same transaction, you see a new row in the result set returned
          by the query. This also means that if new items are added to
          the database, InnoDB does not guarantee
          serializability. Therefore, if this variable is enabled
          InnoDB guarantees at most isolation level
          READ COMMITTED. (Conflict serializability
          is still guaranteed.)
        
          Starting from MySQL 5.0.2, this option is even more unsafe.
          InnoDB in an UPDATE or a
          DELETE only locks rows that it updates or
          deletes. This greatly reduces the probability of deadlocks,
          but they can happen. Note that enabling this variable still
          does not allow operations such as UPDATE to
          overtake other similar operations (such as another
          UPDATE) even in the case when they affect
          different rows. Consider the following example, beginning with
          this table:
        
CREATE TABLE A(A INT NOT NULL, B INT) ENGINE = InnoDB; INSERT INTO A VALUES (1,2),(2,3),(3,2),(4,3),(5,2); COMMIT;
Suppose that one client executes these statements:
SET AUTOCOMMIT = 0; UPDATE A SET B = 5 WHERE B = 3;
Then suppose that another client executes these statements following those of the first client:
SET AUTOCOMMIT = 0; UPDATE A SET B = 4 WHERE B = 2;
          In this case, the second UPDATE must wait
          for a commit or rollback of the first
          UPDATE. The first UPDATE
          has an exclusive lock on row (2,3), and the second
          UPDATE while scanning rows also tries to
          acquire an exclusive lock for the same row, which it cannot
          have. This is because UPDATE two first
          acquires an exclusive lock on a row and then determines
          whether the row belongs to the result set. If not, it releases
          the unnecessary lock, when the
          innodb_locks_unsafe_for_binlog variable is
          enabled.
        
          Therefore, InnoDB executes
          UPDATE one as follows:
        
x-lock(1,2) unlock(1,2) x-lock(2,3) update(2,3) to (2,5) x-lock(3,2) unlock(3,2) x-lock(4,3) update(4,3) to (4,5) x-lock(5,2) unlock(5,2)
          InnoDB executes UPDATE
          two as follows:
        
x-lock(1,2) update(1,2) to (1,4) x-lock(2,3) - wait for query one to commit or rollback
          innodb_log_arch_dir
        
          The directory where fully written log files would be archived
          if we used log archiving. If used, the value of this variable
          should be set the same as
          innodb_log_group_home_dir. However, it is
          not required.
        
          innodb_log_archive
        
          Whether to log InnoDB archive files. This
          variable is present for historical reasons, but is unused.
          Recovery from a backup is done by MySQL using its own log
          files, so there is no need to archive
          InnoDB log files. The default for this
          variable is 0.
        
          innodb_log_buffer_size
        
          The size in bytes of the buffer that InnoDB
          uses to write to the log files on disk. Sensible values range
          from 1MB to 8MB. The default is 1MB. A large log buffer allows
          large transactions to run without a need to write the log to
          disk before the transactions commit. Thus, if you have big
          transactions, making the log buffer larger saves disk I/O.
        
          innodb_log_file_size
        
          The size in bytes of each log file in a log group. The
          combined size of log files must be less than 4GB on 32-bit
          computers. The default is 5MB. Sensible values range from 1MB
          to 1/N-th of the size of the buffer
          pool, where N is the number of log
          files in the group. The larger the value, the less checkpoint
          flush activity is needed in the buffer pool, saving disk I/O.
          But larger log files also mean that recovery is slower in case
          of a crash.
        
          innodb_log_files_in_group
        
          The number of log files in the log group.
          InnoDB writes to the files in a circular
          fashion. The default (and recommended) is 2.
        
          innodb_log_group_home_dir
        
          The directory path to the InnoDB log files.
          It must have the same value as
          innodb_log_arch_dir. If you do not specify
          any InnoDB log variables, the default is to
          create two 5MB files names ib_logfile0
          and ib_logfile1 in the MySQL data
          directory.
        
          innodb_max_dirty_pages_pct
        
          This is an integer in the range from 0 to 100. The default is
          90. The main thread in InnoDB tries to
          write pages from the buffer pool so that the percentage of
          dirty (not yet written) pages will not exceed this value.
        
          innodb_max_purge_lag
        
          This variable controls how to delay INSERT,
          UPDATE and DELETE
          operations when the purge operations are lagging (see
          Section 14.2.12, “Implementation of Multi-Versioning”). The default value
          of this variable is 0, meaning that there are no delays.
        
          The InnoDB transaction system maintains a
          list of transactions that have delete-marked index records by
          UPDATE or DELETE
          operations. Let the length of this list be
          purge_lag. When
          purge_lag exceeds
          innodb_max_purge_lag, each
          INSERT, UPDATE and
          DELETE operation is delayed by
          ((purge_lag/innodb_max_purge_lag)×10)–5
          milliseconds. The delay is computed in the beginning of a
          purge batch, every ten seconds. The operations are not delayed
          if purge cannot run because of an old consistent read view
          that could see the rows to be purged.
        
A typical setting for a problematic workload might be 1 million, assuming that our transactions are small, only 100 bytes in size, and we can allow 100MB of unpurged rows in our tables.
          innodb_mirrored_log_groups
        
The number of identical copies of log groups to keep for the database. Currently, this should be set to 1.
          innodb_open_files
        
          This variable is relevant only if you use multiple tablespaces
          in InnoDB. It specifies the maximum number
          of .ibd files that
          InnoDB can keep open at one time. The
          minimum value is 10. The default is 300.
        
          The file descriptors used for .ibd files
          are for InnoDB only. They are independent
          of those specified by the --open-files-limit
          server option, and do not affect the operation of the table
          cache.
        
          innodb_safe_binlog
        
          Adds consistency guarantees between the content of
          InnoDB tables and the binary log. See
          Section 5.12.3, “The Binary Log”. This variable was removed in
          MySQL 5.0.3, having been made obsolete by the introduction of
          XA transaction support.
        
          innodb_support_xa
        
          When set to ON or 1 (the default), this
          variable enables InnoDB support for
          two-phase commit in XA transactions. Enabling
          innodb_support_xa causes an extra disk
          flush for transaction preparation. If you don't care about
          using XA, you can disable this variable by setting it to
          OFF or 0 to reduce the number of disk
          flushes and get better InnoDB performance.
          This variable was added in MySQL 5.0.3.
        
          innodb_sync_spin_loops
        
          The number of times a thread waits for an
          InnoDB mutex to be freed before the thread
          is suspended. This variable was added in MySQL 5.0.3.
        
          innodb_table_locks
        
          InnoDB honors LOCK
          TABLES; MySQL does not return from LOCK
          TABLE .. WRITE until all other threads have released
          all their locks to the table. The default value is 1, which
          means that LOCK TABLES causes
          InnoDB to lock a table internally. In
          applications using AUTOCOMMIT=1,
          InnoDB's internal table locks can cause
          deadlocks. You can set innodb_table_locks=0
          in the server option file to remove that problem.
        
          innodb_thread_concurrency
        
          InnoDB tries to keep the number of
          operating system threads concurrently inside
          InnoDB less than or equal to the limit
          given by this variable. Before MySQL 5.0.8, the default value
          is 8. If you have performance issues, and SHOW ENGINE
          INNODB STATUS reveals many threads waiting for
          semaphores, you may have thread “thrashing” and
          should try setting this variable lower or higher. If you have
          a computer with many processors and disks, you can try setting
          the value higher to make better use of your computer's
          resources. A recommended value is the sum of the number of
          processors and disks your system has. A value of 500 or
          greater disables concurrency checking. Starting with MySQL
          5.0.8, the default value is 20, and concurrency checking will
          be disabled if the setting is greater than or equal to 20.
        
          innodb_thread_sleep_delay
        
          How long InnoDB threads sleep before
          joining the InnoDB queue, in microseconds.
          The default value is 10,000. A value of 0 disables sleep. This
          variable was added in MySQL 5.0.3.
        
          sync_binlog
        
          If the value of this variable is positive, the MySQL server
          synchronizes its binary log to disk
          (fdatasync()) after every
          sync_binlog writes to this binary log. Note
          that there is one write to the binary log per statement if in
          autocommit mode, and otherwise one write per transaction. The
          default value is 0 which does no synchronizing to disk. A
          value of 1 is the safest choice, because in the event of a
          crash you lose at most one statement/transaction from the
          binary log; however, it is also the slowest choice (unless the
          disk has a battery-backed cache, which makes synchronization
          very fast).
        
      Suppose that you have installed MySQL and have edited your option
      file so that it contains the necessary InnoDB
      configuration parameters. Before starting MySQL, you should verify
      that the directories you have specified for
      InnoDB data files and log files exist and that
      the MySQL server has access rights to those directories.
      InnoDB does not create directories, only files.
      Check also that you have enough disk space for the data and log
      files.
    
      It is best to run the MySQL server mysqld from
      the command prompt when you first start the server with
      InnoDB enabled, not from the
      mysqld_safe wrapper or as a Windows service.
      When you run from a command prompt you see what
      mysqld prints and what is happening. On Unix,
      just invoke mysqld. On Windows, use the
      --console option.
    
      When you start the MySQL server after initially configuring
      InnoDB in your option file,
      InnoDB creates your data files and log files,
      and prints something like this:
    
InnoDB: The first specified datafile /home/heikki/data/ibdata1 did not exist: InnoDB: a new database to be created! InnoDB: Setting file /home/heikki/data/ibdata1 size to 134217728 InnoDB: Database physically writes the file full: wait... InnoDB: datafile /home/heikki/data/ibdata2 did not exist: new to be created InnoDB: Setting file /home/heikki/data/ibdata2 size to 262144000 InnoDB: Database physically writes the file full: wait... InnoDB: Log file /home/heikki/data/logs/ib_logfile0 did not exist: new to be created InnoDB: Setting log file /home/heikki/data/logs/ib_logfile0 size to 5242880 InnoDB: Log file /home/heikki/data/logs/ib_logfile1 did not exist: new to be created InnoDB: Setting log file /home/heikki/data/logs/ib_logfile1 size to 5242880 InnoDB: Doublewrite buffer not found: creating new InnoDB: Doublewrite buffer created InnoDB: Creating foreign key constraint system tables InnoDB: Foreign key constraint system tables created InnoDB: Started mysqld: ready for connections
      At this point InnoDB has initialized its
      tablespace and log files. You can connect to the MySQL server with
      the usual MySQL client programs like mysql.
      When you shut down the MySQL server with mysqladmin
      shutdown, the output is like this:
    
010321 18:33:34 mysqld: Normal shutdown 010321 18:33:34 mysqld: Shutdown Complete InnoDB: Starting shutdown... InnoDB: Shutdown completed
      You can look at the data file and log directories and you see the
      files created there. The log directory also contains a small file
      named ib_arch_log_0000000000. That file
      resulted from the database creation, after which
      InnoDB switched off log archiving. When MySQL
      is started again, the data files and log files have been created
      already, so the output is much briefer:
    
InnoDB: Started mysqld: ready for connections
      If you add the innodb_file_per_table option to
      my.cnf, InnoDB stores each
      table in its own .ibd file in the same MySQL
      database directory where the .frm file is
      created. See Section 14.2.3.1, “Using Per-Table Tablespaces”.
    
        If InnoDB prints an operating system error
        during a file operation, usually the problem has one of the
        following causes:
      
            You did not create the InnoDB data file
            directory or the InnoDB log directory.
          
mysqld does not have access rights to create files in those directories.
            mysqld cannot read the proper
            my.cnf or my.ini
            option file, and consequently does not see the options that
            you specified.
          
The disk is full or a disk quota is exceeded.
You have created a subdirectory whose name is equal to a data file that you specified, so the name cannot be used as a filename.
            There is a syntax error in the
            innodb_data_home_dir or
            innodb_data_file_path value.
          
        If something goes wrong when InnoDB attempts
        to initialize its tablespace or its log files, you should delete
        all files created by InnoDB. This means all
        ibdata files and all
        ib_logfile files. In case you have already
        created some InnoDB tables, delete the
        corresponding .frm files for these tables
        (and any .ibd files if you are using
        multiple tablespaces) from the MySQL database directories as
        well. Then you can try the InnoDB database
        creation again. It is best to start the MySQL server from a
        command prompt so that you see what is happening.
      
      To create an InnoDB table, specify an
      ENGINE = InnoDB option in the CREATE
      TABLE statement:
    
CREATE TABLE customers (a INT, b CHAR (20), INDEX (a)) ENGINE=InnoDB;
      The older term TYPE is supported as a synonym
      for ENGINE for backward compatibility, but
      ENGINE is the preferred term and
      TYPE is deprecated.
    
      The statement creates a table and an index on column
      a in the InnoDB tablespace
      that consists of the data files that you specified in
      my.cnf. In addition, MySQL creates a file
      customers.frm in the
      test directory under the MySQL database
      directory. Internally, InnoDB adds an entry for
      the table to its own data dictionary. The entry includes the
      database name. For example, if test is the
      database in which the customers table is
      created, the entry is for 'test/customers'.
      This means you can create a table of the same name
      customers in some other database, and the table
      names do not collide inside InnoDB.
    
      You can query the amount of free space in the
      InnoDB tablespace by issuing a SHOW
      TABLE STATUS statement for any InnoDB
      table. The amount of free space in the tablespace appears in the
      Comment section in the output of SHOW
      TABLE STATUS. For example:
    
SHOW TABLE STATUS FROM test LIKE 'customers'
      Note that the statistics SHOW displays for
      InnoDB tables are only approximate. They are
      used in SQL optimization. Table and index reserved sizes in bytes
      are accurate, though.
    
        By default, each client that connects to the MySQL server begins
        with autocommit mode enabled, which automatically commits every
        SQL statement as you execute it. To use multiple-statement
        transactions, you can switch autocommit off with the SQL
        statement SET AUTOCOMMIT = 0 and use
        COMMIT and ROLLBACK to
        commit or roll back your transaction. If you want to leave
        autocommit on, you can enclose your transactions within
        START TRANSACTION and either
        COMMIT or ROLLBACK. The
        following example shows two transactions. The first is
        committed; the second is rolled back.
      
shell>mysql testmysql>CREATE TABLE CUSTOMER (A INT, B CHAR (20), INDEX (A))->ENGINE=InnoDB;Query OK, 0 rows affected (0.00 sec) mysql>START TRANSACTION;Query OK, 0 rows affected (0.00 sec) mysql>INSERT INTO CUSTOMER VALUES (10, 'Heikki');Query OK, 1 row affected (0.00 sec) mysql>COMMIT;Query OK, 0 rows affected (0.00 sec) mysql>SET AUTOCOMMIT=0;Query OK, 0 rows affected (0.00 sec) mysql>INSERT INTO CUSTOMER VALUES (15, 'John');Query OK, 1 row affected (0.00 sec) mysql>ROLLBACK;Query OK, 0 rows affected (0.00 sec) mysql>SELECT * FROM CUSTOMER;+------+--------+ | A | B | +------+--------+ | 10 | Heikki | +------+--------+ 1 row in set (0.00 sec) mysql>
        In APIs such as PHP, Perl DBI, JDBC, ODBC, or the standard C
        call interface of MySQL, you can send transaction control
        statements such as COMMIT to the MySQL server
        as strings just like any other SQL statements such as
        SELECT or INSERT. Some
        APIs also offer separate special transaction commit and rollback
        functions or methods.
      
        Important: Do not convert MySQL system tables in the
        mysql database (such as
        user or host) to the
        InnoDB type. This is an unsupported
        operation. The system tables must always be of the
        MyISAM type.
      
        If you want all your (non-system) tables to be created as
        InnoDB tables, you can simply add the line
        default-storage-engine=innodb to the
        [mysqld] section of your server option file.
      
        InnoDB does not have a special optimization
        for separate index creation the way the
        MyISAM storage engine does. Therefore, it
        does not pay to export and import the table and create indexes
        afterward. The fastest way to alter a table to
        InnoDB is to do the inserts directly to an
        InnoDB table. That is, use ALTER
        TABLE ... ENGINE=INNODB, or create an empty
        InnoDB table with identical definitions and
        insert the rows with INSERT INTO ... SELECT * FROM
        ....
      
        If you have UNIQUE constraints on secondary
        keys, you can speed up a table import by turning off the
        uniqueness checks temporarily during the import operation:
      
SET UNIQUE_CHECKS=0;
... import operation ...
SET UNIQUE_CHECKS=1;
        For big tables, this saves a lot of disk I/O because
        InnoDB can then use its insert buffer to
        write secondary index records as a batch.
      
To get better control over the insertion process, it might be good to insert big tables in pieces:
INSERT INTO newtable SELECT * FROM oldtable WHERE yourkey > something AND yourkey <= somethingelse;
After all records have been inserted, you can rename the tables.
        During the conversion of big tables, you should increase the
        size of the InnoDB buffer pool to reduce disk
        I/O. Do not use more than 80% of the physical memory, though.
        You can also increase the sizes of the InnoDB
        log files.
      
        Make sure that you do not fill up the tablespace:
        InnoDB tables require a lot more disk space
        than MyISAM tables. If an ALTER
        TABLE operation runs out of space, it starts a
        rollback, and that can take hours if it is disk-bound. For
        inserts, InnoDB uses the insert buffer to
        merge secondary index records to indexes in batches. That saves
        a lot of disk I/O. For rollback, no such mechanism is used, and
        the rollback can take 30 times longer than the insertion.
      
        In the case of a runaway rollback, if you do not have valuable
        data in your database, it may be advisable to kill the database
        process rather than wait for millions of disk I/O operations to
        complete. For the complete procedure, see
        Section 14.2.8.1, “Forcing InnoDB Recovery”.
      
        If you specify an AUTO_INCREMENT column for
        an InnoDB table, the table handle in the
        InnoDB data dictionary contains a special
        counter called the auto-increment counter that is used in
        assigning new values for the column. This counter is stored only
        in main memory, not on disk.
      
        InnoDB uses the following algorithm to
        initialize the auto-increment counter for a table
        T that contains an
        AUTO_INCREMENT column named
        ai_col: After a server startup, for the first
        insert into a table T,
        InnoDB executes the equivalent of this
        statement:
      
SELECT MAX(ai_col) FROM T FOR UPDATE;
        InnoDB increments by one the value retrieved
        by the statement and assigns it to the column and to the
        auto-increment counter for the table. If the table is empty,
        InnoDB uses the value 1.
        If a user invokes a SHOW TABLE STATUS
        statement that displays output for the table
        T and the auto-increment counter has not been
        initialized, InnoDB initializes but does not
        increment the value and stores it for use by later inserts. Note
        that this initialization uses a normal exclusive-locking read on
        the table and the lock lasts to the end of the transaction.
      
        InnoDB follows the same procedure for
        initializing the auto-increment counter for a freshly created
        table.
      
        After the auto-increment counter has been initialized, if a user
        does not explicitly specify a value for an
        AUTO_INCREMENT column,
        InnoDB increments the counter by one and
        assigns the new value to the column. If the user inserts a row
        that explicitly specifies the column value, and the value is
        bigger than the current counter value, the counter is set to the
        specified column value.
      
        You may see gaps in the sequence of values assigned to the
        AUTO_INCREMENT column if you roll back
        transactions that have generated numbers using the counter.
      
        If a user specifies NULL or
        0 for the AUTO_INCREMENT
        column in an INSERT,
        InnoDB treats the row as if the value had not
        been specified and generates a new value for it.
      
The behavior of the auto-increment mechanism is not defined if a user assigns a negative value to the column or if the value becomes bigger than the maximum integer that can be stored in the specified integer type.
        When accessing the auto-increment counter,
        InnoDB uses a special table-level
        AUTO-INC lock that it keeps to the end of the
        current SQL statement, not to the end of the transaction. The
        special lock release strategy was introduced to improve
        concurrency for inserts into a table containing an
        AUTO_INCREMENT column. Nevertheless, two
        transactions cannot have the AUTO-INC lock on
        the same table simultaneously, which can have a performance
        impact if the AUTO-INC lock is held for a
        long time. That might be the case for a statement such as
        INSERT INTO t1 ... SELECT ... FROM t2 that
        inserts all rows from one table into another.
      
        InnoDB uses the in-memory auto-increment
        counter as long as he server runs. When the server is stopped
        and restarted, InnoDB reinitializes the
        counter for each table for the first INSERT
        to the table, as described earlier.
      
        Beginning with MySQL 5.0.3, InnoDB supports
        the AUTO_INCREMENT =
         table option in
        NCREATE TABLE and ALTER
        TABLE statements, to set the initial counter value or
        alter the current counter value. The effect of this option is
        canceled by a server restart, for reasons discussed earlier in
        this section.
      
        InnoDB also supports foreign key constraints.
        The syntax for a foreign key constraint definition in
        InnoDB looks like this:
      
[CONSTRAINTsymbol] FOREIGN KEY [id] (index_col_name, ...) REFERENCEStbl_name(index_col_name, ...) [ON DELETE {RESTRICT | CASCADE | SET NULL | NO ACTION}] [ON UPDATE {RESTRICT | CASCADE | SET NULL | NO ACTION}]
Foreign keys definitions are subject to the following conditions:
            Both tables must be InnoDB tables and
            they must not be TEMPORARY tables.
          
In the referencing table, there must be an index where the foreign key columns are listed as the first columns in the same order. Such an index is created on the referencing table automatically if it does not exist.
In the referenced table, there must be an index where the referenced columns are listed as the first columns in the same order.
            Index prefixes on foreign key columns are not supported. One
            consequence of this is that BLOB and
            TEXT columns cannot be included in a
            foreign key, because indexes on those columns must always
            include a prefix length.
          
            If the CONSTRAINT
             clause is given,
            the symbolsymbol value must be unique
            in the database. If the clause is not given,
            InnoDB creates the name automatically.
          
        InnoDB rejects any INSERT
        or UPDATE operation that attempts to create a
        foreign key value in a child table if there is no a matching
        candidate key value in the parent table. The action
        InnoDB takes for any
        UPDATE or DELETE operation
        that attempts to update or delete a candidate key value in the
        parent table that has some matching rows in the child table is
        dependent on the referential action
        specified using ON UPDATE and ON
        DELETE subclauses of the FOREIGN
        KEY clause. When the user attempts to delete or update
        a row from a parent table, and there are one or more matching
        rows in the child table, InnoDB supports five
        options regarding the action to be taken:
      
            CASCADE: Delete or update the row from
            the parent table and automatically delete or update the
            matching rows in the child table. Both ON DELETE
            CASCADE and ON UPDATE CASCADE
            are supported. Between two tables, you should not define
            several ON UPDATE CASCADE clauses that
            act on the same column in the parent table or in the child
            table.
          
            SET NULL: Delete or update the row from
            the parent table and set the foreign key column or columns
            in the child table to NULL. This is valid
            only if the foreign key columns do not have the NOT
            NULL qualifier specified. Both ON DELETE
            SET NULL and ON UPDATE SET NULL
            clauses are supported.
          
            NO ACTION: In standard SQL, NO
            ACTION means no action in the
            sense that an attempt to delete or update a primary key
            value is not allowed to proceed if there is a related
            foreign key value in the referenced table.
            InnoDB rejects the delete or update
            operation for the parent table.
          
            RESTRICT: Rejects the delete or update
            operation for the parent table. NO ACTION
            and RESTRICT are the same as omitting the
            ON DELETE or ON UPDATE
            clause. (Some database systems have deferred checks, and
            NO ACTION is a deferred check. In MySQL,
            foreign key constraints are checked immediately, so
            NO ACTION and RESTRICT
            are the same.)
          
            SET DEFAULT: This action is recognized by
            the parser, but InnoDB rejects table
            definitions containing ON DELETE SET
            DEFAULT or ON UPDATE SET
            DEFAULT clauses.
          
        Note that InnoDB supports foreign key
        references within a table. In these cases, “child table
        records” really refers to dependent records within the
        same table.
      
        InnoDB requires indexes on foreign keys and
        referenced keys so that foreign key checks can be fast and not
        require a table scan. The index on the foreign key is created
        automatically. This is in contrast to some older versions, in
        which indexes had to be created explicitly or the creation of
        foreign key constraints would fail.
      
        Corresponding columns in the foreign key and the referenced key
        must have similar internal data types inside
        InnoDB so that they can be compared without a
        type conversion. The size and sign of integer types
        must be the same. The length of string types need not
        be the same. If you specify a SET NULL
        action, make sure that you have not declared the
        columns in the child table as NOT
        NULL.
      
        If MySQL reports an error number 1005 from a CREATE
        TABLE statement, and the error message refers to errno
        150, table creation failed because a foreign key constraint was
        not correctly formed. Similarly, if an ALTER
        TABLE fails and it refers to errno 150, that means a
        foreign key definition would be incorrectly formed for the
        altered table. You can use SHOW ENGINE INNODB
        STATUS to display a detailed explanation of the most
        recent InnoDB foreign key error in the
        server.
      
        Note: InnoDB
        does not check foreign key constraints on those foreign key or
        referenced key values that contain a NULL
        column.
      
Note: Currently, triggers are not activated by cascaded foreign key actions.
        Deviation from SQL standards:
        If there are several rows in the parent table that have the same
        referenced key value, InnoDB acts in foreign
        key checks as if the other parent rows with the same key value
        do not exist. For example, if you have defined a
        RESTRICT type constraint, and there is a
        child row with several parent rows, InnoDB
        does not allow the deletion of any of those parent rows.
      
        InnoDB performs cascading operations through
        a depth-first algorithm, based on records in the indexes
        corresponding to the foreign key constraints.
      
        Deviation from SQL standards: A
        FOREIGN KEY constraint that references a
        non-UNIQUE key is not standard SQL. It is an
        InnoDB extension to standard SQL.
      
        Deviation from SQL standards:
        If ON UPDATE CASCADE or ON UPDATE
        SET NULL recurses to update the same
        table it has previously updated during the cascade,
        it acts like RESTRICT. This means that you
        cannot use self-referential ON UPDATE CASCADE
        or ON UPDATE SET NULL operations. This is to
        prevent infinite loops resulting from cascaded updates. A
        self-referential ON DELETE SET NULL, on the
        other hand, is possible, as is a self-referential ON
        DELETE CASCADE. Cascading operations may not be nested
        more than 15 levels deep.
      
        Deviation from SQL standards:
        Like MySQL in general, in an SQL statement that inserts,
        deletes, or updates many rows, InnoDB checks
        UNIQUE and FOREIGN KEY
        constraints row-by-row. According to the SQL standard, the
        default behavior should be deferred checking. That is,
        constraints are only checked after the entire SQL
        statement has been processed. Until
        InnoDB implements deferred constraint
        checking, some things will be impossible, such as deleting a
        record that refers to itself via a foreign key.
      
        Here is a simple example that relates parent
        and child tables through a single-column
        foreign key:
      
CREATE TABLE parent (id INT NOT NULL,
                     PRIMARY KEY (id)
) ENGINE=INNODB;
CREATE TABLE child (id INT, parent_id INT,
                    INDEX par_ind (parent_id),
                    FOREIGN KEY (parent_id) REFERENCES parent(id)
                      ON DELETE CASCADE
) ENGINE=INNODB;
        A more complex example in which a
        product_order table has foreign keys for two
        other tables. One foreign key references a two-column index in
        the product table. The other references a
        single-column index in the customer table:
      
CREATE TABLE product (category INT NOT NULL, id INT NOT NULL,
                      price DECIMAL,
                      PRIMARY KEY(category, id)) ENGINE=INNODB;
CREATE TABLE customer (id INT NOT NULL,
                       PRIMARY KEY (id)) ENGINE=INNODB;
CREATE TABLE product_order (no INT NOT NULL AUTO_INCREMENT,
                            product_category INT NOT NULL,
                            product_id INT NOT NULL,
                            customer_id INT NOT NULL,
                            PRIMARY KEY(no),
                            INDEX (product_category, product_id),
                            FOREIGN KEY (product_category, product_id)
                              REFERENCES product(category, id)
                              ON UPDATE CASCADE ON DELETE RESTRICT,
                            INDEX (customer_id),
                            FOREIGN KEY (customer_id)
                              REFERENCES customer(id)) ENGINE=INNODB;
        InnoDB allows you to add a new foreign key
        constraint to a table by using ALTER TABLE:
      
ALTER TABLEtbl_nameADD [CONSTRAINTsymbol] FOREIGN KEY [id] (index_col_name, ...) REFERENCEStbl_name(index_col_name, ...) [ON DELETE {RESTRICT | CASCADE | SET NULL | NO ACTION}] [ON UPDATE {RESTRICT | CASCADE | SET NULL | NO ACTION}]
        Remember to create the required indexes
        first. You can also add a self-referential foreign
        key constraint to a table using ALTER TABLE.
      
        InnoDB also supports the use of
        ALTER TABLE to drop foreign keys:
      
ALTER TABLEtbl_nameDROP FOREIGN KEYfk_symbol;
        If the FOREIGN KEY clause included a
        CONSTRAINT name when you created the foreign
        key, you can refer to that name to drop the foreign key.
        Otherwise, the fk_symbol value is
        internally generated by InnoDB when the
        foreign key is created. To find out the symbol value when you
        want to drop a foreign key, use the SHOW CREATE
        TABLE statement. For example:
      
mysql>SHOW CREATE TABLE ibtest11c\G*************************** 1. row *************************** Table: ibtest11c Create Table: CREATE TABLE `ibtest11c` ( `A` int(11) NOT NULL auto_increment, `D` int(11) NOT NULL default '0', `B` varchar(200) NOT NULL default '', `C` varchar(175) default NULL, PRIMARY KEY (`A`,`D`,`B`), KEY `B` (`B`,`C`), KEY `C` (`C`), CONSTRAINT `0_38775` FOREIGN KEY (`A`, `D`) REFERENCES `ibtest11a` (`A`, `D`) ON DELETE CASCADE ON UPDATE CASCADE, CONSTRAINT `0_38776` FOREIGN KEY (`B`, `C`) REFERENCES `ibtest11a` (`B`, `C`) ON DELETE CASCADE ON UPDATE CASCADE ) ENGINE=INNODB CHARSET=latin1 1 row in set (0.01 sec) mysql>ALTER TABLE ibtest11c DROP FOREIGN KEY `0_38775`;
        You cannot add a foreign key and drop a foreign key in separate
        clauses of a single ALTER TABLE statement.
        Separate statements are required.
      
        The InnoDB parser allows table and column
        identifiers in a FOREIGN KEY ... REFERENCES
        ... clause to be quoted within backticks.
        (Alternatively, double quotes can be used if the
        ANSI_QUOTES SQL mode is enabled.) The
        InnoDB parser also takes into account the
        setting of the lower_case_table_names system
        variable.
      
        InnoDB returns a table's foreign key
        definitions as part of the output of the SHOW CREATE
        TABLE statement:
      
SHOW CREATE TABLE tbl_name;
mysqldump also produces correct definitions of tables to the dump file, and does not forget about the foreign keys.
You can also display the foreign key constraints for a table like this:
SHOW TABLE STATUS FROMdb_nameLIKE 'tbl_name';
        The foreign key constraints are listed in the
        Comment column of the output.
      
        When performing foreign key checks, InnoDB
        sets shared row-level locks on child or parent records it has to
        look at. InnoDB checks foreign key
        constraints immediately; the check is not deferred to
        transaction commit.
      
        To make it easier to reload dump files for tables that have
        foreign key relationships, mysqldump
        automatically includes a statement in the dump output to set
        FOREIGN_KEY_CHECKS to 0. This avoids problems
        with tables having to be reloaded in a particular order when the
        dump is reloaded. It is also possible to set this variable
        manually:
      
mysql>SET FOREIGN_KEY_CHECKS = 0;mysql>SOURCEmysql>dump_file_name;SET FOREIGN_KEY_CHECKS = 1;
        This allows you to import the tables in any order if the dump
        file contains tables that are not correctly ordered for foreign
        keys. It also speeds up the import operation. Setting
        FOREIGN_KEY_CHECKS to 0 can also be useful
        for ignoring foreign key constraints during LOAD
        DATA and ALTER TABLE operations.
      
        InnoDB does not allow you to drop a table
        that is referenced by a FOREIGN KEY
        constraint, unless you do SET
        FOREIGN_KEY_CHECKS=0. When you drop a table, the
        constraints that were defined in its create statement are also
        dropped.
      
If you re-create a table that was dropped, it must have a definition that conforms to the foreign key constraints referencing it. It must have the right column names and types, and it must have indexes on the referenced keys, as stated earlier. If these are not satisfied, MySQL returns error number 1005 and refers to errno 150 in the error message.
        MySQL replication works for InnoDB tables as
        it does for MyISAM tables. It is also
        possible to use replication in a way where the storage engine on
        the slave is not the same as the original storage engine on the
        master. For example, you can replicate modifications to an
        InnoDB table on the master to a
        MyISAM table on the slave.
      
        To set up a new slave for a master, you have to make a copy of
        the InnoDB tablespace and the log files, as
        well as the .frm files of the
        InnoDB tables, and move the copies to the
        slave. If the innodb_file_per_table variable
        is enabled, you must also copy the .ibd
        files as well. For the proper procedure to do this, see
        Section 14.2.8, “Backing Up and Recovering an InnoDB Database”.
      
        If you can shut down the master or an existing slave, you can
        take a cold backup of the InnoDB tablespace
        and log files and use that to set up a slave. To make a new
        slave without taking down any server you can also use the
        non-free (commercial)
        InnoDB
        Hot Backup tool.
      
        You cannot set up replication for InnoDB
        using the LOAD TABLE FROM MASTER statement,
        which works only for MyISAM tables. There are
        two possible workarounds:
      
Dump the table on the master and import the dump file into the slave.
            Use ALTER TABLE  on the master before setting up
            replication with tbl_name
            ENGINE=MyISAMLOAD TABLE
            ,
            and then use tbl_name FROM MASTERALTER TABLE to convert the
            master table back to InnoDB afterward.
            However, this should not be done for tables that have
            foreign key definitions because the definitions will be
            lost.
          
        Transactions that fail on the master do not affect replication
        at all. MySQL replication is based on the binary log where MySQL
        writes SQL statements that modify data. A transaction that fails
        (for example, because of a foreign key violation, or because it
        is is rolled back) is not written to the binary log, so it is
        not sent to slaves. See Section 13.4.1, “START TRANSACTION, COMMIT, and ROLLBACK Syntax”.
      
      This section describes what you can do when your
      InnoDB tablespace runs out of room or when you
      want to change the size of the log files.
    
      The easiest way to increase the size of the
      InnoDB tablespace is to configure it from the
      beginning to be auto-extending. Specify the
      autoextend attribute for the last data file in
      the tablespace definition. Then InnoDB
      increases the size of that file automatically in 8MB increments
      when it runs out of space. The increment size can be changed by
      setting the value of the
      innodb_autoextend_increment system variable,
      which is measured in MB.
    
      Alternatively, you can increase the size of your tablespace by
      adding another data file. To do this, you have to shut down the
      MySQL server, change the tablespace configuration to add a new
      data file to the end of innodb_data_file_path,
      and start the server again.
    
      If your last data file was defined with the keyword
      autoextend, the procedure for reconfiguring the
      tablespace must take into account the size to which the last data
      file has grown. Obtain the size of the data file, round it down to
      the closest multiple of 1024 × 1024 bytes (= 1MB), and
      specify the rounded size explicitly in
      innodb_data_file_path. Then you can add another
      data file. Remember that only the last data file in the
      innodb_data_file_path can be specified as
      auto-extending.
    
      As an example, assume that the tablespace has just one
      auto-extending data file ibdata1:
    
innodb_data_home_dir = innodb_data_file_path = /ibdata/ibdata1:10M:autoextend
Suppose that this data file, over time, has grown to 988MB. Here is the configuration line after modifying the original data file to not be auto-extending and adding another auto-extending data file:
innodb_data_home_dir = innodb_data_file_path = /ibdata/ibdata1:988M;/disk2/ibdata2:50M:autoextend
      When you add a new file to the tablespace configuration, make sure
      that it does not exist. InnoDB will create and
      initialize the file when you restart the server.
    
Currently, you cannot remove a data file from the tablespace. To decrease the size of your tablespace, use this procedure:
          Use mysqldump to dump all your
          InnoDB tables.
        
Stop the server.
Remove all the existing tablespace files.
Configure a new tablespace.
Restart the server.
Import the dump files.
      If you want to change the number or the size of your
      InnoDB log files, you have to stop the MySQL
      server and make sure that it shuts down without errors (to ensure
      that there is no information for outstanding transactions in the
      logs). Then copy the old log files into a safe place just in case
      something went wrong in the shutdown and you need them to recover
      the tablespace. Delete the old log files from the log file
      directory, edit my.cnf to change the log file
      configuration, and start the MySQL server again.
      mysqld sees that no log files exist at startup
      and tells you that it is creating new ones.
    
The key to safe database management is making regular backups.
      InnoDB Hot Backup is an online backup tool you
      can use to backup your InnoDB database while it
      is running. InnoDB Hot Backup does not require
      you to shut down your database and it does not set any locks or
      disturb your normal database processing. InnoDB Hot
      Backup is a non-free (commercial) add-on tool with an
      annual license fee of €390 per computer on which the MySQL
      server is run. See the
      InnoDB Hot
      Backup home page for detailed information and
      screenshots.
    
      If you are able to shut down your MySQL server, you can make a
      binary backup that consists of all files used by
      InnoDB to manage its tables. Use the following
      procedure:
    
Shut down your MySQL server and make sure that it shuts down without errors.
          Copy all your data files (ibdata files
          and .ibd files) into a safe place.
        
          Copy all your ib_logfile files to a safe
          place.
        
          Copy your my.cnf configuration file or
          files to a safe place.
        
          Copy all the .frm files for your
          InnoDB tables to a safe place.
        
      Replication works with InnoDB tables, so you
      can use MySQL replication capabilities to keep a copy of your
      database at database sites requiring high availability.
    
      In addition to making binary backups as just described, you should
      also regularly make dumps of your tables with
      mysqldump. The reason for this is that a binary
      file might be corrupted without you noticing it. Dumped tables are
      stored into text files that are human-readable, so spotting table
      corruption becomes easier. Also, because the format is simpler,
      the chance for serious data corruption is smaller.
      mysqldump also has a
      --single-transaction option that you can use to
      make a consistent snapshot without locking out other clients.
    
      To be able to recover your InnoDB database to
      the present from the binary backup just described, you have to run
      your MySQL server with binary logging turned on. Then you can
      apply the binary log to the backup database to achieve
      point-in-time recovery:
    
mysqlbinlog yourhostname-bin.123 | mysql
      To recover from a crash of your MySQL server, the only requirement
      is to restart it. InnoDB automatically checks
      the logs and performs a roll-forward of the database to the
      present. InnoDB automatically rolls back
      uncommitted transactions that were present at the time of the
      crash. During recovery, mysqld displays output
      something like this:
    
InnoDB: Database was not shut down normally. InnoDB: Starting recovery from log files... InnoDB: Starting log scan based on checkpoint at InnoDB: log sequence number 0 13674004 InnoDB: Doing recovery: scanned up to log sequence number 0 13739520 InnoDB: Doing recovery: scanned up to log sequence number 0 13805056 InnoDB: Doing recovery: scanned up to log sequence number 0 13870592 InnoDB: Doing recovery: scanned up to log sequence number 0 13936128 ... InnoDB: Doing recovery: scanned up to log sequence number 0 20555264 InnoDB: Doing recovery: scanned up to log sequence number 0 20620800 InnoDB: Doing recovery: scanned up to log sequence number 0 20664692 InnoDB: 1 uncommitted transaction(s) which must be rolled back InnoDB: Starting rollback of uncommitted transactions InnoDB: Rolling back trx no 16745 InnoDB: Rolling back of trx no 16745 completed InnoDB: Rollback of uncommitted transactions completed InnoDB: Starting an apply batch of log records to the database... InnoDB: Apply batch completed InnoDB: Started mysqld: ready for connections
If your database gets corrupted or your disk fails, you have to do the recovery from a backup. In the case of corruption, you should first find a backup that is not corrupted. After restoring the base backup, do the recovery from the binary log files using mysqlbinlog and mysql to restore the changes performed after the backup was made.
      In some cases of database corruption it is enough just to dump,
      drop, and re-create one or a few corrupt tables. You can use the
      CHECK TABLE SQL statement to check whether a
      table is corrupt, although CHECK TABLE
      naturally cannot detect every possible kind of corruption. You can
      use innodb_tablespace_monitor to check the
      integrity of the file space management inside the tablespace
      files.
    
In some cases, apparent database page corruption is actually due to the operating system corrupting its own file cache, and the data on disk may be okay. It is best first to try restarting your computer. Doing so may eliminate errors that appeared to be database page corruption.
        If there is database page corruption, you may want to dump your
        tables from the database with SELECT INTO
        OUTFILE. Usually, most of the data obtained in this
        way is intact. Even so, the corruption may cause SELECT
        * FROM  statements
        or tbl_nameInnoDB background operations to crash or
        assert, or even to cause InnoDB roll-forward
        recovery to crash. However, you can force the
        InnoDB storage engine to start up while
        preventing background operations from running, so that you are
        able to dump your tables. For example, you can add the following
        line to the [mysqld] section of your option
        file before restarting the server:
      
[mysqld] innodb_force_recovery = 4
        The allowable non-zero values for
        innodb_force_recovery follow. A larger number
        includes all precautions of smaller numbers. If you are able to
        dump your tables with an option value of at most 4, then you are
        relatively safe that only some data on corrupt individual pages
        is lost. A value of 6 is more drastic because database pages are
        left in an obsolete state, which in turn may introduce more
        corruption into B-trees and other database structures.
      
            1
            (SRV_FORCE_IGNORE_CORRUPT)
          
            Let the server run even if it detects a corrupt page. Try to
            make SELECT * FROM
             jump over
            corrupt index records and pages, which helps in dumping
            tables.
          tbl_name
            2
            (SRV_FORCE_NO_BACKGROUND)
          
Prevent the main thread from running. If a crash would occur during the purge operation, this recovery value prevents it.
            3
            (SRV_FORCE_NO_TRX_UNDO)
          
Do not run transaction rollbacks after recovery.
            4
            (SRV_FORCE_NO_IBUF_MERGE)
          
Prevent also insert buffer merge operations. If they would cause a crash, do not do them. Do not calculate table statistics.
            5
            (SRV_FORCE_NO_UNDO_LOG_SCAN)
          
            Do not look at undo logs when starting the database:
            InnoDB treats even incomplete
            transactions as committed.
          
            6
            (SRV_FORCE_NO_LOG_REDO)
          
Do not do the log roll-forward in connection with recovery.
        You can SELECT from tables to dump them, or
        DROP or CREATE tables even
        if forced recovery is used. If you know that a given table is
        causing a crash on rollback, you can drop it. You can also use
        this to stop a runaway rollback caused by a failing mass import
        or ALTER TABLE. You can kill the
        mysqld process and set
        innodb_force_recovery to 3
        to bring the database up without the rollback, then
        DROP the table that is causing the runaway
        rollback.
      
        The database must not otherwise be used with any
        non-zero value of
        innodb_force_recovery. As a safety
        measure, InnoDB prevents users from
        performing INSERT, UPDATE,
        or DELETE operations when
        innodb_force_recovery is greater than 0.
      
        InnoDB implements a checkpoint mechanism
        known as “fuzzy” checkpointing.
        InnoDB flushes modified database pages from
        the buffer pool in small batches. There is no need to flush the
        buffer pool in one single batch, which would in practice stop
        processing of user SQL statements during the checkpointing
        process.
      
        During crash recovery, InnoDB looks for a
        checkpoint label written to the log files. It knows that all
        modifications to the database before the label are present in
        the disk image of the database. Then InnoDB
        scans the log files forward from the checkpoint, applying the
        logged modifications to the database.
      
        InnoDB writes to its log files on a rotating
        basis. All committed modifications that make the database pages
        in the buffer pool different from the images on disk must be
        available in the log files in case InnoDB has
        to do a recovery. This means that when InnoDB
        starts to reuse a log file, it has to make sure that the
        database page images on disk contain the modifications logged in
        the log file that InnoDB is going to reuse.
        In other words, InnoDB must create a
        checkpoint and this often involves flushing of modified database
        pages to disk.
      
The preceding description explains why making your log files very large may save disk I/O in checkpointing. It often makes sense to set the total size of the log files as big as the buffer pool or even bigger. The drawback of using large log files is that crash recovery can take longer because there is more logged information to apply to the database.
      On Windows, InnoDB always stores database and
      table names internally in lowercase. To move databases in a binary
      format from Unix to Windows or from Windows to Unix, you should
      have all table and database names in lowercase. A convenient way
      to accomplish this is to add the following line to the
      [mysqld] section of your
      my.cnf or my.ini file
      before creating any databases or tables:
    
[mysqld] lower_case_table_names=1
      Like MyISAM data files,
      InnoDB data and log files are binary-compatible
      on all platforms having the same floating-point number format. You
      can move an InnoDB database simply by copying
      all the relevant files listed in Section 14.2.8, “Backing Up and Recovering an InnoDB Database”.
      If the floating-point formats differ but you have not used
      FLOAT or DOUBLE data types
      in your tables, then the procedure is the same: simply copy the
      relevant files. If the formats differ and your tables contain
      floating-point data, you must use mysqldump to
      dump your tables on one machine and then import the dump files on
      the other machine.
    
One way to increase performance is to switch off autocommit mode when importing data, assuming that the tablespace has enough space for the big rollback segment that the import transactions generate. Do the commit only after importing a whole table or a segment of a table.
InnoDB Lock ModesInnoDB and AUTOCOMMITInnoDB and TRANSACTION ISOLATION LEVELSELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE Locking ReadsInnoDBInnoDB
      In the InnoDB transaction model, the goal is to
      combine the best properties of a multi-versioning database with
      traditional two-phase locking. InnoDB does
      locking on the row level and runs queries as non-locking
      consistent reads by default, in the style of Oracle. The lock
      table in InnoDB is stored so space-efficiently
      that lock escalation is not needed: Typically several users are
      allowed to lock every row in the database, or any random subset of
      the rows, without InnoDB running out of memory.
    
        InnoDB implements standard row-level locking
        where there are two types of locks:
      
            A shared (S) lock allows a
            transaction to read a row (tuple).
          
            An exclusive (X) lock allows a
            transaction to update or delete a row.
          
        If transaction T1 holds a shared
        (S) lock on tuple
        t, then
      
            A request from some distinct transaction
            T2 for an S
            lock on t can be granted immediately. As
            a result, both T1 and
            T2 hold an S
            lock on t.
          
            A request from some distinct transaction
            T2 for an X
            lock on t cannot be granted immediately.
          
        If a transaction T1 holds an exclusive
        (X) lock on tuple
        t, then a request from some distinct
        transaction T2 for a lock of either type on
        t cannot be granted immediately. Instead,
        transaction T2 has to wait for transaction
        T1 to release its lock on tuple
        t.
      
        Additionally, InnoDB supports
        multiple granularity locking which allows
        coexistence of record locks and locks on entire tables. To make
        locking at multiple granularity levels practical, additional
        types of locks called intention locks are
        used. Intention locks are table locks in
        InnoDB. The idea behind intention locks is
        for a transaction to indicate which type of lock (shared or
        exclusive) it will require later for a row in that table. There
        are two types of intention locks used in
        InnoDB (assume that transaction
        T has requested a lock of the indicated type
        on table R):
      
            Intention shared (IS):
            Transaction T intends to set
            S locks on individual rows in
            table R.
          
            Intention exclusive (IX):
            Transaction T intends to set
            X locks on those rows.
          
The intention locking protocol is as follows:
            Before a given transaction can acquire an
            S lock on a given row, it must
            first acquire an IS or stronger
            lock on the table containing that row.
          
            Before a given transaction can acquire an
            X lock on a given row, it must
            first acquire an IX lock on the
            table containing that row.
          
These rules can be conveniently summarized by means of a lock type compatibility matrix:
| X | IX | S | IS | |
| X | Conflict | Conflict | Conflict | Conflict | 
| IX | Conflict | Compatible | Conflict | Compatible | 
| S | Conflict | Conflict | Compatible | Compatible | 
| IS | Conflict | Compatible | Compatible | Compatible | 
A lock is granted to a requesting transaction if it is compatible with existing locks. A lock is not granted to a requesting transaction if it conflicts with existing locks. A transaction waits until the conflicting existing lock is released. If a lock request conflicts with an existing lock and cannot be granted because it would cause deadlock, an error occurs.
        Thus, intention locks do not block anything except full table
        requests (for example, LOCK TABLES ...
        WRITE). The main purpose of
        IX and IS
        locks is to show that someone is locking a row, or going to lock
        a row in the table.
      
The following example illustrates how an error can occur when a lock request would cause a deadlock. The example involves two clients, A and B.
        First, client A creates a table containing one row, and then
        begins a transaction. Within the transaction, A obtains an
        S lock on the row by selecting it in
        share mode:
      
mysql>CREATE TABLE t (i INT) ENGINE = InnoDB;Query OK, 0 rows affected (1.07 sec) mysql>INSERT INTO t (i) VALUES(1);Query OK, 1 row affected (0.09 sec) mysql>START TRANSACTION;Query OK, 0 rows affected (0.00 sec) mysql>SELECT * FROM t WHERE i = 1 LOCK IN SHARE MODE;+------+ | i | +------+ | 1 | +------+ 1 row in set (0.10 sec)
Next, client B begins a transaction and attempts to delete the row from the table:
mysql>START TRANSACTION;Query OK, 0 rows affected (0.00 sec) mysql>DELETE FROM t WHERE i = 1;
        The delete operation requires an X
        lock. The lock cannot be granted because it is incompatible with
        the S lock that client A holds, so
        the request goes on the queue of lock requests for the row and
        client B blocks.
      
Finally, client A also attempts to delete the row from the table:
mysql> DELETE FROM t WHERE i = 1;
ERROR 1213 (40001): Deadlock found when trying to get lock;
try restarting transaction
        Deadlock occurs here because client A needs an
        X lock to delete the row. However,
        that lock request cannot be granted because client B is already
        has a request for an X lock and is
        waiting for client A to release its S
        lock. Nor can the S lock held by A be
        upgraded to an X lock because of the
        prior request by B for an X lock. As
        a result, InnoDB generates an error for
        client A and releases its locks. At that point, the lock request
        for client B can be granted and B deletes the row from the
        table.
      
        In InnoDB, all user activity occurs inside a
        transaction. If the autocommit mode is enabled, each SQL
        statement forms a single transaction on its own. By default,
        MySQL starts new connections with autocommit enabled.
      
        If the autocommit mode is switched off with SET
        AUTOCOMMIT = 0, then we can consider that a user
        always has a transaction open. A SQL COMMIT
        or ROLLBACK statement ends the current
        transaction and a new one starts. A COMMIT
        means that the changes made in the current transaction are made
        permanent and become visible to other users. A
        ROLLBACK statement, on the other hand,
        cancels all modifications made by the current transaction. Both
        statements release all InnoDB locks that were
        set during the current transaction.
      
        If the connection has autocommit enabled, the user can still
        perform a multiple-statement transaction by starting it with an
        explicit START TRANSACTION or
        BEGIN statement and ending it with
        COMMIT or ROLLBACK.
      
        In terms of the SQL:1992 transaction isolation levels, the
        InnoDB default is REPEATABLE
        READ. InnoDB offers all four
        transaction isolation levels described by the SQL standard. You
        can set the default isolation level for all connections by using
        the --transaction-isolation option on the
        command line or in an option file. For example, you can set the
        option in the [mysqld] section of an option
        file like this:
      
[mysqld]
transaction-isolation = {READ-UNCOMMITTED | READ-COMMITTED
                         | REPEATABLE-READ | SERIALIZABLE}
        A user can change the isolation level for a single session or
        for all new incoming connections with the SET
        TRANSACTION statement. Its syntax is as follows:
      
SET [SESSION | GLOBAL] TRANSACTION ISOLATION LEVEL
                       {READ UNCOMMITTED | READ COMMITTED
                        | REPEATABLE READ | SERIALIZABLE}
        Note that there are hyphens in the level names for the
        --transaction-isolation option, but not for the
        SET TRANSACTION statement.
      
        The default behavior is to set the isolation level for the next
        (not started) transaction. If you use the
        GLOBAL keyword, the statement sets the
        default transaction level globally for all new connections
        created from that point on (but not for existing connections).
        You need the SUPER privilege to do this.
        Using the SESSION keyword sets the default
        transaction level for all future transactions performed on the
        current connection.
      
Any client is free to change the session isolation level (even in the middle of a transaction), or the isolation level for the next transaction.
        You can determine the global and session transaction isolation
        levels by checking the value of the
        tx_isolation system variable with these
        statements:
      
SELECT @@global.tx_isolation; SELECT @@tx_isolation;
        In row-level locking, InnoDB uses next-key
        locking. That means that besides index records,
        InnoDB can also lock the “gap”
        preceding an index record to block insertions by other users
        immediately before the index record. A next-key lock refers to a
        lock that locks an index record and the gap before it. A gap
        lock refers to a lock that only locks a gap before some index
        record.
      
        A detailed description of each isolation level in
        InnoDB follows:
      
            READ UNCOMMITTED
          
            SELECT statements are performed in a
            non-locking fashion, but a possible earlier version of a
            record might be used. Thus, using this isolation level, such
            reads are not consistent. This is also called a “dirty
            read.” Otherwise, this isolation level works like
            READ COMMITTED.
          
            READ COMMITTED
          
            A somewhat Oracle-like isolation level. All SELECT
            ... FOR UPDATE and SELECT ... LOCK IN
            SHARE MODE statements lock only the index records,
            not the gaps before them, and thus allow the free insertion
            of new records next to locked records.
            UPDATE and DELETE
            statements using a unique index with a unique search
            condition lock only the index record found, not the gap
            before it. In range-type UPDATE and
            DELETE statements,
            InnoDB must set next-key or gap locks and
            block insertions by other users to the gaps covered by the
            range. This is necessary because “phantom rows”
            must be blocked for MySQL replication and recovery to work.
          
Consistent reads behave as in Oracle: Each consistent read, even within the same transaction, sets and reads its own fresh snapshot. See Section 14.2.10.4, “Consistent Non-Locking Read”.
            REPEATABLE READ
          
            This is the default isolation level of
            InnoDB. SELECT ... FOR
            UPDATE, SELECT ... LOCK IN SHARE
            MODE, UPDATE, and
            DELETE statements that use a unique index
            with a unique search condition lock only the index record
            found, not the gap before it. With other search conditions,
            these operations employ next-key locking, locking the index
            range scanned with next-key or gap locks, and block new
            insertions by other users.
          
            In consistent reads, there is an important difference from
            the READ COMMITTED isolation level: All
            consistent reads within the same transaction read the same
            snapshot established by the first read. This convention
            means that if you issue several plain
            SELECT statements within the same
            transaction, these SELECT statements are
            consistent also with respect to each other. See
            Section 14.2.10.4, “Consistent Non-Locking Read”.
          
            SERIALIZABLE
          
            This level is like REPEATABLE READ, but
            InnoDB implicitly commits all plain
            SELECT statements to SELECT ...
            LOCK IN SHARE MODE.
          
        A consistent read means that InnoDB uses
        multi-versioning to present to a query a snapshot of the
        database at a point in time. The query see the changes made by
        those transactions that committed before that point of time, and
        no changes made by later or uncommitted transactions. The
        exception to this rule is that the query sees the changes made
        by earlier statements within the same transaction. Note that the
        exception to the rule causes the following anomaly: if you
        update some rows in a table, a SELECT will
        see the latest version of the updated rows, while it sees the
        old version of other rows. If other users simultaneously update
        the same table, the anomaly means that you may see the table in
        a state that never existed in the database.
      
        If you are running with the default REPEATABLE
        READ isolation level, all consistent reads within the
        same transaction read the snapshot established by the first such
        read in that transaction. You can get a fresher snapshot for
        your queries by committing the current transaction and after
        that issuing new queries.
      
        Consistent read is the default mode in which
        InnoDB processes SELECT
        statements in READ COMMITTED and
        REPEATABLE READ isolation levels. A
        consistent read does not set any locks on the tables it
        accesses, and therefore other users are free to modify those
        tables at the same time a consistent read is being performed on
        the table.
      
        Note that consistent read does not work over DROP
        TABLE and over ALTER TABLE.
        Consistent read does not work over DROP TABLE
        because MySQL can't use a table that has been dropped and
        InnoDB destroys the table. Consistent read
        does not work over ALTER TABLE because it is
        executed inside of the transaction that creates a new table and
        inserts rows from the old table to the new table. When you
        reissue the consistent read, it will not see any rows in the new
        table, because they were inserted in a transaction that is not
        visible in the snapshot read by the consistent read.
      
        In some circumstances, a consistent read is not convenient. For
        example, you might want to add a new row into your table
        child, and make sure that the child has a
        parent in table parent. The following example
        shows how to implement referential integrity in your application
        code.
      
        Suppose that you use a consistent read to read the table
        parent and indeed see the parent of the child
        in the table. Can you safely add the child row to table
        child? No, because it may happen that
        meanwhile some other user deletes the parent row from the table
        parent without you being aware of it.
      
        The solution is to perform the SELECT in a
        locking mode using LOCK IN SHARE MODE:
      
SELECT * FROM parent WHERE NAME = 'Jones' LOCK IN SHARE MODE;
        Performing a read in share mode means that we read the latest
        available data, and set a shared mode lock on the rows we read.
        A shared mode lock prevents others from updating or deleting the
        row we have read. Also, if the latest data belongs to a yet
        uncommitted transaction of another client connection, we wait
        until that transaction commits. After we see that the preceding
        query returns the parent 'Jones', we can
        safely add the child record to the child
        table and commit our transaction.
      
        Let us look at another example: We have an integer counter field
        in a table child_codes that we use to assign
        a unique identifier to each child added to table
        child. Obviously, using a consistent read or
        a shared mode read to read the present value of the counter is
        not a good idea because two users of the database may then see
        the same value for the counter, and a duplicate-key error occurs
        if two users attempt to add children with the same identifier to
        the table.
      
        Here, LOCK IN SHARE MODE is not a good
        solution because if two users read the counter at the same time,
        at least one of them ends up in deadlock when attempting to
        update the counter.
      
        In this case, there are two good ways to implement the reading
        and incrementing of the counter: (1) update the counter first by
        incrementing it by 1 and only after that read it, or (2) read
        the counter first with a lock mode FOR
        UPDATE, and increment after that. The latter approach
        can be implemented as follows:
      
SELECT counter_field FROM child_codes FOR UPDATE; UPDATE child_codes SET counter_field = counter_field + 1;
        A SELECT ... FOR UPDATE reads the latest
        available data, setting exclusive locks on each row it reads.
        Thus, it sets the same locks a searched SQL
        UPDATE would set on the rows.
      
        The preceding description is merely an example of how
        SELECT ... FOR UPDATE works. In MySQL, the
        specific task of generating a unique identifier actually can be
        accomplished using only a single access to the table:
      
UPDATE child_codes SET counter_field = LAST_INSERT_ID(counter_field + 1); SELECT LAST_INSERT_ID();
        The SELECT statement merely retrieves the
        identifier information (specific to the current connection). It
        does not access any table.
      
        Locks set by IN SHARE MODE and FOR
        UPDATE reads are released when the transaction is
        committed or rolled back.
      
        In row-level locking, InnoDB uses an
        algorithm called next-key locking.
        InnoDB performs the row-level locking in such
        a way that when it searches or scans an index of a table, it
        sets shared or exclusive locks on the index records it
        encounters. Thus, the row-level locks are actually index record
        locks.
      
        The locks InnoDB sets on index records also
        affect the “gap” before that index record. If a
        user has a shared or exclusive lock on record
        R in an index, another user cannot insert a
        new index record immediately before R in the
        index order. This locking of gaps is done to prevent the
        so-called “phantom problem.” Suppose that you want
        to read and lock all children from the child
        table having an identifier value greater than 100, with the
        intention of updating some column in the selected rows later:
      
SELECT * FROM child WHERE id > 100 FOR UPDATE;
        Suppose that there is an index on the id
        column. The query scans that index starting from the first
        record where id is bigger than 100. If the
        locks set on the index records would not lock out inserts made
        in the gaps, a new row might meanwhile be inserted to the table.
        If you execute the same SELECT within the
        same transaction, you would see a new row in the result set
        returned by the query. This is contrary to the isolation
        principle of transactions: A transaction should be able to run
        so that the data it has read does not change during the
        transaction. If we regard a set of rows as a data item, the new
        “phantom” child would violate this isolation
        principle.
      
        When InnoDB scans an index, it can also lock
        the gap after the last record in the index. Just that happens in
        the previous example: The locks set by InnoDB
        prevent any insert to the table where id
        would be bigger than 100.
      
You can use next-key locking to implement a uniqueness check in your application: If you read your data in share mode and do not see a duplicate for a row you are going to insert, then you can safely insert your row and know that the next-key lock set on the successor of your row during the read prevents anyone meanwhile inserting a duplicate for your row. Thus, the next-key locking allows you to “lock” the non-existence of something in your table.
        Suppose that you are running in the default REPEATABLE
        READ isolation level. When you issue a consistent read
        (that is, an ordinary SELECT statement),
        InnoDB gives your transaction a timepoint
        according to which your query sees the database. If another
        transaction deletes a row and commits after your timepoint was
        assigned, you do not see the row as having been deleted. Inserts
        and updates are treated similarly.
      
        You can advance your timepoint by committing your transaction
        and then doing another SELECT.
      
This is called multi-versioned concurrency control.
               User A                 User B
           SET AUTOCOMMIT=0;      SET AUTOCOMMIT=0;
time
|          SELECT * FROM t;
|          empty set
|                                 INSERT INTO t VALUES (1, 2);
|
v          SELECT * FROM t;
           empty set
                                  COMMIT;
           SELECT * FROM t;
           empty set
           COMMIT;
           SELECT * FROM t;
           ---------------------
           |    1    |    2    |
           ---------------------
           1 row in set
In this example, user A sees the row inserted by B only when B has committed the insert and A has committed as well, so that the timepoint is advanced past the commit of B.
        If you want to see the “freshest” state of the
        database, you should use either the READ
        COMMITTED isolation level or a locking read:
      
SELECT * FROM t LOCK IN SHARE MODE;
        A locking read, an UPDATE, or a
        DELETE generally set record locks on every
        index record that is scanned in the processing of the SQL
        statement. It does not matter if there are
        WHERE conditions in the statement that would
        exclude the row. InnoDB does not remember the
        exact WHERE condition, but only knows which
        index ranges were scanned. The record locks are normally
        next-key locks that also block inserts to the “gap”
        immediately before the record.
      
        If the locks to be set are exclusive, InnoDB
        always retrieves also the clustered index record and sets a lock
        on it.
      
If you do not have indexes suitable for your statement and MySQL has to scan the whole table to process the statement, every row of the table becomes locked, which in turn blocks all inserts by other users to the table. It is important to create good indexes so that your queries do not unnecessarily need to scan many rows.
        InnoDB sets specific types of locks as
        follows:
      
            SELECT ... FROM is a consistent read,
            reading a snapshot of the database and setting no locks
            unless the transaction isolation level is set to
            SERIALIZABLE. For
            SERIALIZABLE level, this sets shared
            next-key locks on the index records it encounters.
          
            SELECT ... FROM ... LOCK IN SHARE MODE
            sets shared next-key locks on all index records the read
            encounters.
          
            SELECT ... FROM ... FOR UPDATE sets
            exclusive next-key locks on all index records the read
            encounters.
          
            INSERT INTO ... VALUES (...) sets an
            exclusive lock on the inserted row. Note that this lock is
            not a next-key lock and does not prevent other users from
            inserting to the gap before the inserted row. If a
            duplicate-key error occurs, a shared lock on the duplicate
            index record is set.
          
            While initializing a previously specified
            AUTO_INCREMENT column on a table,
            InnoDB sets an exclusive lock on the end
            of the index associated with the
            AUTO_INCREMENT column. In accessing the
            auto-increment counter, InnoDB uses a
            specific table lock mode AUTO-INC where
            the lock lasts only to the end of the current SQL statement,
            not to the end of the entire transaction. Note that other
            clients cannot insert into the table while the
            AUTO-INC table lock is held; see
            Section 14.2.10.2, “InnoDB and AUTOCOMMIT”.
          
            InnoDB fetches the value of a previously
            initialized AUTO_INCREMENT column without
            setting any locks.
          
            INSERT INTO T SELECT ... FROM S WHERE ...
            sets an exclusive (non-next-key) lock on each row inserted
            into T. InnoDB sets
            shared next-key locks locks on S, unless
            innodb_locks_unsafe_for_binlog is
            enabled, in which case it does the search on
            S as a consistent read.
            InnoDB has to set locks in the latter
            case: In roll-forward recovery from a backup, every SQL
            statement has to be executed in exactly the same way it was
            done originally.
          
            CREATE TABLE ... SELECT ... performs the
            SELECT as a consistent read or with
            shared locks, as in the previous item.
          
            REPLACE is done like an insert if there
            is no collision on a unique key. Otherwise, an exclusive
            next-key lock is placed on the row that has to be updated.
          
            UPDATE ... WHERE ... sets an exclusive
            next-key lock on every record the search encounters.
          
            DELETE FROM ... WHERE ... sets an
            exclusive next-key lock on every record the search
            encounters.
          
            If a FOREIGN KEY constraint is defined on
            a table, any insert, update, or delete that requires the
            constraint condition to be checked sets shared record-level
            locks on the records that it looks at to check the
            constraint. InnoDB also sets these locks
            in the case where the constraint fails.
          
            LOCK TABLES sets table locks, but it is
            the higher MySQL layer above the InnoDB
            layer that sets these locks. InnoDB is
            aware of table locks if
            innodb_table_locks=1 (the default) and
            AUTOCOMMIT=0, and the MySQL layer above
            InnoDB knows about row-level locks.
            Otherwise, InnoDB's automatic deadlock
            detection cannot detect deadlocks where such table locks are
            involved. Also, because the higher MySQL layer does not know
            about row-level locks, it is possible to get a table lock on
            a table where another user currently has row-level locks.
            However, this does not endanger transaction integrity, as
            discussed in Section 14.2.10.10, “Deadlock Detection and Rollback”.
            See also Section 14.2.16, “Restrictions on InnoDB Tables”.
          
        By default, MySQL begins each client connection with autocommit
        mode enabled. When autocommit is enabled, MySQL does a commit
        after each SQL statement if that statement did not return an
        error. If an SQL statement returns an error, the commit or
        rollback behavior depends on the error. See
        Section 14.2.15, “InnoDB Error Handling”.
      
If you have the autocommit mode off and close a connection without explicitly committing the final transaction, MySQL rolls back that transaction.
        Each of the following statements (and any synonyms for them)
        implicitly end a transaction, as if you had done a
        COMMIT before executing the statement:
      
            ALTER FUNCTION, ALTER
            PROCEDURE, ALTER TABLE,
            BEGIN, CREATE
            DATABASE, CREATE FUNCTION,
            CREATE INDEX, CREATE
            PROCEDURE, CREATE TABLE,
            DROP DATABASE, DROP
            FUNCTION, DROP INDEX,
            DROP PROCEDURE, DROP
            TABLE, LOAD MASTER DATA,
            LOCK TABLES, RENAME
            TABLE, SET AUTOCOMMIT=1,
            START TRANSACTION,
            TRUNCATE, UNLOCK
            TABLES.
          
            UNLOCK TABLES commits a transaction only
            if any tables are currently locked.
          
            The CREATE TABLE, CREATE
            DATABASE DROP DATABASE, and
            TRUNCATE TABLE statements cause an
            implicit commit beginning with MySQL 5.0.8. The
            ALTER FUNCTION, ALTER
            PROCEDURE, CREATE FUNCTION,
            CREATE PROCEDURE, DROP
            FUNCTION, and DROP PROCEDURE
            statements cause an implicit commit beginning with MySQL
            MySQL 5.0.13.
          
            The CREATE TABLE statement in
            InnoDB is processed as a single
            transaction. This means that a ROLLBACK
            from the user does not undo CREATE TABLE
            statements the user made during that transaction.
          
        Transactions cannot be nested. This is a consequence of the
        implicit COMMIT performed for any current
        transaction when you issue a START
        TRANSACTION statement or one of its synonyms.
      
        InnoDB automatically detects a deadlock of
        transactions and rolls back a transaction or transactions to
        break the deadlock. InnoDB tries to pick
        small transactions to roll back, where the size of a transaction
        is determined by the number of rows inserted, updated, or
        deleted.
      
        InnoDB is aware of table locks if
        innodb_table_locks=1 (the default) and
        AUTOCOMMIT=0, and the MySQL layer above it
        knows about row-level locks. Otherwise,
        InnoDB cannot detect deadlocks where a table
        lock set by a MySQL LOCK TABLES statement or
        a lock set by a storage engine other than
        InnoDB is involved. You must resolve these
        situations by setting the value of the
        innodb_lock_wait_timeout system variable.
      
        When InnoDB performs a complete rollback of a
        transaction, all locks set by the transaction are released.
        However, if just a single SQL statement is rolled back as a
        result of an error, some of the locks set by the statement may
        be preserved. This happens because InnoDB
        stores row locks in a format such that it cannot know afterward
        which lock was set by which statement.
      
Deadlocks are a classic problem in transactional databases, but they are not dangerous unless they are so frequent that you cannot run certain transactions at all. Normally, you must write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.
        InnoDB uses automatic row-level locking. You
        can get deadlocks even in the case of transactions that just
        insert or delete a single row. That is because these operations
        are not really “atomic”; they automatically set
        locks on the (possibly several) index records of the row
        inserted or deleted.
      
You can cope with deadlocks and reduce the likelihood of their occurrence with the following techniques:
            Use SHOW ENGINE INNODB STATUS to
            determine the cause of the latest deadlock. That can help
            you to tune your application to avoid deadlocks.
          
Always be prepared to re-issue a transaction if it fails due to deadlock. Deadlocks are not dangerous. Just try again.
Commit your transactions often. Small transactions are less prone to collision.
            If you are using locking reads (SELECT ... FOR
            UPDATE or ... LOCK IN SHARE
            MODE), try using a lower isolation level such as
            READ COMMITTED.
          
Access your tables and rows in a fixed order. Then transactions form well-defined queues and do not deadlock.
            Add well-chosen indexes to your tables. Then your queries
            need to scan fewer index records and consequently set fewer
            locks. Use EXPLAIN SELECT to determine
            which indexes the MySQL server regards as the most
            appropriate for your queries.
          
            Use less locking. If you can afford to allow a
            SELECT to return data from an old
            snapshot, do not add the clause FOR
            UPDATE or LOCK IN SHARE MODE to
            it. Using the READ COMMITTED isolation
            level is good here, because each consistent read within the
            same transaction reads from its own fresh snapshot.
          
            If nothing else helps, serialize your transactions with
            table-level locks. The correct way to use LOCK
            TABLES with transactional tables, such as
            InnoDB tables, is to set
            AUTOCOMMIT = 0 and not to call
            UNLOCK TABLES until after you commit the
            transaction explicitly. For example, if you need to write to
            table t1 and read from table
            t2, you can do this:
          
SET AUTOCOMMIT=0;
LOCK TABLES t1 WRITE, t2 READ, ...;
... do something with tables t1 and t2 here ...
COMMIT;
UNLOCK TABLES;
Table-level locks make your transactions queue nicely, and deadlocks are avoided.
            Another way to serialize transactions is to create an
            auxiliary “semaphore” table that contains just
            a single row. Have each transaction update that row before
            accessing other tables. In that way, all transactions happen
            in a serial fashion. Note that the InnoDB
            instant deadlock detection algorithm also works in this
            case, because the serializing lock is a row-level lock. With
            MySQL table-level locks, the timeout method must be used to
            resolve deadlocks.
          
            In applications that use the LOCK TABLES
            command, MySQL does not set InnoDB table
            locks if AUTOCOMMIT=1.
          
          In InnoDB, having a long PRIMARY
          KEY wastes a lot of disk space because its value
          must be stored with every secondary index record. (See
          Section 14.2.13, “InnoDB Table and Index Structures”.) Create an
          AUTO_INCREMENT column as the primary key if
          your primary key is long.
        
          If the Unix top tool or the Windows Task
          Manager shows that the CPU usage percentage with your workload
          is less than 70%, your workload is probably disk-bound. Maybe
          you are making too many transaction commits, or the buffer
          pool is too small. Making the buffer pool bigger can help, but
          do not set it equal to more than 80% of physical memory.
        
          Wrap several modifications into one transaction.
          InnoDB must flush the log to disk at each
          transaction commit if that transaction made modifications to
          the database. The rotation speed of a disk is typically at
          most 167 revolutions/second, which constrains the number of
          commits to the same 167th of a
          second if the disk does not “fool” the operating
          system.
        
          If you can afford the loss of some of the latest committed
          transactions if a crash occurs, you can set the
          innodb_flush_log_at_trx_commit parameter to
          0. InnoDB tries to flush the log once per
          second anyway, although the flush is not guaranteed.
        
          Make your log files big, even as big as the buffer pool. When
          InnoDB has written the log files full, it
          has to write the modified contents of the buffer pool to disk
          in a checkpoint. Small log files cause many unnecessary disk
          writes. The drawback of big log files is that the recovery
          time is longer.
        
Make the log buffer quite large as well (on the order of 8MB).
          Use the VARCHAR data type instead of
          CHAR if you are storing variable-length
          strings or if the column may contain many
          NULL values. A
          CHAR( column
          always takes N)N characters to store
          data, even if the string is shorter or its value is
          NULL. Smaller tables fit better in the
          buffer pool and reduce disk I/O.
        
          When using row_format=compact (the default
          InnoDB record format in MySQL
          5.0) and variable-length character sets, such as
          utf8 or sjis,
          CHAR( will
          occupy a variable amount of space, at least
          N)N bytes.
        
          In some versions of GNU/Linux and Unix, flushing files to disk
          with the Unix fsync() call (which
          InnoDB uses by default) and other similar
          methods is surprisingly slow. If you are dissatisfied with
          database write performance, you might try setting the
          innodb_flush_method parameter to
          O_DSYNC. Although
          O_DSYNC seems to be slower on most systems,
          yours might not be one of them.
        
          When using the InnoDB storage engine on
          Solaris 10 for x86_64 architecture (AMD Opteron), it is
          important to mount any filesystems used for storing
          InnoDB-related files using the
          forcedirectio option. (The default on
          Solaris 10/x86_64 is not to use this
          option.) Failure to use forcedirectio
          causes a serious degradation of InnoDB's
          speed and performance on this platform.
        
          When using the InnoDB storage engine with a
          large innodb_buffer_pool_size value on any
          release of Solaris 2.6 and up and any platform
          (sparc/x86/x64/amd64), a significant performance gain can be
          achieved by placing InnoDB data files and
          log files on raw devices or on a separate direct I/O UFS
          filesystem (using mount option
          forcedirectio; see
          mount_ufs(1M)). Users of the Veritas
          filesystem VxFS should use the mount option
          convosync=direct.
        
          Other MySQL data files, such as those for
          MyISAM tables, should not be placed on a
          direct I/O filesystem. Executables or libraries must
          not be placed on a direct I/O filesystem.
        
          When importing data into InnoDB, make sure
          that MySQL does not have autocommit mode enabled because that
          requires a log flush to disk for every insert. To disable
          autocommit during your import operation, surround it with
          SET AUTOCOMMIT and
          COMMIT statements:
        
SET AUTOCOMMIT=0;
... SQL import statements ...
COMMIT;
          If you use the mysqldump option
          --opt, you get dump files that are fast to
          import into an InnoDB table, even without
          wrapping them with the SET AUTOCOMMIT and
          COMMIT statements.
        
          Beware of big rollbacks of mass inserts:
          InnoDB uses the insert buffer to save disk
          I/O in inserts, but no such mechanism is used in a
          corresponding rollback. A disk-bound rollback can take 30
          times as long to perform as the corresponding insert. Killing
          the database process does not help because the rollback starts
          again on server startup. The only way to get rid of a runaway
          rollback is to increase the buffer pool so that the rollback
          becomes CPU-bound and runs fast, or to use a special
          procedure. See Section 14.2.8.1, “Forcing InnoDB Recovery”.
        
          Beware also of other big disk-bound operations. Use
          DROP TABLE and CREATE
          TABLE to empty a table, not DELETE FROM
          .
        tbl_name
          Use the multiple-row INSERT syntax to
          reduce communication overhead between the client and the
          server if you need to insert many rows:
        
INSERT INTO yourtable VALUES (1,2), (5,5), ...;
          This tip is valid for inserts into any table, not just
          InnoDB tables.
        
          If you have UNIQUE constraints on secondary
          keys, you can speed up table imports by temporarily turning
          off the uniqueness checks during the import session:
        
SET UNIQUE_CHECKS=0;
... import operation ...
SET UNIQUE_CHECKS=1;
          For big tables, this saves a lot of disk I/O because
          InnoDB can use its insert buffer to write
          secondary index records in a batch.
        
          If you have FOREIGN KEY constraints in your
          tables, you can speed up table imports by turning the foreign
          key checks off for the duration of the import session:
        
SET FOREIGN_KEY_CHECKS=0;
... import operation ...
SET FOREIGN_KEY_CHECKS=1;
For big tables, this can save a lot of disk I/O.
If you often have recurring queries for tables that are not updated frequently, use the query cache:
[mysqld] query_cache_type = ON query_cache_size = 10M
        InnoDB includes InnoDB
        Monitors that print information about the
        InnoDB internal state. You can use the
        SHOW ENGINE INNODB STATUS SQL statement at
        any time to fetch the output of the standard
        InnoDB Monitor to your SQL client. This
        information is useful in performance tuning. (If you are using
        the mysql interactive SQL client, the output
        is more readable if you replace the usual semicolon statement
        terminator with \G.) For a discussion of
        InnoDB lock modes, see
        Section 14.2.10.1, “InnoDB Lock Modes”.
      
mysql> SHOW ENGINE INNODB STATUS\G
        Another way to use InnoDB Monitors is to let
        them periodically write data to the standard output of the
        mysqld server. In this case, no output is
        sent to clients. When switched on, InnoDB
        Monitors print data about every 15 seconds. Server output
        usually is directed to the .err log in the
        MySQL data directory. This data is useful in performance tuning.
        On Windows, you must start the server from a command prompt in a
        console window with the --console option if you
        want to direct the output to the window rather than to the error
        log.
      
Monitor output includes the following types of information:
Table and record locks held by each active transaction
Lock waits of a transactions
Semaphore waits of threads
Pending file I/O requests
Buffer pool statistics
            Purge and insert buffer merge activity of the main
            InnoDB thread
          
        To cause the standard InnoDB Monitor to write
        to the standard output of mysqld, use the
        following SQL statement:
      
CREATE TABLE innodb_monitor (a INT) ENGINE=INNODB;
The monitor can be stopped by issuing the following statement:
DROP TABLE innodb_monitor;
        The CREATE TABLE syntax is just a way to pass
        a command to the InnoDB engine through
        MySQL's SQL parser: The only things that matter are the table
        name innodb_monitor and that it be an
        InnoDB table. The structure of the table is
        not relevant at all for the InnoDB Monitor.
        If you shut down the server, the monitor does not restart
        automatically when you restart the server. You must drop the
        monitor table and issue a new CREATE TABLE
        statement to start the monitor. (This syntax may change in a
        future release.)
      
        You can use innodb_lock_monitor in a similar
        fashion. This is the same as innodb_monitor,
        except that it also provides a great deal of lock information. A
        separate innodb_tablespace_monitor prints a
        list of created file segments existing in the tablespace and
        validates the tablespace allocation data structures. In
        addition, there is innodb_table_monitor with
        which you can print the contents of the
        InnoDB internal data dictionary.
      
        A sample of InnoDB Monitor output:
      
mysql> SHOW ENGINE INNODB STATUS\G
*************************** 1. row ***************************
Status:
=====================================
030709 13:00:59 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 18 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 413452, signal count 378357
--Thread 32782 has waited at btr0sea.c line 1477 for 0.00 seconds the
semaphore: X-lock on RW-latch at 41a28668 created in file btr0sea.c line 135
a writer (thread id 32782) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file btr0sea.c line 731
Last time write locked in file btr0sea.c line 1347
Mutex spin waits 0, rounds 0, OS waits 0
RW-shared spins 108462, OS waits 37964; RW-excl spins 681824, OS waits
375485
------------------------
LATEST FOREIGN KEY ERROR
------------------------
030709 13:00:59 Transaction:
TRANSACTION 0 290328284, ACTIVE 0 sec, process no 3195, OS thread id 34831
inserting
15 lock struct(s), heap size 2496, undo log entries 9
MySQL thread id 25, query id 4668733 localhost heikki update
insert into ibtest11a (D, B, C) values (5, 'khDk' ,'khDk')
Foreign key constraint fails for table test/ibtest11a:
,
  CONSTRAINT `0_219242` FOREIGN KEY (`A`, `D`) REFERENCES `ibtest11b` (`A`,
  `D`) ON DELETE CASCADE ON UPDATE CASCADE
Trying to add in child table, in index PRIMARY tuple:
 0: len 4; hex 80000101; asc ....;; 1: len 4; hex 80000005; asc ....;; 2:
 len 4; hex 6b68446b; asc khDk;; 3: len 6; hex 0000114e0edc; asc ...N..;; 4:
 len 7; hex 00000000c3e0a7; asc .......;; 5: len 4; hex 6b68446b; asc khDk;;
But in parent table test/ibtest11b, in index PRIMARY,
the closest match we can find is record:
RECORD: info bits 0 0: len 4; hex 8000015b; asc ...[;; 1: len 4; hex
80000005; asc ....;; 2: len 3; hex 6b6864; asc khd;; 3: len 6; hex
0000111ef3eb; asc ......;; 4: len 7; hex 800001001e0084; asc .......;; 5:
len 3; hex 6b6864; asc khd;;
------------------------
LATEST DETECTED DEADLOCK
------------------------
030709 12:59:58
*** (1) TRANSACTION:
TRANSACTION 0 290252780, ACTIVE 1 sec, process no 3185, OS thread id 30733
inserting
LOCK WAIT 3 lock struct(s), heap size 320, undo log entries 146
MySQL thread id 21, query id 4553379 localhost heikki update
INSERT INTO alex1 VALUES(86, 86, 794,'aA35818','bb','c79166','d4766t',
'e187358f','g84586','h794',date_format('2001-04-03 12:54:22','%Y-%m-%d
%H:%i'),7
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 48310 n bits 568 table test/alex1 index
symbole trx id 0 290252780 lock mode S waiting
Record lock, heap no 324 RECORD: info bits 0 0: len 7; hex 61613335383138;
asc aa35818;; 1:
*** (2) TRANSACTION:
TRANSACTION 0 290251546, ACTIVE 2 sec, process no 3190, OS thread id 32782
inserting
130 lock struct(s), heap size 11584, undo log entries 437
MySQL thread id 23, query id 4554396 localhost heikki update
REPLACE INTO alex1 VALUES(NULL, 32, NULL,'aa3572','','c3572','d6012t','',
NULL,'h396', NULL, NULL, 7.31,7.31,7.31,200)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 48310 n bits 568 table test/alex1 index
symbole trx id 0 290251546 lock_mode X locks rec but not gap
Record lock, heap no 324 RECORD: info bits 0 0: len 7; hex 61613335383138;
asc aa35818;; 1:
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 48310 n bits 568 table test/alex1 index
symbole trx id 0 290251546 lock_mode X locks gap before rec insert intention
waiting
Record lock, heap no 82 RECORD: info bits 0 0: len 7; hex 61613335373230;
asc aa35720;; 1:
*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter 0 290328385
Purge done for trx's n:o < 0 290315608 undo n:o < 0 17
Total number of lock structs in row lock hash table 70
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 3491, OS thread id 42002
MySQL thread id 32, query id 4668737 localhost heikki
show innodb status
---TRANSACTION 0 290328384, ACTIVE 0 sec, process no 3205, OS thread id
38929 inserting
1 lock struct(s), heap size 320
MySQL thread id 29, query id 4668736 localhost heikki update
insert into speedc values (1519229,1, 'hgjhjgghggjgjgjgjgjggjgjgjgjgjgggjgjg
jlhhgghggggghhjhghgggggghjhghghghghghhhhghghghjhhjghjghjkghjghjghjghjfhjfh
---TRANSACTION 0 290328383, ACTIVE 0 sec, process no 3180, OS thread id
28684 committing
1 lock struct(s), heap size 320, undo log entries 1
MySQL thread id 19, query id 4668734 localhost heikki update
insert into speedcm values (1603393,1, 'hgjhjgghggjgjgjgjgjggjgjgjgjgjgggjgj
gjlhhgghggggghhjhghgggggghjhghghghghghhhhghghghjhhjghjghjkghjghjghjghjfhjf
---TRANSACTION 0 290328327, ACTIVE 0 sec, process no 3200, OS thread id
36880 starting index read
LOCK WAIT 2 lock struct(s), heap size 320
MySQL thread id 27, query id 4668644 localhost heikki Searching rows for
update
update ibtest11a set B = 'kHdkkkk' where A = 89572
------- TRX HAS BEEN WAITING 0 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 65556 n bits 232 table test/ibtest11a index
PRIMARY trx id 0 290328327 lock_mode X waiting
Record lock, heap no 1 RECORD: info bits 0 0: len 9; hex 73757072656d756d00;
asc supremum.;;
------------------
---TRANSACTION 0 290328284, ACTIVE 0 sec, process no 3195, OS thread id
34831 rollback of SQL statement
ROLLING BACK 14 lock struct(s), heap size 2496, undo log entries 9
MySQL thread id 25, query id 4668733 localhost heikki update
insert into ibtest11a (D, B, C) values (5, 'khDk' ,'khDk')
---TRANSACTION 0 290327208, ACTIVE 1 sec, process no 3190, OS thread id
32782
58 lock struct(s), heap size 5504, undo log entries 159
MySQL thread id 23, query id 4668732 localhost heikki update
REPLACE INTO alex1 VALUES(86, 46, 538,'aa95666','bb','c95666','d9486t',
'e200498f','g86814','h538',date_format('2001-04-03 12:54:22','%Y-%m-%d
%H:%i'),
---TRANSACTION 0 290323325, ACTIVE 3 sec, process no 3185, OS thread id
30733 inserting
4 lock struct(s), heap size 1024, undo log entries 165
MySQL thread id 21, query id 4668735 localhost heikki update
INSERT INTO alex1 VALUES(NULL, 49, NULL,'aa42837','','c56319','d1719t','',
NULL,'h321', NULL, NULL, 7.31,7.31,7.31,200)
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
151671 OS file reads, 94747 OS file writes, 8750 OS fsyncs
25.44 reads/s, 18494 avg bytes/read, 17.55 writes/s, 2.33 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf for space 0: size 1, free list len 19, seg size 21,
85004 inserts, 85004 merged recs, 26669 merges
Hash table size 207619, used cells 14461, node heap has 16 buffer(s)
1877.67 hash searches/s, 5121.10 non-hash searches/s
---
LOG
---
Log sequence number 18 1212842764
Log flushed up to   18 1212665295
Last checkpoint at  18 1135877290
0 pending log writes, 0 pending chkp writes
4341 log i/o's done, 1.22 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 84966343; in additional pool allocated 1402624
Buffer pool size   3200
Free buffers       110
Database pages     3074
Modified db pages  2674
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 171380, created 51968, written 194688
28.72 reads/s, 20.72 creates/s, 47.55 writes/s
Buffer pool hit rate 999 / 1000
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
Main thread process no. 3004, id 7176, state: purging
Number of rows inserted 3738558, updated 127415, deleted 33707, read 755779
1586.13 inserts/s, 50.89 updates/s, 28.44 deletes/s, 107.88 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
Some notes on the output:
            If the TRANSACTIONS section reports lock
            waits, your applications may have lock contention. The
            output can also help to trace the reasons for transaction
            deadlocks.
          
            The SEMAPHORES section reports threads
            waiting for a semaphore and statistics on how many times
            threads have needed a spin or a wait on a mutex or a rw-lock
            semaphore. A large number of threads waiting for semaphores
            may be a result of disk I/O, or contention problems inside
            InnoDB. Contention can be due to heavy
            parallelism of queries or problems in operating system
            thread scheduling. Setting
            innodb_thread_concurrency smaller than
            the default value can help in such situations.
          
            The BUFFER POOL AND MEMORY section gives
            you statistics on pages read and written. You can calculate
            from these numbers how many data file I/O operations your
            queries currently are doing.
          
            The ROW OPERATIONS section shows what the
            main thread is doing.
          
        InnoDB sends diagnostic output to
        stderr or to files rather than to
        stdout or fixed-size memory buffers, to avoid
        potential buffer overflows. As a side effect, the output of
        SHOW ENGINE INNODB STATUS is written to a
        status file in the MySQL data directory every fifteen seconds.
        The name of the file is
        innodb_status.,
        where pidpid is the server process ID.
        InnoDB removes the file for a normal
        shutdown. If abnormal shutdowns have occurred, instances of
        these status files may be present and must be removed manually.
        Before removing them, you might want to examine them to see
        whether they contain useful information about the cause of
        abnormal shutdowns. The
        innodb_status.
        file is created only if the configuration option
        pidinnodb_status_file=1 is set.
      
      Because InnoDB is a multi-versioned storage
      engine, it must keep information about old versions of rows in the
      tablespace. This information is stored in a data structure called
      a rollback segment (after an analogous data
      structure in Oracle).
    
      Internally, InnoDB adds two fields to each row
      stored in the database. A 6-byte field indicates the transaction
      identifier for the last transaction that inserted or updated the
      row. Also, a deletion is treated internally as an update where a
      special bit in the row is set to mark it as deleted. Each row also
      contains a 7-byte field called the roll pointer. The roll pointer
      points to an undo log record written to the rollback segment. If
      the row was updated, the undo log record contains the information
      necessary to rebuild the content of the row before it was updated.
    
      InnoDB uses the information in the rollback
      segment to perform the undo operations needed in a transaction
      rollback. It also uses the information to build earlier versions
      of a row for a consistent read.
    
      Undo logs in the rollback segment are divided into insert and
      update undo logs. Insert undo logs are needed only in transaction
      rollback and can be discarded as soon as the transaction commits.
      Update undo logs are used also in consistent reads, but they can
      be discarded only after there is no transaction present for which
      InnoDB has assigned a snapshot that in a
      consistent read could need the information in the update undo log
      to build an earlier version of a database row.
    
      You must remember to commit your transactions regularly, including
      those transactions that issue only consistent reads. Otherwise,
      InnoDB cannot discard data from the update undo
      logs, and the rollback segment may grow too big, filling up your
      tablespace.
    
The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space need for your rollback segment.
      In the InnoDB multi-versioning scheme, a row is
      not physically removed from the database immediately when you
      delete it with an SQL statement. Only when
      InnoDB can discard the update undo log record
      written for the deletion can it also physically remove the
      corresponding row and its index records from the database. This
      removal operation is called a purge, and it is quite fast, usually
      taking the same order of time as the SQL statement that did the
      deletion.
    
      In a scenario where the user inserts and deletes rows in smallish
      batches at about the same rate in the table, it is possible that
      the purge thread starts to lag behind, and the table grows bigger
      and bigger, making everything disk-bound and very slow. Even if
      the table carries just 10MB of useful data, it may grow to occupy
      10GB with all the “dead” rows. In such a case, it
      would be good to throttle new row operations, and allocate more
      resources to the purge thread. The
      innodb_max_purge_lag system variable exists for
      exactly this purpose. See Section 14.2.4, “InnoDB Startup Options and System Variables”, for
      more information.
    
      MySQL stores its data dictionary information for tables in
      .frm files in database directories. This is
      true for all MySQL storage engines. But every
      InnoDB table also has its own entry in the
      InnoDB internal data dictionary inside the
      tablespace. When MySQL drops a table or a database, it has to
      delete both an .frm file or files, and the
      corresponding entries inside the InnoDB data
      dictionary. This is the reason why you cannot move
      InnoDB tables between databases simply by
      moving the .frm files.
    
      Every InnoDB table has a special index called
      the clustered index where the data for the
      rows is stored. If you define a PRIMARY KEY on
      your table, the index of the primary key is the clustered index.
    
      If you do not define a PRIMARY KEY for your
      table, MySQL picks the first UNIQUE index that
      has only NOT NULL columns as the primary key
      and InnoDB uses it as the clustered index. If
      there is no such index in the table, InnoDB
      internally generates a clustered index where the rows are ordered
      by the row ID that InnoDB assigns to the rows
      in such a table. The row ID is a 6-byte field that increases
      monotonically as new rows are inserted. Thus, the rows ordered by
      the row ID are physically in insertion order.
    
Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O when compared to the traditional solution. (In many database systems, data storage uses a different page from the index record.)
      In InnoDB, the records in non-clustered indexes
      (also called secondary indexes) contain the primary key value for
      the row. InnoDB uses this primary key value to
      search for the row from the clustered index. Note that if the
      primary key is long, the secondary indexes use more space.
    
      InnoDB compares CHAR and
      VARCHAR strings of different lengths such that
      the remaining length in the shorter string is treated as if padded
      with spaces.
    
        All InnoDB indexes are B-trees where the
        index records are stored in the leaf pages of the tree. The
        default size of an index page is 16KB. When new records are
        inserted, InnoDB tries to leave 1/16 of the
        page free for future insertions and updates of the index
        records.
      
        If index records are inserted in a sequential order (ascending
        or descending), the resulting index pages are about 15/16 full.
        If records are inserted in a random order, the pages are from
        1/2 to 15/16 full. If the fill factor of an index page drops
        below 1/2, InnoDB tries to contract the index
        tree to free the page.
      
It is a common situation in database applications that the primary key is a unique identifier and new rows are inserted in the ascending order of the primary key. Thus, the insertions to the clustered index do not require random reads from a disk.
        On the other hand, secondary indexes are usually non-unique, and
        insertions into secondary indexes happen in a relatively random
        order. This would cause a lot of random disk I/O operations
        without a special mechanism used in InnoDB.
      
        If an index record should be inserted to a non-unique secondary
        index, InnoDB checks whether the secondary
        index page is in the buffer pool. If that is the case,
        InnoDB does the insertion directly to the
        index page. If the index page is not found in the buffer pool,
        InnoDB inserts the record to a special insert
        buffer structure. The insert buffer is kept so small that it
        fits entirely in the buffer pool, and insertions can be done
        very fast.
      
Periodically, the insert buffer is merged into the secondary index trees in the database. Often it is possible to merge several insertions to the same page of the index tree, saving disk I/O operations. It has been measured that the insert buffer can speed up insertions into a table up to 15 times.
        The insert buffer merging may continue to happen
        after the inserting transaction has been
        committed. In fact, it may continue to happen after a server
        shutdown and restart (see Section 14.2.8.1, “Forcing InnoDB Recovery”).
      
The insert buffer merging may take many hours, when many secondary indexes must be updated, and many rows have been inserted. During this time, disk I/O will be increased, which can cause significant slowdown on disk-bound queries. Another significant background I/O operation is the purge thread (see Section 14.2.12, “Implementation of Multi-Versioning”).
        If a table fits almost entirely in main memory, the fastest way
        to perform queries on it is to use hash indexes.
        InnoDB has a mechanism that monitors index
        searches made to the indexes defined for a table. If
        InnoDB notices that queries could benefit
        from building a hash index, it does so automatically.
      
        Note that the hash index is always built based on an existing
        B-tree index on the table. InnoDB can build a
        hash index on a prefix of any length of the key defined for the
        B-tree, depending on the pattern of searches that
        InnoDB observes for the B-tree index. A hash
        index can be partial: It is not required that the whole B-tree
        index is cached in the buffer pool. InnoDB
        builds hash indexes on demand for those pages of the index that
        are often accessed.
      
        In a sense, InnoDB tailors itself through the
        adaptive hash index mechanism to ample main memory, coming
        closer to the architecture of main-memory databases.
      
        Records in InnoDB tables have the following
        characteristics:
      
Each index record contains a six-byte header. The header is used to link together consecutive records, and also in row-level locking.
Records in the clustered index contain fields for all user-defined columns. In addition, there is a six-byte field for the transaction ID and a seven-byte field for the roll pointer.
If no primary key was defined for a table, each clustered index record also contains a six-byte row ID field.
Each secondary index record contains also all the fields defined for the clustered index key.
A record contains also a pointer to each field of the record. If the total length of the fields in a record is less than 128 bytes, the pointer is one byte; otherwise, two bytes. The array of these pointers is called the record directory. The area where these pointers point is called the data part of the record.
            Internally, InnoDB stores fixed-length
            character columns such as CHAR(10) in a
            fixed-length format. InnoDB truncates
            trailing spaces from VARCHAR columns.
          
            An SQL NULL value reserves 1 or 2 bytes
            in the record directory. Besides that, an SQL
            NULL value reserves zero bytes in the
            data part of the record if stored in a variable length
            column. In a fixed-length column, it reserves the fixed
            length of the column in the data part of the record. The
            motivation behind reserving the fixed space for
            NULL values is that it enables an update
            of the column from NULL to a
            non-NULL value to be done in place
            without causing fragmentation of the index page.
          
        InnoDB uses simulated asynchronous disk I/O:
        InnoDB creates a number of threads to take
        care of I/O operations, such as read-ahead.
      
        There are two read-ahead heuristics in
        InnoDB:
      
            In sequential read-ahead, if InnoDB
            notices that the access pattern to a segment in the
            tablespace is sequential, it posts in advance a batch of
            reads of database pages to the I/O system.
          
            In random read-ahead, if InnoDB notices
            that some area in a tablespace seems to be in the process of
            being fully read into the buffer pool, it posts the
            remaining reads to the I/O system.
          
        InnoDB uses a novel file flush technique
        called doublewrite. It adds safety to
        recovery following an operating system crash or a power outage,
        and improves performance on most varieties of Unix by reducing
        the need for fsync() operations.
      
        Doublewrite means that before writing pages to a data file,
        InnoDB first writes them to a contiguous
        tablespace area called the doublewrite buffer. Only after the
        write and the flush to the doublewrite buffer has completed does
        InnoDB write the pages to their proper
        positions in the data file. If the operating system crashes in
        the middle of a page write, InnoDB can later
        find a good copy of the page from the doublewrite buffer during
        recovery.
      
        The data files that you define in the configuration file form
        the tablespace of InnoDB. The files are
        simply concatenated to form the tablespace. There is no striping
        in use. Currently, you cannot define where within the tablespace
        your tables are allocated. However, in a newly created
        tablespace, InnoDB allocates space starting
        from the first data file.
      
        The tablespace consists of database pages with a default size of
        16KB. The pages are grouped into extents of 64 consecutive
        pages. The “files” inside a tablespace are called
        segments in InnoDB.
        The term “rollback segment” is somewhat confusing
        because it actually contains many tablespace segments.
      
        Two segments are allocated for each index in
        InnoDB. One is for non-leaf nodes of the
        B-tree, the other is for the leaf nodes. The idea here is to
        achieve better sequentiality for the leaf nodes, which contain
        the data.
      
        When a segment grows inside the tablespace,
        InnoDB allocates the first 32 pages to it
        individually. After that InnoDB starts to
        allocate whole extents to the segment. InnoDB
        can add to a large segment up to 4 extents at a time to ensure
        good sequentiality of data.
      
        Some pages in the tablespace contain bitmaps of other pages, and
        therefore a few extents in an InnoDB
        tablespace cannot be allocated to segments as a whole, but only
        as individual pages.
      
        When you ask for available free space in the tablespace by
        issuing a SHOW TABLE STATUS statement,
        InnoDB reports the extents that are
        definitely free in the tablespace. InnoDB
        always reserves some extents for cleanup and other internal
        purposes; these reserved extents are not included in the free
        space.
      
        When you delete data from a table, InnoDB
        contracts the corresponding B-tree indexes. Whether the freed
        space becomes available for other users depends on whether the
        pattern of deletes frees individual pages or extents to the
        tablespace. Dropping a table or deleting all rows from it is
        guaranteed to release the space to other users, but remember
        that deleted rows are physically removed only in an (automatic)
        purge operation after they are no longer needed for transaction
        rollbacks or consistent reads. (See
        Section 14.2.12, “Implementation of Multi-Versioning”.)
      
If there are random insertions into or deletions from the indexes of a table, the indexes may become fragmented. Fragmentation means that the physical ordering of the index pages on the disk is not close to the index ordering of the records on the pages, or that there are many unused pages in the 64-page blocks that were allocated to the index.
        A symptom of fragmentation is that a table takes more space than
        it “should” take. How much that is exactly, is
        difficult to determine. All InnoDB data and
        indexes are stored in B-trees, and their fill factor may vary
        from 50% to 100%. Another symptom of fragmentation is that a
        table scan such as this takes more time than it
        “should” take:
      
SELECT COUNT(*) FROM t WHERE a_non_indexed_column <> 12345;
(In the preceding query, we are “fooling” the SQL optimizer into scanning the clustered index, rather than a secondary index.) Most disks can read 10 to 50MB/s, which can be used to estimate how fast a table scan should run.
        It can speed up index scans if you periodically perform a
        “null” ALTER TABLE operation:
      
ALTER TABLE tbl_name ENGINE=INNODB
That causes MySQL to rebuild the table. Another way to perform a defragmentation operation is to use mysqldump to dump the table to a text file, drop the table, and reload it from the dump file.
        If the insertions to an index are always ascending and records
        are deleted only from the end, the InnoDB
        filespace management algorithm guarantees that fragmentation in
        the index does not occur.
      
      Error handling in InnoDB is not always the same
      as specified in the SQL standard. According to the standard, any
      error during an SQL statement should cause the rollback of that
      statement. InnoDB sometimes rolls back only
      part of the statement, or the whole transaction. The following
      items describe how InnoDB performs error
      handling:
    
          If you run out of file space in the tablespace, a MySQL
          Table is full error occurs and
          InnoDB rolls back the SQL statement.
        
          A transaction deadlock causes InnoDB to
          roll back the entire transaction. In the case of a lock wait
          timeout, InnoDB also rolls back the entire
          transaction before MySQL 5.0.13; as of 5.0.13,
          InnoDB rolls back only the most recent SQL
          statement.
        
          When a transaction rollback occurs due to a deadlock or lock
          wait timeout, it cancels the effect of the statements within
          the transaction. But if the start-transaction statement was
          START TRANSACTION or
          BEGIN statement, rollback does not cancel
          that statement. Further SQL statements become part of the
          transaction until the occurrence of COMMIT,
          ROLLBACK, or some SQL statement that causes
          an implicit commit.
        
          A duplicate-key error rolls back the SQL statement, if you
          have not specified the IGNORE option in
          your statement.
        
          A row too long error rolls back the SQL
          statement.
        
          Other errors are mostly detected by the MySQL layer of code
          (above the InnoDB storage engine level),
          and they roll back the corresponding SQL statement. Locks are
          not released in a rollback of a single SQL statement.
        
      During implicit rollbacks, as well as during the execution of an
      explicit ROLLBACK SQL command, SHOW
      PROCESSLIST displays Rolling back in
      the State column for the relevant connection.
    
        The following is a non-exhaustive list of common
        InnoDB-specific errors that you may
        encounter, with information about why each occurs and how to
        resolve the problem.
      
            1005 (ER_CANT_CREATE_TABLE)
          
            Cannot create table. If the error message refers to
            errno 150, table creation failed because
            a foreign key constraint was not correctly formed.
          
            1016 (ER_CANT_OPEN_FILE)
          
            Cannot find the InnoDB table from the
            InnoDB data files, although the
            .frm file for the table exists. See
            Section 14.2.17.1, “Troubleshooting InnoDB Data Dictionary Operations”.
          
            1114 (ER_RECORD_FILE_FULL)
          
            InnoDB has run out of free space in the
            tablespace. You should reconfigure the tablespace to add a
            new data file.
          
            1205 (ER_LOCK_WAIT_TIMEOUT)
          
Lock wait timeout expired. Transaction was rolled back.
            1213 (ER_LOCK_DEADLOCK)
          
Transaction deadlock. You should rerun the transaction.
            1216 (ER_NO_REFERENCED_ROW)
          
You are trying to add a row but there is no parent row, and a foreign key constraint fails. You should add the parent row first.
            1217 (ER_ROW_IS_REFERENCED)
          
You are trying to delete a parent row that has children, and a foreign key constraint fails. You should delete the children first.
To print the meaning of an operating system error number, use the perror program that comes with the MySQL distribution.
The following table provides a list of some common Linux system error codes. For a more complete list, see Linux source code.
            1 (EPERM)
          
Operation not permitted
            2 (ENOENT)
          
No such file or directory
            3 (ESRCH)
          
No such process
            4 (EINTR)
          
Interrupted system call
            5 (EIO)
          
I/O error
            6 (ENXIO)
          
No such device or address
            7 (E2BIG)
          
Arg list too long
            8 (ENOEXEC)
          
Exec format error
            9 (EBADF)
          
Bad file number
            10 (ECHILD)
          
No child processes
            11 (EAGAIN)
          
Try again
            12 (ENOMEM)
          
Out of memory
            13 (EACCES)
          
Permission denied
            14 (EFAULT)
          
Bad address
            15 (ENOTBLK)
          
Block device required
            16 (EBUSY)
          
Device or resource busy
            17 (EEXIST)
          
File exists
            18 (EXDEV)
          
Cross-device link
            19 (ENODEV)
          
No such device
            20 (ENOTDIR)
          
Not a directory
            21 (EISDIR)
          
Is a directory
            22 (EINVAL)
          
Invalid argument
            23 (ENFILE)
          
File table overflow
            24 (EMFILE)
          
Too many open files
            25 (ENOTTY)
          
Inappropriate ioctl for device
            26 (ETXTBSY)
          
Text file busy
            27 (EFBIG)
          
File too large
            28 (ENOSPC)
          
No space left on device
            29 (ESPIPE)
          
Illegal seek
            30 (EROFS)
          
Read-only file system
            31 (EMLINK)
          
Too many links
The following table provides a list of some common Windows system error codes. For a complete list see the Microsoft Web site.
            1 (ERROR_INVALID_FUNCTION)
          
Incorrect function.
            2 (ERROR_FILE_NOT_FOUND)
          
The system cannot find the file specified.
            3 (ERROR_PATH_NOT_FOUND)
          
The system cannot find the path specified.
            4 (ERROR_TOO_MANY_OPEN_FILES)
          
The system cannot open the file.
            5 (ERROR_ACCESS_DENIED)
          
Access is denied.
            6 (ERROR_INVALID_HANDLE)
          
The handle is invalid.
            7 (ERROR_ARENA_TRASHED)
          
The storage control blocks were destroyed.
            8 (ERROR_NOT_ENOUGH_MEMORY)
          
Not enough storage is available to process this command.
            9 (ERROR_INVALID_BLOCK)
          
The storage control block address is invalid.
            10 (ERROR_BAD_ENVIRONMENT)
          
The environment is incorrect.
            11 (ERROR_BAD_FORMAT)
          
An attempt was made to load a program with an incorrect format.
            12 (ERROR_INVALID_ACCESS)
          
The access code is invalid.
            13 (ERROR_INVALID_DATA)
          
The data is invalid.
            14 (ERROR_OUTOFMEMORY)
          
Not enough storage is available to complete this operation.
            15 (ERROR_INVALID_DRIVE)
          
The system cannot find the drive specified.
            16 (ERROR_CURRENT_DIRECTORY)
          
The directory cannot be removed.
            17 (ERROR_NOT_SAME_DEVICE)
          
The system cannot move the file to a different disk drive.
            18 (ERROR_NO_MORE_FILES)
          
There are no more files.
            19 (ERROR_WRITE_PROTECT)
          
The media is write protected.
            20 (ERROR_BAD_UNIT)
          
The system cannot find the device specified.
            21 (ERROR_NOT_READY)
          
The device is not ready.
            22 (ERROR_BAD_COMMAND)
          
The device does not recognize the command.
            23 (ERROR_CRC)
          
Data error (cyclic redundancy check).
            24 (ERROR_BAD_LENGTH)
          
The program issued a command but the command length is incorrect.
            25 (ERROR_SEEK)
          
The drive cannot locate a specific area or track on the disk.
            26 (ERROR_NOT_DOS_DISK)
          
The specified disk or diskette cannot be accessed.
            27 (ERROR_SECTOR_NOT_FOUND)
          
The drive cannot find the sector requested.
            28 (ERROR_OUT_OF_PAPER)
          
The printer is out of paper.
            29 (ERROR_WRITE_FAULT)
          
The system cannot write to the specified device.
            30 (ERROR_READ_FAULT)
          
The system cannot read from the specified device.
            31 (ERROR_GEN_FAILURE)
          
A device attached to the system is not functioning.
            32 (ERROR_SHARING_VIOLATION)
          
The process cannot access the file because it is being used by another process.
            33 (ERROR_LOCK_VIOLATION)
          
The process cannot access the file because another process has locked a portion of the file.
            34 (ERROR_WRONG_DISK)
          
The wrong diskette is in the drive. Insert %2 (Volume Serial Number: %3) into drive %1.
            36 (ERROR_SHARING_BUFFER_EXCEEDED)
          
Too many files opened for sharing.
            38 (ERROR_HANDLE_EOF)
          
Reached the end of the file.
            39 (ERROR_HANDLE_DISK_FULL)
          
The disk is full.
            87 (ERROR_INVALID_PARAMETER)
          
            The parameter is incorrect. (If this error occurs on Windows
            and you have enabled
            innodb_file_per_table in a server option
            file, add the line
            innodb_flush_method=unbuffered to the
            file as well.)
          
            112 (ERROR_DISK_FULL)
          
The disk is full.
            123 (ERROR_INVALID_NAME)
          
The filename, directory name, or volume label syntax is incorrect.
            1450 (ERROR_NO_SYSTEM_RESOURCES)
          
Insufficient system resources exist to complete the requested service.
          Warning: Do
          not convert MySQL system tables in the
          mysql database from
          MyISAM to InnoDB tables!
          This is an unsupported operation. If you do this, MySQL does
          not restart until you restore the old system tables from a
          backup or re-generate them with the
          mysql_install_db script.
        
A table cannot contain more than 1000 columns.
The internal maximum key length is 3500 bytes, but MySQL itself restricts this to 1024 bytes.
          The maximum row length, except for VARCHAR,
          BLOB and TEXT columns,
          is slightly less than half of a database page. That is, the
          maximum row length is about 8000 bytes.
          LONGBLOB and LONGTEXT
          columns must be less than 4GB, and the total row length,
          including also BLOB and
          TEXT columns, must be less than 4GB.
          InnoDB stores the first 768 bytes of a
          VARCHAR, BLOB, or
          TEXT column in the row, and the rest into
          separate pages.
        
          Although InnoDB supports row sizes larger
          than 65535 internally, you cannot define a row containing
          VARCHAR columns with a combined size larger
          than 65535:
        
mysql>CREATE TABLE t (a VARCHAR(8000), b VARCHAR(10000),->c VARCHAR(10000), d VARCHAR(10000), e VARCHAR(10000),->f VARCHAR(10000), g VARCHAR(10000)) ENGINE=InnoDB;ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. You have to change some columns to TEXT or BLOBs
          On some older operating systems, files must be less than 2GB.
          This is not a limitation of InnoDB itself,
          but if you require a large tablespace, you will need to
          configure it using several smaller data files rather than one
          or a file large data files.
        
          The combined size of the InnoDB log files
          must be less than 4GB.
        
The minimum tablespace size is 10MB. The maximum tablespace size is four billion database pages (64TB). This is also the maximum size for a table.
          InnoDB tables do not support
          FULLTEXT indexes.
        
          InnoDB tables do not support spatial data
          types before MySQL 5.0.16.
        
          ANALYZE TABLE determines index cardinality
          (as displayed in the Cardinality column of
          SHOW INDEX output) by doing eight random
          dives to each of the index trees and updating index
          cardinality estimates accordingly. Note that because these are
          only estimates, repeated runs of ANALYZE
          TABLE may produce different numbers. This makes
          ANALYZE TABLE fast on
          InnoDB tables but not 100% accurate as it
          doesn't take all rows into account.
        
          MySQL uses index cardinality estimates only in join
          optimization. If some join is not optimized in the right way,
          you can try using ANALYZE TABLE. In the few
          cases that ANALYZE TABLE doesn't produce
          values good enough for your particular tables, you can use
          FORCE INDEX with your queries to force the
          use of a particular index, or set the
          max_seeks_for_key system variable to ensure
          that MySQL prefers index lookups over table scans. See
          Section 5.2.2, “Server System Variables”, and
          Section A.6, “Optimizer-Related Issues”.
        
          SHOW TABLE STATUS does not give accurate
          statistics on InnoDB tables, except for the
          physical size reserved by the table. The row count is only a
          rough estimate used in SQL optimization.
        
          InnoDB does not keep an internal count of
          rows in a table. (In practice, this would be somewhat
          complicated due to multi-versioning.) To process a
          SELECT COUNT(*) FROM t statement,
          InnoDB must scan an index of the table,
          which takes some time if the index is not entirely in the
          buffer pool. To get a fast count, you have to use a counter
          table you create yourself and let your application update it
          according to the inserts and deletes it does. If your table
          does not change often, using the MySQL query cache is a good
          solution. SHOW TABLE STATUS also can be
          used if an approximate row count is sufficient. See
          Section 14.2.11, “InnoDB Performance Tuning Tips”.
        
          On Windows, InnoDB always stores database
          and table names internally in lowercase. To move databases in
          binary format from Unix to Windows or from Windows to Unix,
          you should always use explicitly lowercase names when creating
          databases and tables.
        
          For an AUTO_INCREMENT column, you must
          always define an index for the table, and that index must
          contain just the AUTO_INCREMENT column. In
          MyISAM tables, the
          AUTO_INCREMENT column may be part of a
          multi-column index.
        
          In MySQL 5.0 before MySQL 5.0.3,
          InnoDB does not support the
          AUTO_INCREMENT table option for setting the
          initial sequence value in a CREATE TABLE or
          ALTER TABLE statement. To set the value
          with InnoDB, insert a dummy row with a
          value one less and delete that dummy row, or insert the first
          row with an explicit value specified.
        
          While initializing a previously specified
          AUTO_INCREMENT column on a table,
          InnoDB sets an exclusive lock on the end of
          the index associated with the
          AUTO_INCREMENT column. In accessing the
          auto-increment counter, InnoDB uses a
          specific table lock mode AUTO-INC where the
          lock lasts only to the end of the current SQL statement, not
          to the end of the entire transaction. Note that other clients
          cannot insert into the table while the
          AUTO-INC table lock is held; see
          Section 14.2.10.2, “InnoDB and AUTOCOMMIT”.
        
          When you restart the MySQL server, InnoDB
          may reuse an old value that was generated for an
          AUTO_INCREMENT column but never stored
          (that is, a value that was generated during an old transaction
          that was rolled back).
        
          When an AUTO_INCREMENT column runs out of
          values, InnoDB wraps a
          BIGINT to
          -9223372036854775808 and BIGINT
          UNSIGNED to 1. However,
          BIGINT values have 64 bits, so do note that
          if you were to insert one million rows per second, it would
          still take nearly three hundred thousand years before
          BIGINT reached its upper bound. With all
          other integer type columns, a duplicate-key error results.
          This is similar to how MyISAM works,
          because it is mostly general MySQL behavior and not about any
          storage engine in particular.
        
          DELETE FROM
           does not
          regenerate the table but instead deletes all rows, one by one.
        tbl_name
          Under some conditions, TRUNCATE
           for an
          tbl_nameInnoDB table is mapped to DELETE
          FROM  and doesn't
          reset the tbl_nameAUTO_INCREMENT counter. See
          Section 13.2.9, “TRUNCATE Syntax”.
        
          In MySQL 5.0, the MySQL LOCK
          TABLES operation acquires two locks on each table if
          innodb_table_locks=1 (the default). In
          addition to a table lock on the MySQL layer, it also acquires
          an InnoDB table lock. Older versions of
          MySQL did not acquire InnoDB table locks;
          the old behavior can be selected by setting
          innodb_table_locks=0. If no
          InnoDB table lock is acquired,
          LOCK TABLES completes even if some records
          of the tables are being locked by other transactions.
        
          All InnoDB locks held by a transaction are
          released when the transaction is committed or aborted. Thus,
          it does not make much sense to invoke LOCK
          TABLES on InnoDB tables in
          AUTOCOMMIT=1 mode, because the acquired
          InnoDB table locks would be released
          immediately.
        
          Sometimes it would be useful to lock further tables in the
          course of a transaction. Unfortunately, LOCK
          TABLES in MySQL performs an implicit
          COMMIT and UNLOCK
          TABLES. An InnoDB variant of
          LOCK TABLES has been planned that can be
          executed in the middle of a transaction.
        
          The LOAD TABLE FROM MASTER statement for
          setting up replication slave servers does not work for
          InnoDB tables. A workaround is to alter the
          table to MyISAM on the master, do then the
          load, and after that alter the master table back to
          InnoDB. Do not do this if the tables use
          InnoDB-specific features such as foreign
          keys.
        
          The default database page size in InnoDB is
          16KB. By recompiling the code, you can set it to values
          ranging from 8KB to 64KB. You must update the values of
          UNIV_PAGE_SIZE and
          UNIV_PAGE_SIZE_SHIFT in the
          univ.i source file.
        
Currently, triggers are not activated by cascaded foreign key actions.
          As of MySQL 5.0.19, InnoDB does not ignore
          trailing spaces when comparing BINARY or
          VARBINARY column values. See
          Section 11.4.2, “The BINARY and VARBINARY Types” and
          Section D.1.2, “Changes in release 5.0.19 (04 March 2006)”.
        
      The following general guidelines apply to troubleshooting
      InnoDB problems:
    
          When an operation fails or you suspect a bug, you should look
          at the MySQL server error log, which is the file in the data
          directory that has a suffix of .err.
        
          When troubleshooting, it is usually best to run the MySQL
          server from the command prompt, rather than through the
          mysqld_safe wrapper or as a Windows
          service. You can then see what mysqld
          prints to the console, and so have a better grasp of what is
          going on. On Windows, you must start the server with the
          --console option to direct the output to
          the console window.
        
          Use the InnoDB Monitors to obtain
          information about a problem (see
          Section 14.2.11.1, “SHOW ENGINE INNODB STATUS and the InnoDB Monitors”). If the problem is
          performance-related, or your server appears to be hung, you
          should use innodb_monitor to print
          information about the internal state of
          InnoDB. If the problem is with locks, use
          innodb_lock_monitor. If the problem is in
          creation of tables or other data dictionary operations, use
          innodb_table_monitor to print the contents
          of the InnoDB internal data dictionary.
        
          If you suspect that a table is corrupt, run CHECK
          TABLE on that table.
        
        A specific issue with tables is that the MySQL server keeps data
        dictionary information in .frm files it
        stores in the database directories, whereas
        InnoDB also stores the information into its
        own data dictionary inside the tablespace files. If you move
        .frm files around, or if the server crashes
        in the middle of a data dictionary operation, the locations of
        the .frm files may end up out of synchrony
        with the locations recorded in the InnoDB
        internal data dictionary.
      
        A symptom of an out-of-sync data dictionary is that a
        CREATE TABLE statement fails. If this occurs,
        you should look in the server's error log. If the log says that
        the table already exists inside the InnoDB
        internal data dictionary, you have an orphaned table inside the
        InnoDB tablespace files that has no
        corresponding .frm file. The error message
        looks like this:
      
InnoDB: Error: table test/parent already exists in InnoDB internal InnoDB: data dictionary. Have you deleted the .frm file InnoDB: and not used DROP TABLE? Have you used DROP DATABASE InnoDB: for InnoDB tables in MySQL version <= 3.23.43? InnoDB: See the Restrictions section of the InnoDB manual. InnoDB: You can drop the orphaned table inside InnoDB by InnoDB: creating an InnoDB table with the same name in another InnoDB: database and moving the .frm file to the current database. InnoDB: Then MySQL thinks the table exists, and DROP TABLE will InnoDB: succeed.
        You can drop the orphaned table by following the instructions
        given in the error message. If you are still unable to use
        DROP TABLE successfully, the problem may be
        due to name completion in the mysql client.
        To work around this problem, start the mysql
        client with the --skip-auto-rehash option and
        try DROP TABLE again. (With name completion
        on, mysql tries to construct a list of table
        names, which fails when a problem such as just described
        exists.)
      
        Another symptom of an out-of-sync data dictionary is that MySQL
        prints an error that it cannot open a
        .InnoDB file:
      
ERROR 1016: Can't open file: 'child2.InnoDB'. (errno: 1)
In the error log you can find a message like this:
InnoDB: Cannot find table test/child2 from the internal data dictionary InnoDB: of InnoDB though the .frm file for the table exists. Maybe you InnoDB: have deleted and recreated InnoDB data files but have forgotten InnoDB: to delete the corresponding .frm files of InnoDB tables?
        This means that there is an orphaned .frm
        file without a corresponding table inside
        InnoDB. You can drop the orphaned
        .frm file by deleting it manually.
      
        If MySQL crashes in the middle of an ALTER
        TABLE operation, you may end up with an orphaned
        temporary table inside the InnoDB tablespace.
        Using innodb_table_monitor you can see listed
        a table whose name is #sql-.... You can
        perform SQL statements on tables whose name contains the
        character ‘#’ if you enclose the
        name within backticks. Thus, you can drop such an orphaned table
        like any other orphaned table using the method described
        earlier. Note that to copy or rename a file in the Unix shell,
        you need to put the file name in double quotes if the file name
        contains ‘#’.
      
      The MERGE storage engine, also known as the
      MRG_MyISAM engine, is a collection of identical
      MyISAM tables that can be used as one.
      “Identical” means that all tables have identical
      column and index information. You cannot merge
      MyISAM tables in which the columns are listed
      in a different order, do not have exactly the same columns, or
      have the indexes in different order. However, any or all of the
      MyISAM tables can be compressed with
      myisampack. See Section 8.4, “myisampack — Generate Compressed, Read-Only MyISAM Tables”.
      Differences in table options such as
      AVG_ROW_LENGTH, MAX_ROWS, or
      PACK_KEYS do not matter.
    
      When you create a MERGE table, MySQL creates
      two files on disk. The files have names that begin with the table
      name and have an extension to indicate the file type. An
      .frm file stores the table format, and an
      .MRG file contains the names of the tables
      that should be used as one. The tables do not have to be in the
      same database as the MERGE table itself.
    
      You can use SELECT, DELETE,
      UPDATE, and INSERT on
      MERGE tables. You must have
      SELECT, UPDATE, and
      DELETE privileges on the
      MyISAM tables that you map to a
      MERGE table.
    
      If you DROP the MERGE table,
      you are dropping only the MERGE specification.
      The underlying tables are not affected.
    
      To create a MERGE table, you must specify a
      UNION=(
      clause that indicates which list-of-tables)MyISAM tables you
      want to use as one. You can optionally specify an
      INSERT_METHOD option if you want inserts for
      the MERGE table to take place in the first or
      last table of the UNION list. Use a value of
      FIRST or LAST to cause
      inserts to be made in the first or last table, respectively. If
      you do not specify an INSERT_METHOD option or
      if you specify it with a value of NO, attempts
      to insert rows into the MERGE table result in
      an error.
    
      The following example shows how to create a
      MERGE table:
    
mysql>CREATE TABLE t1 (->a INT NOT NULL AUTO_INCREMENT PRIMARY KEY,->message CHAR(20)) ENGINE=MyISAM;mysql>CREATE TABLE t2 (->a INT NOT NULL AUTO_INCREMENT PRIMARY KEY,->message CHAR(20)) ENGINE=MyISAM;mysql>INSERT INTO t1 (message) VALUES ('Testing'),('table'),('t1');mysql>INSERT INTO t2 (message) VALUES ('Testing'),('table'),('t2');mysql>CREATE TABLE total (->a INT NOT NULL AUTO_INCREMENT,->message CHAR(20), INDEX(a))->ENGINE=MERGE UNION=(t1,t2) INSERT_METHOD=LAST;
      The older term TYPE is supported as a synonym
      for ENGINE for backward compatibility, but
      ENGINE is the preferred term and
      TYPE is deprecated.
    
      Note that the a column is indexed as a
      PRIMARY KEY in the underlying
      MyISAM tables, but not in the
      MERGE table. There it is indexed but not as a
      PRIMARY KEY because a MERGE
      table cannot enforce uniqueness over the set of underlying tables.
    
      After creating the MERGE table, you can issue
      queries that operate on the group of tables as a whole:
    
mysql> SELECT * FROM total;
+---+---------+
| a | message |
+---+---------+
| 1 | Testing |
| 2 | table   |
| 3 | t1      |
| 1 | Testing |
| 2 | table   |
| 3 | t2      |
+---+---------+
      Note that you can also manipulate the .MRG
      file directly from outside of the MySQL server:
    
shell>cd /shell>mysql-data-directory/current-databasels -1 t1 t2 > total.MRGshell>mysqladmin flush-tables
      To remap a MERGE table to a different
      collection of MyISAM tables, you can use one of
      the following methods:
    
          DROP the MERGE table and
          re-create it.
        
          Use ALTER TABLE  to change the list of underlying tables.
        tbl_name
          UNION=(...)
          Change the .MRG file and issue a
          FLUSH TABLE statement for the
          MERGE table and all underlying tables to
          force the storage engine to read the new definition file.
        
      MERGE tables can help you solve the following
      problems:
    
          Easily manage a set of log tables. For example, you can put
          data from different months into separate tables, compress some
          of them with myisampack, and then create a
          MERGE table to use them as one.
        
          Obtain more speed. You can split a big read-only table based
          on some criteria, and then put individual tables on different
          disks. A MERGE table on this could be much
          faster than using the big table.
        
          Perform more efficient searches. If you know exactly what you
          are looking for, you can search in just one of the split
          tables for some queries and use a MERGE
          table for others. You can even have many different
          MERGE tables that use overlapping sets of
          tables.
        
          Perform more efficient repairs. It is easier to repair
          individual tables that are mapped to a
          MERGE table than to repair a single large
          table.
        
          Instantly map many tables as one. A MERGE
          table need not maintain an index of its own because it uses
          the indexes of the individual tables. As a result,
          MERGE table collections are
          very fast to create or remap. (Note that
          you must still specify the index definitions when you create a
          MERGE table, even though no indexes are
          created.)
        
          If you have a set of tables from which you create a large
          table on demand, you should instead create a
          MERGE table on them on demand. This is much
          faster and saves a lot of disk space.
        
          Exceed the file size limit for the operating system. Each
          MyISAM table is bound by this limit, but a
          collection of MyISAM tables is not.
        
          You can create an alias or synonym for a
          MyISAM table by defining a
          MERGE table that maps to that single table.
          There should be no really notable performance impact from
          doing this (only a couple of indirect calls and
          memcpy() calls for each read).
        
      The disadvantages of MERGE tables are:
    
          You can use only identical MyISAM tables
          for a MERGE table.
        
          You cannot use a number of MyISAM features
          in MERGE tables. For example, you cannot
          create FULLTEXT indexes on
          MERGE tables. (You can, of course, create
          FULLTEXT indexes on the underlying
          MyISAM tables, but you cannot search the
          MERGE table with a full-text search.)
        
          If the MERGE table is non-temporary, all
          underlying MyISAM tables must be
          non-temporary, too. If the MERGE table is
          temporary, the MyISAM tables can be any mix
          of temporary and non-temporary.
        
          MERGE tables use more file descriptors. If
          10 clients are using a MERGE table that
          maps to 10 tables, the server uses (10 × 10) + 10 file
          descriptors. (10 data file descriptors for each of the 10
          clients, and 10 index file descriptors shared among the
          clients.)
        
          Key reads are slower. When you read a key, the
          MERGE storage engine needs to issue a read
          on all underlying tables to check which one most closely
          matches the given key. To read the next key, the
          MERGE storage engine needs to search the
          read buffers to find the next key. Only when one key buffer is
          used up does the storage engine need to read the next key
          block. This makes MERGE keys much slower on
          eq_ref searches, but not much slower on
          ref searches. See
          Section 7.2.1, “Optimizing Queries with EXPLAIN”, for more information about
          eq_ref and ref.
        
Additional resources
          A forum dedicated to the MERGE storage
          engine is available at
          http://forums.mysql.com/list.php?93.
        
        The following are known problems with MERGE
        tables:
      
            If you use ALTER TABLE to change a
            MERGE table to another storage engine,
            the mapping to the underlying tables is lost. Instead, the
            rows from the underlying MyISAM tables
            are copied into the altered table, which then uses the
            specified storage engine.
          
            REPLACE does not work.
          
            You cannot use DROP TABLE, ALTER
            TABLE, DELETE without a
            WHERE clause, REPAIR
            TABLE, TRUNCATE TABLE,
            OPTIMIZE TABLE, or ANALYZE
            TABLE on any of the tables that are mapped into an
            open MERGE table. If you do so, the
            MERGE table may still refer to the
            original table, which yields unexpected results. The easiest
            way to work around this deficiency is to ensure that no
            MERGE tables remain open by issuing a
            FLUSH TABLES statement prior to
            performing any of those operations.
          
            DROP TABLE on a table that is in use by a
            MERGE table does not work on Windows
            because the MERGE storage engine's table
            mapping is hidden from the upper layer of MySQL. Windows
            does not allow open files to be deleted, so you first must
            flush all MERGE tables (with
            FLUSH TABLES) or drop the
            MERGE table before dropping the table.
          
            A MERGE table cannot maintain uniqueness
            constraints over the entire table. When you perform an
            INSERT, the data goes into the first or
            last MyISAM table (depending on the value
            of the INSERT_METHOD option). MySQL
            ensures that unique key values remain unique within that
            MyISAM table, but not across all the
            tables in the collection.
          
            When you create a MERGE table, there is
            no check to ensure that the underlying tables exist and have
            identical structures. When the MERGE
            table is used, MySQL checks that the row length for all
            mapped tables is equal, but this is not foolproof. If you
            create a MERGE table from dissimilar
            MyISAM tables, you are very likely to run
            into strange problems.
          
            The order of indexes in the MERGE table
            and its underlying tables should be the same. If you use
            ALTER TABLE to add a
            UNIQUE index to a table used in a
            MERGE table, and then use ALTER
            TABLE to add a non-unique index on the
            MERGE table, the index ordering is
            different for the tables if there was already a non-unique
            index in the underlying table. (This happens because
            ALTER TABLE puts
            UNIQUE indexes before non-unique indexes
            to facilitate rapid detection of duplicate keys.)
            Consequently, queries on tables with such indexes may return
            unexpected results.
          
      The MEMORY storage engine creates tables with
      contents that are stored in memory. Formerly, these were known as
      HEAP tables. MEMORY is the
      preferred term, although HEAP remains supported
      for backward compatibility.
    
      Each MEMORY table is associated with one disk
      file. The filename begins with the table name and has an extension
      of .frm to indicate that it stores the table
      definition.
    
      To specify explicitly that you want to create a
      MEMORY table, indicate that with an
      ENGINE table option:
    
CREATE TABLE t (i INT) ENGINE = MEMORY;
      The older term TYPE is supported as a synonym
      for ENGINE for backward compatibility, but
      ENGINE is the preferred term and
      TYPE is deprecated.
    
      As indicated by the name, MEMORY tables are
      stored in memory. They use hash indexes by default, which makes
      them very fast, and very useful for creating temporary tables.
      However, when the server shuts down, all rows stored in
      MEMORY tables are lost. The tables themselves
      continue to exist because their definitions are stored in
      .frm files on disk, but they are empty when
      the server restarts.
    
      This example shows how you might create, use, and remove a
      MEMORY table:
    
mysql>CREATE TABLE test ENGINE=MEMORY->SELECT ip,SUM(downloads) AS down->FROM log_table GROUP BY ip;mysql>SELECT COUNT(ip),AVG(down) FROM test;mysql>DROP TABLE test;
      MEMORY tables have the following
      characteristics:
    
          Space for MEMORY tables is allocated in
          small blocks. Tables use 100% dynamic hashing for inserts. No
          overflow area or extra key space is needed. No extra space is
          needed for free lists. Deleted rows are put in a linked list
          and are reused when you insert new data into the table.
          MEMORY tables also have none of the
          problems commonly associated with deletes plus inserts in
          hashed tables.
        
          MEMORY tables can have up to 32 indexes per
          table, 16 columns per index and a maximum key length of 500
          bytes.
        
          The MEMORY storage engine implements both
          HASH and BTREE indexes.
          You can specify one or the other for a given index by adding a
          USING clause as shown here:
        
CREATE TABLE lookup
    (id INT, INDEX USING HASH (id))
    ENGINE = MEMORY;
CREATE TABLE lookup
    (id INT, INDEX USING BTREE (id))
    ENGINE = MEMORY;
General characteristics of B-tree and hash indexes are described in Section 7.4.5, “How MySQL Uses Indexes”.
          You can have non-unique keys in a MEMORY
          table. (This is an uncommon feature for implementations of
          hash indexes.)
        
          If you have a hash index on a MEMORY table
          that has a high degree of key duplication (many index entries
          containing the same value), updates to the table that affect
          key values and all deletes are significantly slower. The
          degree of this slowdown is proportional to the degree of
          duplication (or, inversely proportional to the index
          cardinality). You can use a BTREE index to
          avoid this problem.
        
          Columns that are indexed can contain NULL
          values.
        
          MEMORY tables use a fixed-length row
          storage format.
        
          MEMORY tables cannot contain
          BLOB or TEXT columns.
        
          MEMORY includes support for
          AUTO_INCREMENT columns.
        
          You can use INSERT DELAYED with
          MEMORY tables. See
          Section 13.2.4.2, “INSERT DELAYED Syntax”.
        
          MEMORY tables are shared among all clients
          (just like any other non-TEMPORARY table).
        
          MEMORY table contents are stored in memory,
          which is a property that MEMORY tables
          share with internal tables that the server creates on the fly
          while processing queries. However, the two types of tables
          differ in that MEMORY tables are not
          subject to storage conversion, whereas internal tables are:
        
              If an internal table becomes too large, the server
              automatically converts it to an on-disk table. The size
              limit is determined by the value of the
              tmp_table_size system variable.
            
              MEMORY tables are never converted to
              disk tables. To ensure that you don't accidentally do
              anything foolish, you can set the
              max_heap_table_size system variable to
              impose a maximum size on MEMORY tables.
              For individual tables, you can also specify a
              MAX_ROWS table option in the
              CREATE TABLE statement.
            
          The server needs sufficient memory to maintain all
          MEMORY tables that are in use at the same
          time.
        
          To free memory used by a MEMORY table when
          you no longer require its contents, you should execute
          DELETE or TRUNCATE
          TABLE, or remove the table altogether using
          DROP TABLE.
        
          If you want to populate a MEMORY table when
          the MySQL server starts, you can use the
          --init-file option. For example, you can put
          statements such as INSERT INTO ... SELECT
          or LOAD DATA INFILE into this file to load
          the table from a persistent data source. See
          Section 5.2.1, “mysqld Command Options”, and
          Section 13.2.5, “LOAD DATA INFILE Syntax”.
        
          If you are using replication, the master server's
          MEMORY tables become empty when it is shut
          down and restarted. However, a slave is not aware that these
          tables have become empty, so it returns out-of-date content if
          you select data from them. When a MEMORY
          table is used on the master for the first time since the
          master was started, a DELETE statement is
          written to the master's binary log automatically, thus
          synchronizing the slave to the master again. Note that even
          with this strategy, the slave still has outdated data in the
          table during the interval between the master's restart and its
          first use of the table. However, if you use the
          --init-file option to populate the
          MEMORY table on the master at startup, it
          ensures that this time interval is zero.
        
          The memory needed for one row in a MEMORY
          table is calculated using the following expression:
        
SUM_OVER_ALL_BTREE_KEYS(max_length_of_key+ sizeof(char*) × 4) + SUM_OVER_ALL_HASH_KEYS(sizeof(char*) × 2) + ALIGN(length_of_row+1, sizeof(char*))
          ALIGN() represents a round-up factor to
          cause the row length to be an exact multiple of the
          char pointer size.
          sizeof(char*) is 4 on 32-bit machines and 8
          on 64-bit machines.
        
Additional resources
          A forum dedicated to the MEMORY storage
          engine is available at
          http://forums.mysql.com/list.php?92.
        
      Sleepycat Software has provided MySQL with the Berkeley DB
      transactional storage engine. This storage engine typically is
      called BDB for short. BDB
      tables may have a greater chance of surviving crashes and are also
      capable of COMMIT and
      ROLLBACK operations on transactions.
    
      Support for the BDB storage engine is included
      in MySQL source distributions is activated in MySQL-Max binary
      distributions. The MySQL source distribution comes with a
      BDB distribution that is patched to make it
      work with MySQL. You cannot use a non-patched version of
      BDB with MySQL.
    
We at MySQL AB work in close cooperation with Sleepycat to keep the quality of the MySQL/BDB interface high. (Even though Berkeley DB is in itself very tested and reliable, the MySQL interface is still considered gamma quality. We continue to improve and optimize it.)
      When it comes to support for any problems involving
      BDB tables, we are committed to helping our
      users locate the problem and create reproducible test cases. Any
      such test case is forwarded to Sleepycat, which in turn helps us
      find and fix the problem. As this is a two-stage operation, any
      problems with BDB tables may take a little
      longer for us to fix than for other storage engines. However, we
      anticipate no significant difficulties with this procedure because
      the Berkeley DB code itself is used in many applications other
      than MySQL.
    
For general information about Berkeley DB, please visit the Sleepycat Web site, http://www.sleepycat.com/.
        Currently, we know that the BDB storage
        engine works with the following operating systems:
      
Linux 2.x Intel
Sun Solaris (SPARC and x86)
FreeBSD 4.x/5.x (x86, sparc64)
IBM AIX 4.3.x
SCO OpenServer
SCO UnixWare 7.1.x
Windows NT/2000/XP
        The BDB storage engine does
        not work with the following operating
        systems:
      
Linux 2.x Alpha
Linux 2.x AMD64
Linux 2.x IA-64
Linux 2.x s390
Mac OS X
Note: The preceding lists are not complete. We update them as we receive more information.
        If you build MySQL from source with support for
        BDB tables, but the following error occurs
        when you start mysqld, it means that the
        BDB storage engine is not supported for your
        architecture:
      
bdb: architecture lacks fast mutexes: applications cannot be threaded Can't init databases
        In this case, you must rebuild MySQL without
        BDB support or start the server with the
        --skip-bdb option.
      
        If you have downloaded a binary version of MySQL that includes
        support for Berkeley DB, simply follow the usual binary
        distribution installation instructions. (MySQL-Max distributions
        include BDB support.)
      
        If you build MySQL from source, you can enable
        BDB support by invoking
        configure with the
        --with-berkeley-db option in addition to any
        other options that you normally use. Download a MySQL
        5.0 distribution, change location into its
        top-level directory, and run this command:
      
shell> ./configure --with-berkeley-db [other-options]
For more information, see Section 5.3, “The mysqld-max Extended MySQL Server”, Section 2.7, “Installing MySQL on Other Unix-Like Systems”, and Section 2.8, “MySQL Installation Using a Source Distribution”.
        The following options to mysqld can be used
        to change the behavior of the BDB storage
        engine. For more information, see
        Section 5.2.1, “mysqld Command Options”.
      
            The base directory for BDB tables. This
            should be the same directory that you use for
            --datadir.
          
            The BDB lock detection method. The option
            value should be DEFAULT,
            OLDEST, RANDOM, or
            YOUNGEST.
          
            The BDB log file directory.
          
Do not start Berkeley DB in recover mode.
            Don't synchronously flush the BDB logs.
            This option is deprecated; use
            --skip-sync-bdb-logs instead (see the
            description for --sync-bdb-logs).
          
            Start Berkeley DB in multi-process mode. (Do not use
            DB_PRIVATE when initializing Berkeley
            DB.)
          
            The BDB temporary file directory.
          
            Disable the BDB storage engine.
          
            Synchronously flush the BDB logs. This
            option is enabled by default. Use
            --skip-sync-bdb-logs to disable it.
          
        If you use the --skip-bdb option, MySQL does
        not initialize the Berkeley DB library and this saves a lot of
        memory. However, if you use this option, you cannot use
        BDB tables. If you try to create a
        BDB table, MySQL uses the default storage
        engine instead.
      
        Normally, you should start mysqld without the
        --bdb-no-recover option if you intend to use
        BDB tables. However, this may cause problems
        when you try to start mysqld if the
        BDB log files are corrupted. See
        Section 2.9.2.3, “Starting and Troubleshooting the MySQL Server”.
      
        With the bdb_max_lock variable, you can
        specify the maximum number of locks that can be active on a
        BDB table. The default is 10,000. You should
        increase this if errors such as the following occur when you
        perform long transactions or when mysqld has
        to examine many rows to execute a query:
      
bdb: Lock table is out of available locks Got error 12 from ...
        You may also want to change the
        binlog_cache_size and
        max_binlog_cache_size variables if you are
        using large multiple-statement transactions. See
        Section 5.12.3, “The Binary Log”.
      
See also Section 5.2.2, “Server System Variables”.
        Each BDB table is stored on disk in two
        files. The files have names that begin with the table name and
        have an extension to indicate the file type. An
        .frm file stores the table format, and a
        .db file contains the table data and
        indexes.
      
        To specify explicitly that you want a BDB
        table, indicate that with an ENGINE table
        option:
      
CREATE TABLE t (i INT) ENGINE = BDB;
        The older term TYPE is supported as a synonym
        for ENGINE for backward compatibility, but
        ENGINE is the preferred term and
        TYPE is deprecated.
      
        BerkeleyDB is a synonym for
        BDB in the ENGINE table
        option.
      
        The BDB storage engine provides transactional
        tables. The way you use these tables depends on the autocommit
        mode:
      
            If you are running with autocommit enabled (which is the
            default), changes to BDB tables are
            committed immediately and cannot be rolled back.
          
            If you are running with autocommit disabled, changes do not
            become permanent until you execute a
            COMMIT statement. Instead of committing,
            you can execute ROLLBACK to forget the
            changes.
          
            You can start a transaction with the START
            TRANSACTION or BEGIN statement
            to suspend autocommit, or with SET
            AUTOCOMMIT=0 to disable autocommit explicitly.
          
        For more information about transactions, see
        Section 13.4.1, “START TRANSACTION, COMMIT, and ROLLBACK Syntax”.
      
        The BDB storage engine has the following
        characteristics:
      
            BDB tables can have up to 31 indexes per
            table, 16 columns per index, and a maximum key size of 1024
            bytes.
          
            MySQL requires a primary key in each BDB
            table so that each row can be uniquely identified. If you
            don't create one explicitly by declaring a PRIMARY
            KEY, MySQL creates and maintains a hidden primary
            key for you. The hidden key has a length of five bytes and
            is incremented for each insert attempt. This key does not
            appear in the output of SHOW CREATE TABLE
            or DESCRIBE.
          
The primary key is faster than any other index, because it is stored together with the row data. The other indexes are stored as the key data plus the primary key, so it's important to keep the primary key as short as possible to save disk space and get better speed.
            This behavior is similar to that of
            InnoDB, where shorter primary keys save
            space not only in the primary index but in secondary indexes
            as well.
          
            If all columns that you access in a BDB
            table are part of the same index or part of the primary key,
            MySQL can execute the query without having to access the
            actual row. In a MyISAM table, this can
            be done only if the columns are part of the same index.
          
            Sequential scanning is slower for BDB
            tables than for MyISAM tables because the
            data in BDB tables is stored in B-trees
            and not in a separate data file.
          
            Key values are not prefix- or suffix-compressed like key
            values in MyISAM tables. In other words,
            key information takes a little more space in
            BDB tables compared to
            MyISAM tables.
          
            There are often holes in the BDB table to
            allow you to insert new rows in the middle of the index
            tree. This makes BDB tables somewhat
            larger than MyISAM tables.
          
            SELECT COUNT(*) FROM
             is slow for
            tbl_nameBDB tables, because no row count is
            maintained in the table.
          
            The optimizer needs to know the approximate number of rows
            in the table. MySQL solves this by counting inserts and
            maintaining this in a separate segment in each
            BDB table. If you don't issue a lot of
            DELETE or ROLLBACK
            statements, this number should be accurate enough for the
            MySQL optimizer. However, MySQL stores the number only on
            close, so it may be incorrect if the server terminates
            unexpectedly. It should not be fatal even if this number is
            not 100% correct. You can update the row count by using
            ANALYZE TABLE or OPTIMIZE
            TABLE. See Section 13.5.2.1, “ANALYZE TABLE Syntax”, and
            Section 13.5.2.5, “OPTIMIZE TABLE Syntax”.
          
            Internal locking in BDB tables is done at
            the page level.
          
            LOCK TABLES works on
            BDB tables as with other tables. If you
            do not use LOCK TABLES, MySQL issues an
            internal multiple-write lock on the table (a lock that does
            not block other writers) to ensure that the table is
            properly locked if another thread issues a table lock.
          
            To support transaction rollback, the BDB
            storage engine maintains log files. For maximum performance,
            you can use the --bdb-logdir option to
            place the BDB logs on a different disk
            than the one where your databases are located.
          
            MySQL performs a checkpoint each time a new
            BDB log file is started, and removes any
            BDB log files that are not needed for
            current transactions. You can also use FLUSH
            LOGS at any time to checkpoint the Berkeley DB
            tables.
          
For disaster recovery, you should use table backups plus MySQL's binary log. See Section 5.10.1, “Database Backups”.
            Warning: If you delete old
            log files that are still in use, BDB is
            not able to do recovery at all and you may lose data if
            something goes wrong.
          
            Applications must always be prepared to handle cases where
            any change of a BDB table may cause an
            automatic rollback and any read may fail with a deadlock
            error.
          
            If you get a full disk with a BDB table,
            you get an error (probably error 28) and the transaction
            should roll back. This contrasts with
            MyISAM tables, for which
            mysqld waits for sufficient free disk
            space before continuing.
          
            Opening many BDB tables at the same time
            may be quite slow. If you are going to use
            BDB tables, you should not have a very
            large table cache (for example, with a size larger than 256)
            and you should use the --no-auto-rehash
            option when you use the mysql client.
          
            SHOW TABLE STATUS does not provide some
            information for BDB tables:
          
mysql> SHOW TABLE STATUS LIKE 'bdbtest'\G
*************************** 1. row ***************************
           Name: bdbtest
         Engine: BerkeleyDB
        Version: 10
     Row_format: Dynamic
           Rows: 154
 Avg_row_length: 0
    Data_length: 0
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
Optimize performance.
Change to use no page locks for table scanning operations.
        The following list indicates restrictions that you must observe
        when using BDB tables:
      
            Each BDB table stores in its
            .db file the path to the file as it was
            created. This is done to enable detection of locks in a
            multi-user environment that supports symlinks. As a
            consequence of this, it is not possible to move
            BDB table files from one database
            directory to another.
          
            When making backups of BDB tables, you
            must either use mysqldump or else make a
            backup that includes the files for each
            BDB table (the .frm
            and .db files) as well as the
            BDB log files. The BDB
            storage engine stores unfinished transactions in its log
            files and requires them to be present when
            mysqld starts. The BDB
            logs are the files in the data directory with names of the
            form
            log.
            (ten digits).
          NNNNNNNNNN
            If a column that allows NULL values has a
            unique index, only a single NULL value is
            allowed. This differs from other storage engines, which
            allow multiple NULL values in unique
            indexes.
          
            If the following error occurs when you start
            mysqld after upgrading, it means that the
            current version of BDB doesn't support
            the old log file format:
          
bdb:  Ignoring log file: .../log.NNNNNNNNNN:
unsupported log version #
            In this case, you must delete all BDB
            logs from your data directory (the files that have names of
            the form
            log.)
            and restart mysqld. We also recommend
            that you then use mysqldump --opt to dump
            your NNNNNNNNNNBDB tables, drop the tables, and
            restore them from the dump file.
          
            If autocommit mode is disabled and you drop a
            BDB table that is referenced in another
            transaction, you may get error messages of the following
            form in your MySQL error log:
          
001119 23:43:56  bdb:  Missing log fileid entry
001119 23:43:56  bdb:  txn_abort: Log undo failed for LSN:
                       1 3644744: Invalid
            This is not fatal, but the fix is not trivial. Until the
            problem is fixed, we recommend that you not drop
            BDB tables except while autocommit mode
            is enabled.
          
      The EXAMPLE storage engine is a stub engine
      that does nothing. Its purpose is to serve as an example in the
      MySQL source code that illustrates how to begin writing new
      storage engines. As such, it is primarily of interest to
      developers.
    
      The EXAMPLE storage engine is included in
      MySQL-Max binary distributions. To enable this storage engine if
      you build MySQL from source, invoke configure
      with the --with-example-storage-engine option.
    
      To examine the source for the EXAMPLE engine,
      look in the sql/examples directory of a MySQL
      source distribution.
    
      When you create an EXAMPLE table, the server
      creates a table format file in the database directory. The file
      begins with the table name and has an .frm
      extension. No other files are created. No data can be stored into
      the table. Retrievals return an empty result.
    
mysql>CREATE TABLE test (i INT) ENGINE = EXAMPLE;Query OK, 0 rows affected (0.78 sec) mysql>INSERT INTO test VALUES(1),(2),(3);ERROR 1031 (HY000): Table storage engine for 'test' doesn't have this option mysql>SELECT * FROM test;Empty set (0.31 sec)
      The EXAMPLE storage engine does not support
      indexing.
    
      The FEDERATED storage engine is available
      beginning with MySQL 5.0.3. It is a storage engine that accesses
      data in tables of remote databases rather than in local tables.
    
      The FEDERATED storage engine is included in
      MySQL-Max binary distributions. To enable this storage engine if
      you build MySQL from source, invoke configure
      with the --with-federated-storage-engine option.
    
      To examine the source for the FEDERATED engine,
      look in the sql directory of a source
      distribution for MySQL 5.0.3 or newer.
    
Additional resources
          A forum dedicated to the FEDERATED storage
          engine is available at
          http://forums.mysql.com/list.php?105.
        
        When you create a FEDERATED table, the server
        creates a table format file in the database directory. The file
        begins with the table name and has an .frm
        extension. No other files are created, because the actual data
        is in a remote table. This differs from the way that storage
        engines for local tables work.
      
        For local database tables, data files are local. For example, if
        you create a MyISAM table named
        users, the MyISAM handler
        creates a data file named users.MYD. A
        handler for local tables reads, inserts, deletes, and updates
        data in local data files, and rows are stored in a format
        particular to the handler. To read rows, the handler must parse
        data into columns. To write rows, column values must be
        converted to the row format used by the handler and written to
        the local data file.
      
        With the MySQL FEDERATED storage engine,
        there are no local data files for a table (for example, there is
        no .MYD file). Instead, a remote database
        stores the data that normally would be in the table. The local
        server connects to a remote server, and uses the MySQL client
        API to read, delete, update, and insert data in the remote
        table. Data retrieval is initiated via a SELECT * FROM
         SQL statement. To
        read the result, rows are fetched one at a time by using the
        tbl_namemysql_fetch_row() C API function, and then
        converting the columns in the SELECT result
        set to the format that the FEDERATED handler
        expects.
      
The flow of information is as follows:
SQL calls issued locally
MySQL handler API (data in handler format)
MySQL client API (data converted to SQL calls)
Remote database -> MySQL client API
Convert result sets (if any) to handler format
Handler API -> Result rows or rows-affected count to local
        The procedure for using FEDERATED tables is
        very simple. Normally, you have two servers running, either both
        on the same host or on different hosts. (It is possible for a
        FEDERATED table to use another table that is
        managed by the same server, although there is little point in
        doing so.)
      
        First, you must have a table on the remote server that you want
        to access by using a FEDERATED table. Suppose
        that the remote table is in the federated
        database and is defined like this:
      
CREATE TABLE test_table (
    id     INT(20) NOT NULL AUTO_INCREMENT,
    name   VARCHAR(32) NOT NULL DEFAULT '',
    other  INT(20) NOT NULL DEFAULT '0',
    PRIMARY KEY  (id),
    INDEX name (name),
    INDEX other_key (other)
)
ENGINE=MyISAM
DEFAULT CHARSET=latin1;
        The example uses a MyISAM table, but the
        table could use any storage engine.
      
        Next, create a FEDERATED table on the local
        server for accessing the remote table:
      
CREATE TABLE federated_table (
    id     INT(20) NOT NULL AUTO_INCREMENT,
    name   VARCHAR(32) NOT NULL DEFAULT '',
    other  INT(20) NOT NULL DEFAULT '0',
    PRIMARY KEY  (id),
    INDEX name (name),
    INDEX other_key (other)
)
ENGINE=FEDERATED
DEFAULT CHARSET=latin1
CONNECTION='mysql://root@remote_host:9306/federated/test_table';
        (Before MySQL 5.0.13, use COMMENT rather than
        CONNECTION.)
      
        The structure of this table must be exactly the same as that of
        the remote table, except that the ENGINE
        table option should be FEDERATED and the
        CONNECTION table option is a connection
        string that indicates to the FEDERATED engine
        how to connect to the remote server.
      
        The FEDERATED engine creates only the
        test_table.frm file in the
        federated database.
      
        The remote host information indicates the remote server to which
        your local server connects, and the database and table
        information indicates which remote table to use as the data
        source. In this example, the remote server is indicated to be
        running as remote_host on port 9306, so there
        must be a MySQL server running on the remote host and listening
        to port 9306.
      
        The general form of the connection string in the
        CONNECTION option is as follows:
      
scheme://user_name[:password]@host_name[:port_num]/db_name/tbl_name
        Only mysql is supported as the
        scheme value at this point; the
        password and port number are optional.
      
Here are some example connection strings:
CONNECTION='mysql://username:password@hostname:port/database/tablename' CONNECTION='mysql://username@hostname/database/tablename' CONNECTION='mysql://username:password@hostname/database/tablename'
        The use of CONNECTION for specifying the
        connection string is non-optimal and is likely to change in
        future. Keep this in mind for applications that use
        FEDERATED tables. Such applications are
        likely to need modification if the format for specifying
        connection information changes.
      
        Because any password given in the connection string is stored as
        plain text, it can be seen by any user who can use SHOW
        CREATE TABLE or SHOW TABLE STATUS
        for the FEDERATED table, or query the
        TABLES table in the
        INFORMATION_SCHEMA database.
      
        The following items indicate features that the
        FEDERATED storage engine does and does not
        support:
      
            In the first version, the remote server must be a MySQL
            server. Support by FEDERATED for other
            database engines may be added in the future.
          
            The remote table that a FEDERATED table
            points to must exist before you try to
            access the table through the FEDERATED
            table.
          
            It is possible for one FEDERATED table to
            point to another, but you must be careful not to create a
            loop.
          
There is no support for transactions.
            There is no way for the FEDERATED engine
            to know if the remote table has changed. The reason for this
            is that this table must work like a data file that would
            never be written to by anything other than the database. The
            integrity of the data in the local table could be breached
            if there was any change to the remote database.
          
            The FEDERATED storage engine supports
            SELECT, INSERT,
            UPDATE, DELETE, and
            indexes. It does not support ALTER TABLE,
            DROP TABLE, or any other Data Definition
            Language statements. The current implementation does not use
            Prepared statements.
            
          
            The implementation uses SELECT,
            INSERT, UPDATE, and
            DELETE, but not
            HANDLER.
          
            FEDERATED tables do not work with the
            query cache.
          
        Some of these limitations may be lifted in future versions of
        the FEDERATED handler.
      
      The ARCHIVE storage engine is used for storing
      large amounts of data without indexes in a very small footprint.
    
      The ARCHIVE storage engine is included in MySQL
      binary distributions. To enable this storage engine if you build
      MySQL from source, invoke configure with the
      --with-archive-storage-engine option.
    
      To examine the source for the ARCHIVE engine,
      look in the sql directory of a MySQL source
      distribution.
    
      You can check whether the ARCHIVE storage
      engine is available with this statement:
    
mysql> SHOW VARIABLES LIKE 'have_archive';
      When you create an ARCHIVE table, the server
      creates a table format file in the database directory. The file
      begins with the table name and has an .frm
      extension. The storage engine creates other files, all having
      names beginning with the table name. The data and metadata files
      have extensions of .ARZ and
      .ARM, respectively. An
      .ARN file may appear during optimization
      operations.
    
      The ARCHIVE engine supports
      INSERT and SELECT, but not
      DELETE, REPLACE, or
      UPDATE. It does support ORDER
      BY operations, BLOB columns, and
      basically all but spatial data types (see
      Section 16.4.1, “MySQL Spatial Data Types”). The
      ARCHIVE engine uses row-level locking.
    
      Storage: Rows are compressed as
      they are inserted. The ARCHIVE engine uses
      zlib lossless data compression (see
      http://www.zlib.net/). You can use
      OPTIMIZE TABLE to analyze the table and pack it
      into a smaller format (for a reason to use OPTIMIZE
      TABLE, see later in this section). Beginning with MySQL
      5.0.15, the engine also supports CHECK TABLE.
      There are several types of insertions that are used:
    
          An INSERT statement just pushes rows into a
          compression buffer, and that buffer flushes as necessary. The
          insertion into the buffer is protected by a lock. A
          SELECT forces a flush to occur, unless the
          only insertions that have come in were INSERT
          DELAYED (those flush as necessary). See
          Section 13.2.4.2, “INSERT DELAYED Syntax”.
        
          A bulk insert is visible only after it completes, unless other
          inserts occur at the same time, in which case it can be seen
          partially. A SELECT never causes a flush of
          a bulk insert unless a normal insert occurs while it is
          loading.
        
      Retrieval: On retrieval, rows are
      uncompressed on demand; there is no row cache. A
      SELECT operation performs a complete table
      scan: When a SELECT occurs, it finds out how
      many rows are currently available and reads that number of rows.
      SELECT is performed as a consistent read. Note
      that lots of SELECT statements during insertion
      can deteriorate the compression, unless only bulk or delayed
      inserts are used. To achieve better compression, you can use
      OPTIMIZE TABLE or REPAIR
      TABLE. The number of rows in ARCHIVE
      tables reported by SHOW TABLE STATUS is always
      accurate. See Section 13.5.2.5, “OPTIMIZE TABLE Syntax”,
      Section 13.5.2.6, “REPAIR TABLE Syntax”, and
      Section 13.5.4.21, “SHOW TABLE STATUS Syntax”.
    
Additional resources
          A forum dedicated to the ARCHIVE storage
          engine is available at
          http://forums.mysql.com/list.php?112.
        
      The CSV storage engine stores data in text
      files using comma-separated values format.
    
      To enable this storage engine, use the
      --with-csv-storage-engine option to
      configure when you build MySQL.
    
      The CSV storage engine is included in MySQL-Max
      binary distributions. To enable this storage engine if you build
      MySQL from source, invoke configure with the
      --with-csv-storage-engine option.
    
      To examine the source for the CSV engine, look
      in the sql/examples directory of a MySQL
      source distribution.
    
      When you create a CSV table, the server creates
      a table format file in the database directory. The file begins
      with the table name and has an .frm
      extension. The storage engine also creates a data file. Its name
      begins with the table name and has a .CSV
      extension. The data file is a plain text file. When you store data
      into the table, the storage engine saves it into the data file in
      comma-separated values format.
    
mysql>CREATE TABLE test(i INT, c CHAR(10)) ENGINE = CSV;Query OK, 0 rows affected (0.12 sec) mysql>INSERT INTO test VALUES(1,'record one'),(2,'record two');Query OK, 2 rows affected (0.00 sec) Records: 2 Duplicates: 0 Warnings: 0 mysql>SELECT * FROM test;+------+------------+ | i | c | +------+------------+ | 1 | record one | | 2 | record two | +------+------------+ 2 rows in set (0.00 sec)
      If you examine the test.CSV file in the
      database directory created by executing the preceding statements,
      its contents should look like this:
    
"1","record one" "2","record two"
      The CSV storage engine does not support
      indexing.
    
      The BLACKHOLE storage engine acts as a
      “black hole” that accepts data but throws it away and
      does not store it. Retrievals always return an empty result:
    
mysql>CREATE TABLE test(i INT, c CHAR(10)) ENGINE = BLACKHOLE;Query OK, 0 rows affected (0.03 sec) mysql>INSERT INTO test VALUES(1,'record one'),(2,'record two');Query OK, 2 rows affected (0.00 sec) Records: 2 Duplicates: 0 Warnings: 0 mysql>SELECT * FROM test;Empty set (0.00 sec)
      The BLACKHOLE storage engine is included in
      MySQL-Max binary distributions. To enable this storage engine if
      you build MySQL from source, invoke configure
      with the --with-blackhole-storage-engine option.
    
      To examine the source for the BLACKHOLE engine,
      look in the sql directory of a MySQL source
      distribution.
    
      When you create a BLACKHOLE table, the server
      creates a table format file in the database directory. The file
      begins with the table name and has an .frm
      extension. There are no other files associated with the table.
    
      The BLACKHOLE storage engine supports all kinds
      of indexes. That is, you can include index declarations in the
      table definition.
    
      You can check whether the BLACKHOLE storage
      engine is available with this statement:
    
mysql> SHOW VARIABLES LIKE 'have_blackhole_engine';
      Inserts into a BLACKHOLE table do not store any
      data, but if the binary log is enabled, the SQL statements are
      logged (and replicated to slave servers). This can be useful as a
      repeater or filter mechanism. For example, suppose that your
      application requires slave-side filtering rules, but transferring
      all binary log data to the slave first results in too much
      traffic. In such a case, it is possible to set up on the master
      host a “dummy” slave process whose default storage
      engine is BLACKHOLE, depicted as follows:
    

      The master writes to its binary log. The “dummy”
      mysqld process acts as a slave, applying the
      desired combination of replicate-do-* and
      replicate-ignore-* rules, and writes a new,
      filtered binary log of its own. (See
      Section 6.8, “Replication Startup Options”.) This filtered log is
      provided to the slave.
    
The dummy process does not actually store any data, so there is little processing overhead incurred by running the additional mysqld process on the replication master host. This type of setup can be repeated with additional replication slaves.
      Other possible uses for the BLACKHOLE storage
      engine include:
    
Verification of dump file syntax.
          Measurement of the overhead from binary logging, by comparing
          performance using BLACKHOLE with and
          without binary logging enabled.
        
          BLACKHOLE is essentially a
          “no-op” storage engine, so it could be used for
          finding performance bottlenecks not related to the storage
          engine itself.