Wednesday, August 31, 2016

Calculate Log file size

- The log file needs to be big enough to let InnoDB optimize its I/O, but not so big that crash recovery takes a long time.
Formula-1: Using sequence number of 'show engine innodb status'
mysql> pager grep sequence
PAGER set to 'grep sequence'
mysql> show engine innodb status\G
Log sequence number 8759936991603
1 row in set (0.02 sec)
mysql> select sleep(60);
1 row in set (1 min 0.01 sec)
mysql> show engine innodb status\G
Log sequence number 8759947542478
1 row in set (0.01 sec)
mysql> exit
Bye
=> Notice the log sequence number. That’s the total number of bytes written to the transaction log. So, now you can see how many MB have been written to the log in one minute.
=> As a rough rule of thumb, you can make the log big enough that it can hold at most an hour or so of logs. That’s generally plenty of data for InnoDB to work with; an hour’s worth is more than enough so that it can reorder the writes to use sequential I/O during the flushing and checkpointing process.
mysql> select (8759947542478-8759936991603)/1024/1024 "MB per minute";
+---------------+
| MB per minute |
+---------------+
|   10.06209850 |
+---------------+
1 row in set (0.00 sec)
Thus, approximately 600 MB of data is written in one hour. Since there are two log files by default, divide that in half (and round up a bit for headroom), and you can set:
innodb_log_file_size=384M
Formula-2: Using global status 'Innodb_os_log_written'
mysql> SHOW GLOBAL STATUS like 'Innodb_os_log_written';
+-----------------------+---------------+
| Variable_name | Value |
+-----------------------+---------------+
| Innodb_os_log_written | 2869002693632 |
+-----------------------+---------------+
1 row in set (0.00 sec)
mysql> select sleep(60);
+-----------+
| sleep(60) |
+-----------+
| 0 |
+-----------+
1 row in set (59.99 sec)
mysql> SHOW GLOBAL STATUS like 'Innodb_os_log_written';
+-----------------------+---------------+
| Variable_name | Value |
+-----------------------+---------------+
| Innodb_os_log_written | 2869020893696 |
+-----------------------+---------------+
1 row in set (0.01 sec)
mysql> select (2869020893696-2869002693632)/1024/1024 "MB per minute";
+---------------+
| MB per minute |
+---------------+
|   17.35693359 |
+---------------+
1 row in set (0.00 sec)
Thus, approximately 1041 MB of data is written in one hour. Since there are two log files by default, divide that in half, and you can set:
innodb_log_file_size=512M
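
Note: The same measurement can be scripted in a single session using user variables. A minimal sketch, assuming information_schema.GLOBAL_STATUS is available (MySQL <= 5.6, or 5.7 with show_compatibility_56=ON; on later 5.7 use performance_schema.global_status instead); @log_start and @log_end are just example names:

mysql> SELECT VARIABLE_VALUE INTO @log_start FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Innodb_os_log_written';
mysql> SELECT SLEEP(60);
mysql> SELECT VARIABLE_VALUE INTO @log_end FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Innodb_os_log_written';
mysql> -- one hour of log volume, split across the two default log files:
mysql> SELECT ROUND((@log_end - @log_start) * 60 / 2 / 1024 / 1024) AS suggested_log_file_MB;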
References:
https://www.percona.com/blog/2008/11/21/how-to-calculate-a-good-innodb-log-file-size/

MySQL Isolation levels

The four isolation levels
====================

READ UNCOMMITTED
---------------------------------
With the READ-UNCOMMITTED isolation level there isn’t much isolation between transactions at all. A transaction can see changes made by other transactions that have not been committed yet. That means a transaction could be reading data that will never actually exist, because the transaction that was updating the data rolled back its changes instead of committing. This is known as a dirty read. An application rarely needs to rely on dirty reads; in fact, this can hardly be called an isolation level at all. Simply put, there isn’t really any isolation, and hence such a system can’t really be called a transactional system.

READ COMMITTED
-----------------------------
With the READ-COMMITTED isolation level, the dirty read phenomenon is avoided, because uncommitted changes are not visible to any other transaction until the change is committed. Within this isolation level each SELECT uses its own snapshot of the data committed before the SELECT started executing. Because each SELECT has its own snapshot, the same SELECT run multiple times during the same transaction can return different result sets (if another session commits changes in between). This phenomenon is called a non-repeatable read. Each consistent read, even within the same transaction, sets and reads its own fresh snapshot.

REPEATABLE READ
-----------------------------
With the REPEATABLE-READ isolation level, the non-repeatable read phenomenon is avoided. This isolation level returns the same result set throughout the transaction for the same SELECT, no matter how many times it is run. A snapshot is taken the first time that SELECT runs during the transaction, and the same snapshot is used throughout the transaction whenever that SELECT is executed again. A transaction running at this isolation level does not see changes made by other transactions, regardless of whether those changes have been committed. This ensures that reads are always consistent (repeatable). This isolation level is the default for InnoDB. Although it solves the problem of non-repeatable reads, another possible problem remains: phantom reads.

SERIALIZABLE
-----------------------
With the SERIALIZABLE isolation level, the phantom read phenomenon is avoided. Transactions run at this isolation level place locks on all records they access, and also lock the gaps so that new records cannot be inserted into the range being operated on by the transaction. Transactions run in this fashion execute in a serialized manner. This is the strongest possible isolation level.

Note:
--------
1> The READ-UNCOMMITTED isolation level does the least locking. Next comes READ-COMMITTED, which removes most of the gap-locking and hence produces fewer deadlocks; in READ-COMMITTED, locking reads lock only the index records, not the gaps before/after them. REPEATABLE-READ locks more than READ-COMMITTED: UPDATE and DELETE use next-key locking, and locking reads use next-key locking as well. SERIALIZABLE has the highest level of locking: all plain SELECTs are automatically converted to SELECT ... LOCK IN SHARE MODE, so all records read take shared locks.

2> SERIALIZABLE and REPEATABLE-READ employ a lot of locking and hence create more deadlock situations, which in turn decreases performance. In fact, SERIALIZABLE is the least performant of the isolation levels, as it converts even plain reads into locking reads. REPEATABLE-READ is better in terms of locking and deadlocks, and READ-COMMITTED is better still because there are fewer gap locks. But locking and deadlocks are not the only performance considerations; mutex contention also needs to be taken into account.
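
A quick way to see these levels in action (a minimal sketch; tx_isolation applies to MySQL 5.7 and earlier, and the employee table here is just an example):

mysql> SELECT @@GLOBAL.tx_isolation, @@SESSION.tx_isolation;
mysql> SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
mysql> START TRANSACTION;
mysql> SELECT salary FROM employee WHERE eid = 1;   -- returns e.g. 1000
-- (another session now runs: UPDATE employee SET salary = 2000 WHERE eid = 1; COMMIT;)
mysql> SELECT salary FROM employee WHERE eid = 1;   -- now returns 2000: a non-repeatable read
mysql> COMMIT;

Under REPEATABLE-READ, the second SELECT would still return 1000.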

References:
http://www.ovaistariq.net/597/understanding-innodb-transaction-isolation-levels/#.V8aravl94lI
https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html

InnoDB Doublewrite Buffer

InnoDB Doublewrite Buffer
=====================
InnoDB uses asynchronous disk I/O where possible, by creating a number of threads to handle I/O operations, while permitting other database operations to proceed while the I/O is still in progress. On Linux and Windows platforms, InnoDB uses the available OS and library functions to perform “native” asynchronous I/O. On other platforms, InnoDB still uses I/O threads, but the threads may actually wait for I/O requests to complete; this technique is known as “simulated” asynchronous I/O.

InnoDB uses a novel file flush technique involving a structure called the doublewrite buffer, which is enabled by default (innodb_doublewrite=ON). It adds safety to recovery following a crash or power outage, and improves performance on most varieties of Unix by reducing the need for fsync() operations.

Before writing pages to a data file, InnoDB first writes them to a contiguous tablespace area called the doublewrite buffer. Only after the write and the flush to the doublewrite buffer has completed does InnoDB write the pages to their proper positions in the data file. If there is an operating system, storage subsystem, or mysqld process crash in the middle of a page write (causing a torn page condition), InnoDB can later find a good copy of the page from the doublewrite buffer during recovery.

Even if the data is written twice the performance impact is usually small, but in some heavy workloads the doublewrite buffer becomes a bottleneck.
“Doublewrite” means InnoDB writes data twice when it performs tablespace writes; writes to the log files are done only once. InnoDB and XtraDB use the doublewrite buffer to provide a strong guarantee against data corruption.

"doublewrite" is needed to archive data safety in case of partial page writes. Innodb does not log full pages to the log files, but uses what is called “physiological” logging which means log records contain page number for the operation as well as operation data (ie update the row) and log sequence information. Such logging structure is geat as it require less data to be written to the log, however it requires pages to be internally consistent. It does not matter which page version it is – it could be “current” version in which case Innodb will skip page upate operation or “former” in which case Innodb will perform update. If page is inconsistent recovery can’t proceed.

A partial page write is when a page write request submitted to the OS completes only partially. For example, out of a 16K InnoDB page, only the first 4KB are updated and the other parts remain in their former state. Most typically, partial page writes happen on power failure. They can also happen on an OS crash: there is a chance the operating system will split your 16K write into several writes and the failure happens between them. One reason for splitting is file fragmentation; most file systems use 4K block sizes by default, so a 16K write could span more than one fragment.

Even though doublewrite requires each page to be written twice, its overhead is far less than double. The write to the doublewrite buffer is sequential, so it is pretty cheap. It also allows InnoDB to save on fsync()s: instead of calling fsync() for each page write, InnoDB submits multiple page writes and then calls fsync(), which allows the operating system to optimize the order in which writes are executed and to use multiple devices in parallel. This optimization could be used without doublewrite, it just happened to be implemented at the same time. So in general, you would expect no more than a 5-10% performance loss due to doublewrite.

If you do not care about your data (e.g., slaves on RAID0), or if your file system guarantees no partial page writes can occur, you can disable doublewrite by setting innodb_doublewrite=0. It is, however, not worth the trouble in most cases.
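
To see how much work the doublewrite buffer is doing, check its status counters (these exist in 5.5 and later); Innodb_dblwr_pages_written divided by Innodb_dblwr_writes shows how many pages get batched per doublewrite operation:

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%';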

References:
https://dev.mysql.com/doc/refman/5.5/en/innodb-disk-io.html
https://www.percona.com/blog/2006/08/04/innodb-double-write/
https://www.percona.com/doc/percona-server/5.5/performance/innodb_doublewrite_path.html

Monday, August 29, 2016

Tuning InnoDB Parameters

InnoDB parameters which need tuning/attention at first glance:

innodb_file_per_table
=================
Variable Scope: Global
Dynamic Variable: Yes
Type: Boolean
Default: On

Definition: When innodb_file_per_table is enabled (the default in 5.6.6 and higher), InnoDB stores the data and indexes for each newly created table in a separate .ibd file, rather than in the system tablespace. The storage for these InnoDB tables is reclaimed when the tables are dropped or truncated.

Note: Be aware that enabling innodb_file_per_table also means that an ALTER TABLE operation will move InnoDB table from the system tablespace to an individual .ibd file in cases where ALTER TABLE recreates the table (ALGORITHM=COPY). An exception to this rule is for tables that were placed in the system tablespace using the TABLESPACE=innodb_system option with CREATE TABLE or ALTER TABLE. These tables are unaffected by the innodb_file_per_table setting and can only be moved to file-per-table tablespaces using ALTER TABLE ... TABLESPACE=innodb_file_per_table.

By default, all InnoDB tables and indexes are stored in the system tablespace. As an alternative, you can store each InnoDB table and its indexes in its own file. This feature is called "multiple tablespaces" because each table that is created when this setting is in effect has its own tablespace.
.frm - table format file
.ibd - contains index and data

When innodb_file_per_table is disabled, InnoDB stores the data for all tables and indexes in the ibdata files that make up the system tablespace. This setting reduces the performance overhead of filesystem operations for operations such as DROP TABLE or TRUNCATE TABLE.

Optimal Value: ON
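
To find tables that still live in the system tablespace after enabling the option, a hedged check (the INNODB_SYS_TABLES view exists in 5.6+, and SPACE = 0 denotes the system tablespace):

mysql> SELECT NAME FROM information_schema.INNODB_SYS_TABLES WHERE SPACE = 0;

Such tables only move to their own .ibd file after a table-rebuilding ALTER TABLE or OPTIMIZE TABLE.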

######################################################################
######################################################################

innodb_buffer_pool_size
===================
Variable Scope: Global
Dynamic Variable(>= 5.7.5): Yes
Dynamic Variable(<= 5.7.4): No
Type: Integer
Default: 128M

The size in bytes of the memory buffer InnoDB uses to cache data and indexes of its tables. The maximum value depends on the CPU architecture (2^64-1 on 64 bit architecture).

The larger you set this value, the less disk I/O is needed to access data in tables. On a dedicated database server, you may set this to up to 80% of the machine physical memory size. Be prepared to scale back this value if these other issues occur:
 - Competition for physical memory might cause paging in the operating system.
 - InnoDB reserves additional memory for buffers and control structures, so that the total allocated space is approximately 10% greater than the specified buffer pool size.

Note:
 - When the size of the buffer pool is greater than 1GB, setting innodb_buffer_pool_instances to a value greater than 1 can improve the scalability on a busy server.
 - As of MySQL 5.7.5, when you increase or decrease buffer pool size, the operation is performed in chunks. Chunk size is defined by the innodb_buffer_pool_chunk_size configuration option, which has a default of 128 MB.

Optimal Value: 70% of RAM dedicated to server
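
On 5.7.5+ the buffer pool can be resized online. A minimal sketch (the server rounds the value to a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances; 8GB here is just an example):

mysql> SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;
mysql> SHOW STATUS LIKE 'Innodb_buffer_pool_resize_status';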

######################################################################
######################################################################

innodb_log_file_size
================
Variable Scope: Global
Dynamic Variable: No
Type: Integer
Default: 48M

The size in bytes of each log file in a log group. The combined size of log files (innodb_log_file_size * innodb_log_files_in_group) cannot exceed a maximum value that is slightly less than 512GB. A pair of 255 GB log files, for example, would allow you to approach the limit but not exceed it. The default value is 48MB. Sensible values range from 1MB to 1/N-th of the size of the buffer pool, where N is the number of log files in the group. The larger the value, the less checkpoint flush activity is needed in the buffer pool, saving disk I/O. Larger log files also make crash recovery slower, although improvements to recovery performance in MySQL 5.5 and higher make the log file size less of a consideration.

Optimal Value: 128M to 256M

######################################################################
######################################################################

innodb_log_buffer_size
==================
Variable Scope: Global
Dynamic Variable: No
Type: Integer
Default(>= 5.7.6): 16M
Default(<= 5.7.5): 8M

The size in bytes of the buffer that InnoDB uses to write to the log files on disk. The default value changed from 8MB to 16MB in 5.7.6 with the introduction of 32k and 64k innodb_page_size values. A large log buffer enables large transactions to run without a need to write the log to disk before the transactions commit. Thus, if you have transactions that update, insert, or delete many rows, making the log buffer larger saves disk I/O.

Optimal Value: 32M to 64M
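
To verify whether the current log buffer is large enough, watch Innodb_log_waits; if it keeps increasing, transactions had to wait for the log buffer to be flushed before writing, and a larger buffer may help:

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_log_waits';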

######################################################################
######################################################################

innodb_flush_log_at_trx_commit
=========================
Variable Scope: Global
Dynamic Variable: Yes
Type: enumeration
Default: 1

Controls the balance between strict ACID compliance for commit operations, and higher performance that is possible when commit-related I/O operations are rearranged and done in batches. You can achieve better performance by changing the default value, but then you can lose up to a second of transactions in a crash.

 - The default value of 1 is required for full ACID compliance. With this value, the contents of the InnoDB log buffer are written out to the log file at each transaction commit and the log file is flushed to disk.

 - With a value of 0, the contents of the InnoDB log buffer are written to the log file approximately once per second and the log file is flushed to disk. No writes from the log buffer to the log file are performed at transaction commit. Once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions with any mysqld process crash.

 - With a value of 2, the contents of the InnoDB log buffer are written to the log file after each transaction commit and the log file is flushed to disk approximately once per second. Once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions in an operating system crash or a power outage.

Note:
 - DDL changes and other internal InnoDB activities flush the InnoDB log independent of the innodb_flush_log_at_trx_commit setting.
 - InnoDB's crash recovery works regardless of the innodb_flush_log_at_trx_commit setting. Transactions are either applied entirely or erased entirely.

Optimal Value:
At Primary (Master) Server: innodb_flush_log_at_trx_commit=1
At Secondary (Slave) Server: innodb_flush_log_at_trx_commit=2
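
Since the variable is dynamic, the relaxed setting can be applied on a slave without a restart, for example:

mysql> SET GLOBAL innodb_flush_log_at_trx_commit = 2;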

######################################################################
######################################################################

innodb_flush_method
=================
Variable Scope: Global
Dynamic Variable: No
Type: string
Default: NULL

Defines the method used to flush data to the InnoDB data files and log files, which can affect I/O throughput. If innodb_flush_method=NULL on a Unix-like system, the fsync option is used by default.

The innodb_flush_method options for Unix-like systems include:
 - fsync: InnoDB uses the fsync() system call to flush both the data and log files. fsync is the default setting.
 - O_DSYNC: InnoDB uses O_SYNC to open and flush the log files, and fsync() to flush the data files. InnoDB does not use O_DSYNC directly because there have been problems with it on many varieties of Unix.
 - O_DIRECT: InnoDB uses O_DIRECT (or directio() on Solaris) to open the data files, and uses fsync() to flush both the data and log files. This option is available on some GNU/Linux versions, FreeBSD, and Solaris.
 - O_DIRECT_NO_FSYNC: InnoDB uses O_DIRECT during flushing I/O, but skips the fsync() system call afterward. This setting is suitable for some types of file systems but not others.

How each setting affects performance depends on hardware configuration and workload. Benchmark your particular configuration to decide which setting to use, or whether to keep the default setting. Examine the Innodb_data_fsyncs status variable to see the overall number of fsync() calls for each setting. The mix of read and write operations in your workload can affect how a setting performs. For example, on a system with a hardware RAID controller and battery-backed write cache, O_DIRECT can help to avoid double buffering between the InnoDB buffer pool and the operating system's file system cache. On some systems where InnoDB data and log files are located on a SAN, the default value or O_DSYNC might be faster for a read-heavy workload with mostly SELECT statements. Always test this parameter with hardware and workload that reflect your production environment.

Optimal Value:
innodb_flush_method=O_DIRECT

######################################################################
######################################################################

innodb_flush_neighbors
===================
Variable Scope: Global
Dynamic Variable: Yes
Type: enumeration
Default: 1

Specifies whether flushing a page from the InnoDB buffer pool also flushes other dirty pages in the same extent.
 - The default value of 1 flushes contiguous dirty pages in the same extent from the buffer pool.
 - A setting of 0 turns innodb_flush_neighbors off and no other dirty pages are flushed from the buffer pool.
 - A setting of 2 flushes dirty pages in the same extent from the buffer pool.

When the table data is stored on a traditional HDD storage device, flushing such neighbour pages in one operation reduces I/O overhead (primarily for disk seek operations) compared to flushing individual pages at different times. For table data stored on SSD, seek time is not a significant factor and you can turn this setting off to spread out the write operations.

Optimal:
innodb_flush_neighbors=0

######################################################################
######################################################################

innodb_read_io_threads
===================
Variable Scope: Global
Dynamic Variable: No
Type: Integer
Default: 4

The number of I/O threads for read operations in InnoDB.

innodb_write_io_threads
===================
Variable Scope: Global
Dynamic Variable: No
Type: Integer
Default: 4

The number of I/O threads for write operations in InnoDB.

Note:
- InnoDB uses background threads to service various types of I/O requests. You can configure the number of background threads that service read and write I/O on data pages, using the configuration parameters innodb_read_io_threads and innodb_write_io_threads. These parameters signify the number of background threads used for read and write requests respectively. They are effective on all supported platforms. You can set the value of these parameters in the MySQL option file (my.cnf or my.ini); you cannot change them dynamically. The default value for these parameters is 4 and the permissible values range from 1-64.

- The purpose of this change is to make InnoDB more scalable on high end systems. Each background thread can handle up to 256 pending I/O requests. A major source of background I/O is the read-ahead requests. InnoDB tries to balance the load of incoming requests in such way that most of the background threads share work equally. InnoDB also attempts to allocate read requests from the same extent to the same thread to increase the chances of coalescing the requests together.

Optimal Value:
innodb_write_io_threads=8
innodb_read_io_threads=8

######################################################################
######################################################################
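
Putting it all together, a minimal my.cnf sketch using the values suggested above (illustrative only; adjust buffer pool and log sizes to your RAM and write rate):

[mysqld]
innodb_file_per_table          = 1
innodb_buffer_pool_size        = 8G      # ~70% of RAM on a dedicated server
innodb_log_file_size           = 256M
innodb_log_buffer_size         = 32M
innodb_flush_log_at_trx_commit = 1       # use 2 on slaves
innodb_flush_method            = O_DIRECT
innodb_flush_neighbors         = 0       # for SSD storage
innodb_read_io_threads         = 8
innodb_write_io_threads        = 8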

References:
http://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html
http://dev.mysql.com/doc/refman/5.7/en/innodb-performance-multiple_io_threads.html

Friday, August 19, 2016

Daily tasks - Replication

1> Don't replicate a particular command to the slave:

Log on to master:
mysql> SET SQL_LOG_BIN=0;
mysql> Any DDL or DML command
mysql> exit

Note:
1> The commands issued above will not be replicated to the slave because they are not written to the MySQL binary log.
2> Be cautious, as master and slave data will diverge, which may break replication later.


2> Skip a replicated command at the slave:

Let's say we face a situation where the slave is not able to execute a particular command, which breaks replication at that point.

Log on to slave:
mysql> STOP SLAVE;
mysql> SET GLOBAL sql_slave_skip_counter = 1;
mysql> START SLAVE;

Note: Be cautious. Don't use the above command on tables which have a high DML rate, as it may break replication again later.
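
After skipping an event, verify that replication has resumed cleanly:

mysql> SHOW SLAVE STATUS\G

Check that Slave_IO_Running and Slave_SQL_Running are both 'Yes', that Last_SQL_Error is empty, and that Seconds_Behind_Master is shrinking toward 0.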

Daily Tasks - Sessions

1> To get a count of sessions connected to the database, grouped by host.

mysql> select distinct substring_index(host,':',1) "host", count(user) "count" from information_schema.processlist group by substring_index(host,':',1);

Output:
+-------+-------+
| host  | count |
+-------+-------+
| test1 |   150 |
| test2 |   150 |
+-------+-------+
2 rows in set (0.00 sec)


2> Kill all connections of a given user to the DB from the backend

mysql> SELECT CONCAT('KILL ',id,';') FROM information_schema.processlist WHERE user in ('test')  INTO outfile '/tmp/killSessions001.txt';
mysql> source /tmp/killSessions001.txt;

Output:
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

Note:
1> Don't use this in production, as it may cause some threads to hang, resulting in high CPU and I/O and possibly a DB crash. If you face such a situation, stop the services and restart the database immediately.
2> This is useful when you need to apply a DDL patch on a DEV/STAGE environment while the application is running, and you don't want to stop all services on the box.
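
A gentler variant kills only the idle sessions of that user (a sketch; the 600-second threshold and file name are arbitrary):

mysql> SELECT CONCAT('KILL ',id,';') FROM information_schema.processlist WHERE user in ('test') AND command = 'Sleep' AND time > 600 INTO outfile '/tmp/killIdleSessions001.txt';
mysql> source /tmp/killIdleSessions001.txt;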

Percona MySQL Vs Oracle MySQL

Percona MySQL:
1> It provides extra diagnostic features like:
   * Undo Segment Information
   * Per-table/Index/Client/Thread Performance Counters
2> It provides TokuDB support
3> It provides a thread pool with the community version
4> It provides extra features for DBA/OP staff:
   * Configurable page sizes
   * Backup locks
   * Ability to Kill Idle Transactions
5> It provides XtraDB storage engine.

Oracle MySQL:
1> It provides support of NoSQL interface via memcached

Note: The Oracle MySQL Enterprise edition does provide a lot of extra features like Security, Encryption, Auditing, Scalability, and Monitoring.

Reference:
https://www.percona.com/software/mysql-database/percona-server/feature-comparison
http://www.sobstel.org/blog/mysql-like-databases-comparison/

MySQL Enterprise Vs Community

Features provided by the MySQL Enterprise edition in comparison to the Community version:

MySQL Enterprise:
1> MySQL Enterprise Audit :: Policy based auditing for MySQL applications
2> MySQL Enterprise Encryption :: encryption libraries, key management, sign and verify data.
3> MySQL Enterprise Security :: PAM (Pluggable Authentication Modules), Windows, Pluggable Authentication API
4> MySQL Enterprise Scalability :: MySQL Thread pool
5> MySQL Tools:
   * Oracle Enterprise Manager for MySQL
   * MySQL Enterprise Monitor
   * MySQL Workbench
6> MySQL Enterprise backup  
7> Oracle premier support for MySQL

Reference:
http://www.slideshare.net/mablomy/mysql-42347173

Improvements in MySQL Versions 5.5/5.6/5.7

MySQL 5.5
---------------
1> MySQL Enterprise Thread Pool :: The default thread-handling model in MySQL Server executes statements using one thread per client connection. As more clients connect to the server and execute statements, overall performance degrades. As of MySQL 5.5.16, MySQL Enterprise Edition distributions include a thread pool plugin that provides an alternative thread-handling model designed to reduce overhead and improve performance. The plugin implements a thread pool that increases server performance by efficiently managing statement execution threads for large numbers of client connections.
2> MySQL Enterprise Audit :: MySQL Enterprise Edition now includes MySQL Enterprise Audit, implemented using a server plugin named audit_log. MySQL Enterprise Audit uses the open MySQL Audit API to enable standard, policy-based monitoring and logging of connection and query activity executed on specific MySQL servers. Designed to meet the Oracle audit specification, MySQL Enterprise Audit provides an out of box, easy to use auditing and compliance solution for applications that are governed by both internal and external regulatory guidelines. When installed, the audit plugin enables MySQL Server to produce a log file containing an audit record of server activity. The log contents include when clients connect and disconnect, and what actions they perform while connected, such as which databases and tables they access.
3> Multi-core scalability :: Scalability on multi-core CPUs is improved. The trend in hardware development now is toward more cores rather than continued increases in CPU clock speeds, which renders “wait until CPUs get faster” a nonviable means of improving database performance. Instead, it is necessary to make better use of multiple cores to maximally exploit the processing cycles they make available. The focus has been on InnoDB, especially locking and memory management.
4> Default storage engine :: The default storage engine for new tables is InnoDB rather than MyISAM.
5> Semisynchronous replication :: A commit performed on the master side blocks before returning to the session that performed the transaction until at least one slave acknowledges that it has received and logged the events for the transaction. Semisynchronous replication is implemented through an optional plugin component.
6> Partitioning :: It is now possible to delete all rows from one or more partitions of a partitioned table using the ALTER TABLE ... TRUNCATE PARTITION statement.
7> Metadata locking :: The server now prevents DDL statements from compromising transaction serializability by using a new class of locks called metadata locks.
8> IPv6 support :: MySQL Server can accept TCP/IP connections from clients connecting over IPv6.

MySQL 5.6
---------------
1> Security improvements :: Stronger encryption for user account passwords, available through an authentication plugin named sha256_password that implements SHA-256 password hashing. The mysql.user table now has a password_expired column. Its default value is 'N', but can be set to 'Y' with the new ALTER USER statement. After an account's password has been expired, all operations performed in subsequent connections to the server using the account result in an error until the user issues a SET PASSWORD statement to establish a new account password.
2> InnoDB Enhancements (Online DDL) :: Several ALTER TABLE operations can be performed without copying the table, without blocking inserts, updates, and deletes to the table, or both. These enhancements are known collectively as online DDL.
3> InnoDB Enhancements (Transportable Tablespaces) :: InnoDB now supports the notion of “transportable tablespaces”, allowing file-per-table tablespaces (.ibd files) to be exported from a running MySQL instance and imported into another running instance without inconsistencies or mismatches caused by buffered data, in-progress transactions, and so on.
4> InnoDB Enhancements (Deadlocks) :: InnoDB uses a new, faster algorithm to detect deadlocks. Information about all InnoDB deadlocks can be written to the MySQL server error log, to help diagnose application issues.
5> InnoDB Enhancements (Undo Tablespace) :: You can move the InnoDB undo log out of the system tablespace into one or more separate tablespaces. The I/O patterns for the undo log make these new tablespaces good candidates to move to SSD storage, while keeping the system tablespace on hard disk storage.
6> InnoDB Enhancements (Memcache) :: You can code MySQL applications that access InnoDB tables through a NoSQL-style API. This feature uses the popular memcached daemon to relay requests such as ADD, SET, and GET for key-value pairs. These simple operations to store and retrieve data avoid the SQL overhead such as parsing and constructing a query execution plan. You can access the same data through the NoSQL API and SQL. For example, you might use the NoSQL API for fast updates and lookups, and SQL for complex queries and compatibility with existing applications.
7> Replication and logging :: MySQL now supports transaction-based replication using global transaction identifiers (also known as “GTIDs”). This makes it possible to identify and track each transaction when it is committed on the originating server and as it is applied by any slaves. Enabling of GTIDs in a replication setup is done primarily using the new --gtid-mode and --enforce-gtid-consistency server options.
8> EXPLAIN plan :: The EXPLAIN statement now provides execution plan information for DELETE, INSERT, REPLACE, and UPDATE statements. Previously, EXPLAIN provided information only for SELECT statements.
9> Data Types :: MySQL now permits fractional seconds for TIME, DATETIME, and TIMESTAMP values, with up to microseconds (6 digits) precision.

MySQL 5.7
---------------
1> Security Improvements :: MySQL now enables database administrators to establish a policy for automatic password expiration: any user who connects to the server using an account for which the password is past its permitted lifetime must change the password. Administrators can lock and unlock accounts for better control over who can log in. Installation creates no anonymous-user accounts and no test database.
2> Online ALTER TABLE :: ALTER TABLE now supports a RENAME INDEX clause that renames an index. The change is made in place without a table-copy operation. It works for all storage engines.
3> InnoDB Enhancements (Optimized DDL) :: DDL performance for InnoDB temporary tables is improved through optimization of CREATE TABLE, DROP TABLE, TRUNCATE TABLE, and ALTER TABLE statements.
4> InnoDB Enhancements (Transport Partitioned tablespace) :: InnoDB supports the Transportable Tablespace feature for partitioned InnoDB tables and individual InnoDB table partitions. This enhancement eases backup procedures for partitioned tables and enables copying of partitioned tables and individual table partitions between MySQL instances.
5> InnoDB Enhancements (Dynamic innodb_buffer_pool_size) :: The innodb_buffer_pool_size parameter is dynamic, allowing you to resize the buffer pool without restarting the server. The resizing operation, which involves moving pages to a new location in memory, is performed in chunks. Chunk size is configurable using the new innodb_buffer_pool_chunk_size configuration option. You can monitor resizing progress using the new Innodb_buffer_pool_resize_status status variable.
6> InnoDB Enhancements (Undo Tablespace) :: Can truncate undo logs that reside in undo tablespaces. This feature is enabled using the innodb_undo_log_truncate configuration option.
7> JSON Support :: MySQL supports a native JSON type. JSON values are not stored as strings; they use an internal binary format that permits quick read access to document elements. JSON documents stored in JSON columns are automatically validated whenever they are inserted or updated, with an invalid document producing an error (see the short demo after this list).
8> System and status variables :: System and status variable information is now available in Performance Schema tables, in preference to use of INFORMATION_SCHEMA tables to obtain these variables. This also affects the operation of the SHOW VARIABLES and SHOW STATUS statements.
9> sys schema ::  MySQL distributions now include the sys schema, which is a set of objects that help DBAs and developers interpret data collected by the Performance Schema. sys schema objects can be used for typical tuning and diagnosis use cases.
10> Triggers :: Previously, a table could have at most one trigger for each combination of trigger event (INSERT, UPDATE, DELETE) and action time (BEFORE, AFTER). This limitation has been lifted and multiple triggers are permitted.
11> mysql client :: Previously, Control+C in mysql interrupted the current statement if there was one, or exited mysql if not. Now Control+C interrupts the current statement if there was one, or cancels any partial input line otherwise, but does not exit.
12> Multi-source replication is now possible ::  MySQL Multi-Source Replication adds the ability to replicate from multiple masters to a slave. MySQL Multi-Source Replication topologies can be used to back up multiple servers to a single server, to merge table shards, and consolidate data from multiple servers to a single server.
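
A short demo of the native JSON type mentioned in point 7 (the table name jdemo is just an example):

mysql> CREATE TABLE jdemo (doc JSON);
mysql> INSERT INTO jdemo VALUES ('{"name": "gaurav", "dept": "Finance"}');
mysql> SELECT JSON_EXTRACT(doc, '$.name') FROM jdemo;
-- an INSERT with malformed JSON fails with "Invalid JSON text"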

Reference:
https://dev.mysql.com/doc/refman/5.5/en/mysql-nutshell.html
https://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html
https://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html

Thursday, August 18, 2016

Change/Forgot 'root' user password

Forgot MySQL Root Password:

Steps:
1> Stop MW/application connecting to Database
2> Kill MySQL process
3> Start database with --skip-grant-tables option
4> Log on to the MySQL DB as the 'root' user without a password.
5> Change 'root' user password.
6> Kill MySQL process
7> Restart DB with normal process/convention.

Detailed Steps:
[root@ ~]# mysql -uroot -ptest -Dpractice
Warning: Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)
[root@ ~]#
[root@ ~]# mysqladmin -uroot shutdown
mysqladmin: connect to server at 'localhost' failed
error: 'Access denied for user 'root'@'localhost' (using password: NO)'
[root@ ~]# <Grep pid of mysql process running>
[root@ ~]# kill -9 22961 23680
[root@ ~]# bin/mysqld_safe --defaults-file=/u01/app/mysql/practice/etc/my.cnf --datadir=/u02/mysqldata/practice/data --pid-file=/u03/mysqldata/practice/logs/practice_3.pid --skip-grant-tables &
[1] 26803
[root@ ~]# 160818 01:16:57 mysqld_safe Logging to '/u03/mysqldata/practice/logs/practice_3.err'.
160818 01:16:57 mysqld_safe Starting mysqld daemon with databases from /u02/mysqldata/practice/data

[root@ ~]# mysql -uroot
mysql> UPDATE mysql.user SET Password = PASSWORD('test') WHERE User = 'root';
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye

[root@ ~]# ps -eaf | grep -i mysqld
[root@ ~]# <Grep pid of mysql process running>
[root@ ~]# kill -9 26803 27539
[root@ ~]# bin/mysqld_safe --defaults-file=/u01/app/mysql/practice/etc/my.cnf --datadir=/u02/mysqldata/practice/data --pid-file=/u03/mysqldata/practice/logs/practice_3.pid &
[root@ ~]# mysql -uroot
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)
[root@ ~]# mysql -uroot -ptest
mysql>
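
Note: On MySQL 5.7.6 and later the mysql.user table no longer has a Password column, so the UPDATE above will fail. A hedged sketch of the equivalent step while running with --skip-grant-tables:

mysql> FLUSH PRIVILEGES;   -- reload the grant tables so account-management statements work
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'test';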

3rd highest Salary

Fetch the 3rd highest salary:

mysql> select * from employee;
+------+---------+------+--------+
| eid  | name    | did  | salary |
+------+---------+------+--------+
|    1 | gaurav  |    1 |   1000 |
|    2 | gagan   |    5 |   6000 |
|    3 | deepa   | NULL |   5000 |
|    4 | neha    | NULL |   2000 |
|    5 | saurabh | NULL |   4000 |
|    6 | meenu   | NULL |   3000 |
+------+---------+------+--------+
6 rows in set (0.00 sec)

mysql> select * from employee order by salary desc;
+------+---------+------+--------+
| eid  | name    | did  | salary |
+------+---------+------+--------+
|    2 | gagan   |    5 |   6000 |
|    3 | deepa   | NULL |   5000 |
|    5 | saurabh | NULL |   4000 |
|    6 | meenu   | NULL |   3000 |
|    4 | neha    | NULL |   2000 |
|    1 | gaurav  |    1 |   1000 |
+------+---------+------+--------+
6 rows in set (0.00 sec)

mysql> select * from employee order by salary desc limit 2,1;
+------+---------+------+--------+
| eid  | name    | did  | salary |
+------+---------+------+--------+
|    5 | saurabh | NULL |   4000 |
+------+---------+------+--------+
1 row in set (0.00 sec)
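
Note: LIMIT 2,1 over raw rows gives the wrong answer when salaries repeat. A variant that ranks distinct salary values instead:

mysql> select distinct salary from employee order by salary desc limit 2,1;
+--------+
| salary |
+--------+
|   4000 |
+--------+
1 row in set (0.00 sec)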

Left Outer Join / Right Outer Join

There are two kinds of OUTER joins in SQL: LEFT OUTER join and RIGHT OUTER join. The main difference between them, as their names suggest, is the side from which non-matched rows are included. An INNER join includes only matching rows, where the value of the joining column is the same in both tables; an OUTER join extends that and also includes unmatched rows in the final result. A LEFT OUTER join includes unmatched rows from the table written on the left of the join predicate. A RIGHT OUTER join, along with all matching rows, includes unmatched rows from the table on the right side of the join predicate.

In short, the result of a LEFT OUTER join is INNER JOIN + unmatched rows from the left table, and the result of a RIGHT OUTER join is INNER JOIN + unmatched rows from the right table.

mysql> select e.eid,e.name,d.did,d.dname from employee e left outer join department d on d.did=e.did;
+------+--------+------+-----------+
| eid  | name   | did  | dname     |
+------+--------+------+-----------+
|    1 | gaurav |    1 | Finance   |
|    2 | gagan  |    5 | Marketing |
|    3 | deepa  | NULL | NULL      |
|    4 | neha   | NULL | NULL      |
+------+--------+------+-----------+
4 rows in set (0.00 sec)

mysql> select e.eid,e.name,d.did,d.dname from department d right outer join employee e on d.did=e.did;
+------+--------+------+-----------+
| eid  | name   | did  | dname     |
+------+--------+------+-----------+
|    1 | gaurav |    1 | Finance   |
|    2 | gagan  |    5 | Marketing |
|    3 | deepa  | NULL | NULL      |
|    4 | neha   | NULL | NULL      |
+------+--------+------+-----------+
4 rows in set (0.00 sec)

Note: LEFT and RIGHT OUTER JOIN are interchangeable if we also swap the order of the tables in the query.

mysql> select e.eid,e.name,d.did,d.dname from department d left outer join employee e on d.did=e.did;
+------+--------+------+-----------+
| eid  | name   | did  | dname     |
+------+--------+------+-----------+
|    1 | gaurav |    1 | Finance   |
|    2 | gagan  |    5 | Marketing |
| NULL | NULL   |    2 | HR        |
| NULL | NULL   |    3 | IT        |
| NULL | NULL   |    4 | Sales     |
+------+--------+------+-----------+
5 rows in set (0.00 sec)

mysql> select e.eid,e.name,d.did,d.dname from employee e right outer join department d on d.did=e.did;
+------+--------+------+-----------+
| eid  | name   | did  | dname     |
+------+--------+------+-----------+
|    1 | gaurav |    1 | Finance   |
|    2 | gagan  |    5 | Marketing |
| NULL | NULL   |    2 | HR        |
| NULL | NULL   |    3 | IT        |
| NULL | NULL   |    4 | Sales     |
+------+--------+------+-----------+
5 rows in set (0.00 sec)

Reference:
http://javarevisited.blogspot.com/2013/05/difference-between-left-and-right-outer-join-sql-mysql.html

Joins / Inner Join / Outer Join

A :: Employees
B :: Department

mysql> select * from employee;
+------+--------+------+
| eid  | name   | did  |
+------+--------+------+
|    1 | gaurav |    1 |
|    2 | gagan  |    5 |
|    3 | deepa  | NULL |
|    4 | neha   | NULL |
+------+--------+------+
4 rows in set (0.00 sec)

mysql> select * from department;
+------+-----------+
| did  | dname     |
+------+-----------+
|    1 | Finance   |
|    2 | HR        |
|    3 | IT        |
|    4 | Sales     |
|    5 | Marketing |
+------+-----------+
5 rows in set (0.00 sec)

Note: Explained below are all 7 classic join cases (originally illustrated clockwise around a Venn diagram).

1> Employees that have been assigned to a department.
mysql> select e.eid,e.name,d.did,d.dname from employee e,department d where d.did=e.did;
+------+--------+------+-----------+
| eid  | name   | did  | dname     |
+------+--------+------+-----------+
|    1 | gaurav |    1 | Finance   |
|    2 | gagan  |    5 | Marketing |
+------+--------+------+-----------+
2 rows in set (0.00 sec)

2> List all departments, whether or not they have any employees assigned.
mysql> select e.eid,e.name,d.did,d.dname from employee e right outer join department d on d.did=e.did;
+------+--------+------+-----------+
| eid  | name   | did  | dname     |
+------+--------+------+-----------+
|    1 | gaurav |    1 | Finance   |
|    2 | gagan  |    5 | Marketing |
| NULL | NULL   |    2 | HR        |
| NULL | NULL   |    3 | IT        |
| NULL | NULL   |    4 | Sales     |
+------+--------+------+-----------+
5 rows in set (0.00 sec)

3> List all departments that have no employees assigned.
mysql> select e.eid,e.name,d.did,d.dname from employee e right outer join department d on d.did=e.did where e.eid is null;
+------+------+------+-------+
| eid  | name | did  | dname |
+------+------+------+-------+
| NULL | NULL |    2 | HR    |
| NULL | NULL |    3 | IT    |
| NULL | NULL |    4 | Sales |
+------+------+------+-------+
3 rows in set (0.00 sec)

4> List all departments that have no employees assigned, and all employees that have not been assigned a department.
mysql> select e.eid,e.name,d.did,d.dname from employee e right outer join department d on d.did=e.did where e.eid is null union select e.eid,e.name,d.did,d.dname from employee e left outer join department d on d.did=e.did where d.did is null;
+------+-------+------+-------+
| eid  | name  | did  | dname |
+------+-------+------+-------+
| NULL | NULL  |    2 | HR    |
| NULL | NULL  |    3 | IT    |
| NULL | NULL  |    4 | Sales |
|    3 | deepa | NULL | NULL  |
|    4 | neha  | NULL | NULL  |
+------+-------+------+-------+
5 rows in set (0.00 sec)

5> List all employees and departments.
mysql> select e.eid,e.name,d.did,d.dname from employee e right outer join department d on d.did=e.did where e.eid is null union select e.eid,e.name,d.did,d.dname from employee e left outer join department d on d.did=e.did where d.did is null union select e.eid,e.name,d.did,d.dname from employee e inner join department d on d.did=e.did ;
+------+--------+------+-----------+
| eid  | name   | did  | dname     |
+------+--------+------+-----------+
| NULL | NULL   |    2 | HR        |
| NULL | NULL   |    3 | IT        |
| NULL | NULL   |    4 | Sales     |
|    3 | deepa  | NULL | NULL      |
|    4 | neha   | NULL | NULL      |
|    1 | gaurav |    1 | Finance   |
|    2 | gagan  |    5 | Marketing |
+------+--------+------+-----------+
7 rows in set (0.01 sec)

6> List all employees that have not been assigned a department.
mysql> select e.eid,e.name,d.did,d.dname from employee e left outer join department d on d.did=e.did where d.did is null;
+------+-------+------+-------+
| eid  | name  | did  | dname |
+------+-------+------+-------+
|    3 | deepa | NULL | NULL  |
|    4 | neha  | NULL | NULL  |
+------+-------+------+-------+
2 rows in set (0.00 sec)

7> List all employees, whether or not they have been assigned a department.
mysql> select e.eid,e.name,d.did,d.dname from employee e left outer join department d on d.did=e.did;
+------+--------+------+-----------+
| eid  | name   | did  | dname     |
+------+--------+------+-----------+
|    1 | gaurav |    1 | Finance   |
|    2 | gagan  |    5 | Marketing |
|    3 | deepa  | NULL | NULL      |
|    4 | neha   | NULL | NULL      |
+------+--------+------+-----------+
4 rows in set (0.00 sec)

References:
http://javarevisited.blogspot.com/2013/05/difference-between-left-and-right-outer-join-sql-mysql.html

Wednesday, August 17, 2016

Database Administration tips

10 DB Administration tips:
------------------------------------
1> Define a backup/recovery strategy: take backups at regular intervals, test the recovery time, and know how much downtime the system can afford in case of disaster.
2> Optimize the DB parameters based upon system configuration.
3> Define the high availability mode - keep the system replicated on separate hardware. The replicated DB should be in a different availability zone.
4> Always keep some handy queries ready:
   a) Size of a table/DB
   b) How many processes are running in the DB
   c) Kill single/all processes in DB
   d) Change the parameter value globally/session wide.
   e) To check the value of any DB level parameter
   f) Create user/update the user password
   g) Whether replication is working fine
5> Autocommit at the DB level should be off - let the transaction handle commits.
6> A DB monitoring/alert mechanism should be in place (e.g., Nagios/Cacti and some shell/Python scripts that take a DB health snapshot at regular intervals).
7> Upgrade your databases in a timely manner; don't lag too far behind (e.g., MySQL 5.6 or 5.7).
8> Logging tables should be partitioned on a date basis, and a cleanup mechanism should be defined for partitioned tables.
9> If required, keep older data in a warehouse, and get rid of it in the transactional database.
10> Try to maintain data and indexes into memory. (Database size should be less than innodb_buffer_pool_size)

Database Design Tips

10 DB Design tips:
-------------------------
1> While creating tables/columns, add comments with a relevant description.
2> Always create 2 columns - createRowTS, lastUpdatedTS.
3> Table name & column name should follow some convention like authUsers.
4> Every table should adhere to Primary Key.
5> Define a strategy for purging older data from a table - check the business requirements to see whether the table can be cleaned up.
6> Ensure columns are properly indexed, with relevance to the DML queries issued by the middleware.
7> DDL/DML patches should not fail if executed more than once (make them idempotent).
8> Every change at the DB should have a small document explaining the necessity of the change. Always keep the ER diagram handy.
9> Keep an eye on slow queries and queries that are not using indexes.
10> Get rid of redundant indexes.

Tuesday, August 16, 2016

Query Execution Stages

Query Execution Stages:
--------------------------------

Various stages it will go through:
Client <--> SQL <--> Query Cache --> Parser --> Preprocessor --> Query Optimizer --> Query Execution engine --> Storage Engine --> Data

1> The client sends the SQL statement to the server.
2> The server checks the query cache. If there’s a hit, it returns the stored result from the cache; otherwise, it passes the SQL statement to the next step.
3> The server parses, preprocesses, and optimizes the SQL into a query execution plan.
4> The query execution engine executes the plan by making calls to the storage engine API.
5> The server sends the result to the client.
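
To check whether step 2 (the query cache) is actually serving hits on your server, inspect its counters (the query cache exists through MySQL 5.7 and must be enabled via query_cache_type/query_cache_size):

mysql> SHOW GLOBAL VARIABLES LIKE 'query_cache%';
mysql> SHOW GLOBAL STATUS LIKE 'Qcache%';

A growing Qcache_hits relative to Com_select means more statements are answered at step 2 without being parsed or executed at all.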

Memory allocated to MySQL

Memory calculation, the formula is:
global buffers + (thread buffers * max_connections) = a bit less than your RAM (so the OS and some MySQL admin stuff can work).

Global buffers: key_buffer_size, innodb_buffer_pool_size, innodb_log_buffer_size, innodb_additional_mem_pool_size, query_cache_size

Thread buffers: sort_buffer_size, read_buffer_size, join_buffer_size, read_rnd_buffer_size, binlog_cache_size, thread_stack, tmp_table_size

mysql> SELECT (@@key_buffer_size
    -> + @@query_cache_size
    -> + @@innodb_buffer_pool_size
    -> + @@innodb_additional_mem_pool_size
    -> + @@innodb_log_buffer_size
    -> + @@max_connections * ( @@read_buffer_size
    -> + @@read_rnd_buffer_size
    -> + @@sort_buffer_size
    -> + @@join_buffer_size
    -> + @@binlog_cache_size
    -> + @@thread_stack
    -> + @@tmp_table_size )
    -> ) / (1024 * 1024 * 1024) AS MAX_MEMORY_GB;
+---------------+
| MAX_MEMORY_GB |
+---------------+
|       30.0272 |
+---------------+
1 row in set (0.00 sec)

Note: This is the maximum memory MySQL could allocate/utilize at the time of peak activity.
(This is generally not the case, since all threads rarely use their full per-thread buffers at the same time.) However, in the worst case, this is the maximum memory that would be allocated to MySQL.

Database Size

Data Size/ Index Size/ Table Size
------------------------------------------
mysql> select table_name,
             round(data_length/1024/1024,2) "Data Size",
             round(index_length/1024/1024,2) "Index Size",
             round(data_length/1024/1024,2)+round(index_length/1024/1024,2) "Table Size"
             from information_schema.tables
             where table_schema='practice';
+----------------------------+-----------+------------+------------+
| table_name                 | Data Size | Index Size | Table Size |
+----------------------------+-----------+------------+------------+
| test1                      |      1.52 |       0.00 |       1.52 |
| test2                      |      0.14 |       0.00 |       0.14 |
| test3                      |    642.06 |     447.67 |    1089.73 |
| test4                      |    567.66 |     129.56 |     697.22 |
| test5                      |      0.02 |       0.02 |       0.04 |
| test6                      |      0.02 |       0.02 |       0.04 |
| test7                      |      0.02 |       0.00 |       0.02 |
+----------------------------+-----------+------------+------------+
7 rows in set (0.00 sec)

Database Size
------------------
mysql> select sum(round(data_length/1024/1024,2))+sum(round(index_length/1024/1024,2)) "DB Size" 
            from information_schema.tables
            where table_schema='practice';
+---------+
| DB Size |
+---------+
| 1788.71 |
+---------+
1 row in set (0.00 sec)

B+Tree internal working functionality

B+Tree internal working functionality
--------------------------------------------------
MySQL allows for different types of indexes to be used. Most common is a B-Tree index, although this name is also used for other types of indexes: T-Tree in MySQL Cluster and B+Tree in InnoDB. The "B" stands for Balanced Tree, not Binary Tree.

Requirement:
 - Linear search is very slow, complexity is O(n)
 - Indexes improve search performance. But add extra cost to INSERT/UPDATE/DELETE
 - InnoDB uses a B+Tree structure for its indexes.

B+Tree Characteristics:
 - Every node can have p – 1 key values and p node pointers (p is called the order of the tree)
 - The leaf node contains data, internal nodes are only used as guides
 - The leaf nodes are connected together as doubly linked list
 - Keys are stored in the nodes in sorted order
 - All leaf nodes are at the same height, that’s why it’s called a balanced tree

Let’s look at how a B+Tree index is designed:
Picture a two-level B+Tree: a root node and four leaf nodes. Each entry in a leaf node has a pointer to the row related to this index entry. Leaf nodes are also connected together - each leaf node has a pointer to the beginning of the next leaf node - this allows for a quick range scan of the index. Index lookup is also pretty efficient. Let’s say we want to find data for entry ‘9’. Instead of scanning the whole table, we can use the index. A quick check in the root node makes it clear we need to go to the leaf node that covers entries 5 <= x < 11. So, we go to the second leaf node and find the entry for ‘9’.

In the presented example, some of the nodes are not full yet. If we’d insert a row with indexed value of, let’s say 20, it can be inserted into the empty space in the third leaf node.


But what about a row with indexed value of ‘50’? Things are going to be a bit more complex as we do not have enough space in the fourth leaf node. It has to be split.

So, it wasn’t just a single write, i.e., adding an entry to a leaf node. Instead, a couple of additional write operations had to be performed. Please bear in mind that we are still at two levels of a B+Tree index, though with a full root node. Another split will require adding an additional level of nodes, which leads to more write operations. As you may have noticed, indexes make write operations much more expensive than they normally are - a lot of additional work goes into managing indexes. This is a very important concept to keep in mind - more indexes is not always better, as you are trading quicker selects for slower inserts.

Cost Calculations:
 - h is the height of the tree
 - p is the branching factor of the tree
 - n is the number of rows in a table
 - p = (page size in bytes/key length in bytes) + 1
 - h > log n / log p

Search cost for a single row:
 - S = h I/O ops
Update cost for a single row:
 - U = search cost + rewrite data page = h + 1 I/O ops
Insert cost for a single row:
 - I = search cost + rewrite index page + rewrite data page
 - I = h + 1 + 1 = h + 2 I/O ops
Delete cost for a single row:
 - D = search cost + rewrite index page + rewrite data page
 - D = h + 1 + 1 = h + 2 I/O ops


B+Tree levels and Tree Depth:
B+Trees are normally structured in such a way that the size of a node is chosen according to the page size. Why? Because whenever data is accessed on disk, instead of reading a few bits, a whole page of data is read, because that is much cheaper.
Let us look at an example:
Consider InnoDB, whose page size is 16KB, and suppose we have an index on an integer column of size 4 bytes; a node can then contain at most 16 * 1024 / 4 = 4096 keys, and a node can have at most 4097 children.
So for a B+tree of height 1, the root node has 4096 keys and the nodes at height 1 (the leaf nodes) have 4096 * 4097 = 16781312 key values.
This goes to show the effectiveness of a B+tree index, more than 16 million key values can be stored in a B+tree of height 1 and every key value can be accessed in exactly 2 lookups.


Points to Consider:
- Updates are in place only if the new data is of the same size; otherwise it's a delete plus an insert
- Inserts may require splits if the leaf node is full
- Occasionally the split of a leaf node necessitates split of the next higher node
- In worst case scenarios the split may cascade all the way up to the root node
- Deletions may result in emptying a node that necessitates the consolidation of two nodes

Advantages:
- Reduced I/O
- Reduced Rebalancing
- Extremely efficient range scans
- Implicit sorting

Reduced I/O:
- The height of a B+Tree is very small (and the branching factor is very large)
- Generally every node in the tree corresponds to a page of data (page size typically ranges from 2^11 to 2^14 bytes)
- A node read = a page read = 1 random I/O
- So to reach a leaf node, we need to read h pages
- No matter if the requested row is at the start or end of the table, the same number of I/Os is needed

Reduced Rebalancing:
- A tree needs rebalancing after an insertion or deletion
- A B+Tree is wide; more keys fit in each node, so rebalancing is needed less often on insertions and deletions
- Note that rebalancing means extra I/O, so rebalancing saved is I/O saved

Efficient Range Scans:
- Leaf nodes are linked together as a doubly linked list
- So you need to traverse from root -> leaf just once
- Then move from leaf -> leaf until you reach the end of the range
- The entire tree may be scanned without visiting the higher nodes at all

Implicit Sorting:
- Nodes contain keys sorted in key-order
- Therefore records can be implicitly returned in sorted order
- No external sorting is needed, hence memory and CPU cycles are saved
- Sometimes sorted data cannot fit into the sort buffer and has to be sorted in multiple passes, each needing I/O; this can be avoided entirely if you read the data in key order

B+Tree Index in InnoDB:
- B+Tree Index in InnoDB is a typical B+Tree structure, no strings attached!
- Leaf nodes contain the data (what that data is depends on whether it’s a Primary Index or a Secondary Index)
- Root nodes and internal nodes contain only key values

Primary and Secondary B+Tree Indexes:
- Primary index holds the entire row data in its leaf nodes
- Primary index can also be called a clustered index, because data is clustered around PK values
- A single PK per table means, a single clustered index per table
- Secondary Indexes have the key values and PK values in the index and no row data 
- PK values stored in the leaf nodes of a secondary index act as pointer to the data
- This means a secondary index lookup is really two index lookups
  : Cost of a secondary index lookup
  : C = height of the secondary index B+Tree + height of the primary index B+Tree

Characteristics of an Ideal Primary Index:
- Create primary index on column(s) that are not updated too often
- Keep the size of the primary index as small as possible
- Select column(s) for the primary index that have sequentially increasing values
- Random value columns, such as those that store UUID, are very bad candidates for primary index

Speeding up Secondary Indexes:
- Remember that secondary indexes only store PK values as row pointers, meaning two index lookups
- Performance can be dramatically improved if we avoid the extra PK lookup
- The trick is to include all the columns a query touches in the definition of the secondary index, as sketched below
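
A minimal sketch of the trick, using a hypothetical orders table (all names here are made up for illustration):

-- hypothetical schema: orders(order_id PK, customer_id, status, created_at)
-- frequent query: SELECT status, created_at FROM orders WHERE customer_id = 42;
-- a plain KEY (customer_id) would cost one extra PK lookup per matching row;
-- including the selected columns makes the index "covering":
ALTER TABLE orders ADD INDEX idx_cust_covering (customer_id, status, created_at);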

Secondary Index Optimization:
- Secondary index lookups are more expensive than primary key lookups, because a secondary index lookup only yields the primary key values, which must then be used to navigate the clustered index to reach the row.
- So there’s a cost to secondary indexes in InnoDB. There’s an optimization too. Once a query navigates to the leaf node of a secondary index, it knows two things: the values it used to navigate the index, and the primary key values of that row in the table.

Example:
create table test(
   variety varchar(10) primary key,
   note varchar(50),
   price int,
   key(price)
) engine=InnoDB;
Query> SELECT variety from test where price=5;

The query takes the value 5 and navigates the price index. When it gets to the leaf node, it finds the variety value (the primary key), which it could use to navigate the clustered index. But why would it need to do that? It already has the value it was looking for. In fact, if the query only refers to values present in the secondary index and the clustered index key, it doesn’t need to leave the secondary index.

This is a fantastic optimization. It means each secondary index behaves like another table, clustered by the index columns first. In this example, the secondary index is like a table containing just price and variety, clustered in that order.
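
You can confirm this from the query plan. For a covering query like the one above, EXPLAIN should report the price index in the key column and "Using index" in the Extra column, meaning the clustered index is never touched:

mysql> EXPLAIN SELECT variety FROM test WHERE price=5;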

I have noticed many people have a tendency to write SELECT * FROM ... queries. If you don’t need all the columns, don’t select all the columns, because it can make the difference between a fast and a slow query. If you select only the columns you need, your query might be able to use this optimization.

Tips and Take Away:
- Indexing should always be used to speed up access
- Index trade-off analysis can be done easily using the cost estimation formulae discussed
- Select optimal data types for columns, especially ones that are to be indexed – int vs bigint
- When selecting columns for PK, select those that would make the PK short, sequential and with few updates
- Avoid using UUID style PK definitions
- Insert speed is best when you insert in PK order
- When creating an index on a string column, you don’t need to index the entire column; you can index a prefix of the column – idx(str_col(4)) – see the sketch after this list
- B+Tree indexes are only suitable for columns with good selectivity
- Don’t shy away from creating composite indexes
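
A minimal sketch of a prefix index (the table and column names are hypothetical, echoing the tip above):

-- index only the first 4 characters of str_col; the index stays much smaller,
-- at the cost of some selectivity if many values share the same prefix
CREATE INDEX idx_str_prefix ON t (str_col(4));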

Note:
The InnoDB storage engine creates a clustered index for every table. If the table has a primary key, that is the clustered index. If not, InnoDB uses the first UNIQUE index with all key columns NOT NULL; if there is no such index either, InnoDB internally generates a six-byte row ID for every row and uses that as the clustered index.

References:
http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/
https://blog.jcole.us/2013/01/10/btree-index-structures-in-innodb/
https://www.percona.com/files/presentations/percona-live/london-2011/PLUK2011-b-tree-indexes-and-innodb.pdf
http://severalnines.com/blog/become-mysql-dba-blog-series-database-indexing
http://www.ovaistariq.net/733/understanding-btree-indexes-and-how-they-impact-performance/#.V7Lk5vl94lI

Monday, August 15, 2016

Indexes

Indexes can be classified into 3 major categories:

1> Clustered Vs Non-Clustered
2> Primary Key/UK/Normal Index/Full Text Index
3> B Tree/Hash Index/R Tree

Clustered Index :: The InnoDB term for a primary key index. InnoDB table storage is organized based on the values of the primary key columns, to speed up queries and sorts involving the primary key columns. For best performance, choose the primary key columns carefully based on the most performance-critical queries. Because modifying the columns of the clustered index is an expensive operation, choose primary columns that are rarely or never updated. Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation. By default, with InnoDB, the primary index is a clustered index.

Non Clustered Index :: In InnoDB, the records in non-clustered indexes (also called secondary indexes) contain the primary key columns for the row, in addition to the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index. If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key. A secondary index can be used to satisfy queries that only require values from the indexed columns. For more complex queries, it can be used to identify the relevant rows in the table, which are then retrieved through lookups using the clustered index. Creating and dropping secondary indexes has traditionally involved significant overhead from copying all the data in the InnoDB table. The fast index creation feature of the InnoDB Plugin makes both CREATE INDEX and DROP INDEX statements much faster for InnoDB secondary indexes.

PK :: A PRIMARY KEY is a unique index where all key columns must be defined as NOT NULL. If they are not declared as NOT NULL, MySQL declares them so implicitly (and silently). A table can have only one PRIMARY KEY.

UK :: A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if we try to add a new row with a key value that matches an existing row. For all engines, a UNIQUE index permits multiple NULL values for columns that contain NULL.

Normal :: If it’s not primary or unique, it doesn’t constrain values inserted into the table, but it does allow them to be looked up more efficiently.

Full Text Index :: A more specialized form of indexing that allows full-text search. Think of it as (essentially) creating an “index” for each “word” in the specified column. Up to version 5.5, this index type is supported for the MyISAM engine only; from 5.6 onward, both MyISAM and InnoDB support it. It is the special kind of index that holds the search index for the MySQL full-text search mechanism: the MySQL feature for finding words, phrases, Boolean combinations of words, and so on within table data, in a faster, more convenient, and more flexible way than using the SQL LIKE operator or writing your own application-level search algorithm.

B Tree :: A B-tree index can be used for column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN operators. The index also can be used for LIKE comparisons if the argument to LIKE is a constant string that does not start with a wildcard character.

A tree data structure that is popular for use in database indexes. The structure is kept sorted at all times, enabling fast lookup for exact matches (equals operator) and ranges (for example, greater than, less than, and BETWEEN operators). This type of index is available for most storage engines, such as InnoDB and MyISAM. Because B-tree nodes can have many children, a B-tree is not the same as a binary tree, which is limited to 2 children per node.

Hash :: Hash indexes are used only for equality comparisons that use the = or <=> operators (but are very fast). They are not used for comparison operators such as < that find a range of values. Systems that rely on this type of single-value lookup are known as “key-value stores”; to use MySQL for such applications, use hash indexes wherever possible. In MySQL, hash indexes are available for MEMORY tables. The MEMORY storage engine can also use B-tree indexes, and you should choose B-tree indexes for MEMORY tables if some queries use range operators.

R Tree :: A tree data structure used for spatial indexing of multi-dimensional information, such as geographical coordinates, rectangles, or polygons.

3 ways to use indexes:
---------------------------
a> Using index to find rows :: =, Between, >= , <= , IN
b> Using index to sort data :: ORDER BY ASC, DESC
c> Using index to read data :: Read the needed column data directly from the index, avoiding reading the row data itself.
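
A quick sketch of all three uses against a hypothetical table (all names are made up for illustration):

-- assumed schema:
-- CREATE TABLE t (id INT PRIMARY KEY, a INT, b VARCHAR(20), KEY idx_a (a)) ENGINE=InnoDB;
SELECT * FROM t WHERE a BETWEEN 5 AND 10;   -- a> index used to find matching rows
SELECT * FROM t ORDER BY a LIMIT 10;        -- b> index returns rows already sorted, avoiding a filesort
SELECT a FROM t WHERE a > 100;              -- c> index-only read; EXPLAIN shows "Using index"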

Points to Note:
a) Firstly, indexes consume a fair amount of disk space. Usually the space usage isn’t significant, but if you create an index on every column in every possible combination, the index file grows much faster than the data file. If the table is very large, the index file could reach the operating system’s maximum file size.

b) Secondly, indexes slow down write queries, such as INSERT, UPDATE and DELETE. Because MySQL has to internally maintain the “pointers” to the inserted rows in the actual data file, there is a performance price to pay.

c) Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses LIMIT to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result.

References:
http://dev.mysql.com/doc/refman/5.7/en/glossary.html#
https://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
https://www.percona.com/blog/2009/09/12/3-ways-mysql-uses-indexes/
http://www.treselle.com/blog/mysql-indexes-basicstypes-and-features/

InnoDB Purging

- The purge operations (a type of garbage collection) that InnoDB performs automatically are now done in one or more separate threads, rather than as part of the master thread. This change improves scalability, because the main database operations run independently from maintenance work happening in the background.

- To control this feature, increase the value of the configuration option innodb_purge_threads. If DML action is concentrated on a single table or a few tables, keep the setting low so that the threads do not contend with each other for access to the busy tables. If DML operations are spread across many tables, increase the setting. Its maximum is 32.

- There is another related configuration option, innodb_purge_batch_size, with a default value of 300 and a maximum value of 5000. This option is mainly intended for experimentation and tuning of purge operations, and should not be of interest to typical users.

innodb_purge_threads :: The number of background threads devoted to the InnoDB purge operation. A minimum value of 1 signifies that the purge operation is always performed by background threads, never as part of the master thread. Running the purge operation in one or more background threads helps reduce internal contention within InnoDB, improving scalability. Increasing the value to greater than 1 creates that many separate purge threads, which can improve efficiency on systems where DML operations are performed on multiple tables. The maximum is 32.

innodb_purge_batch_size :: Defines the number of undo log pages that purge parses and processes in one batch from the history list. In a multi-threaded purge configuration, the coordinator purge thread divides innodb_purge_batch_size by innodb_purge_threads and assigns that number of pages to each purge thread. The innodb_purge_batch_size option also defines the number of undo log pages that purge frees after every 128 iterations through the undo logs.

The innodb_purge_batch_size option is intended for advanced performance tuning in combination with the innodb_purge_threads setting. Most MySQL users need not change innodb_purge_batch_size from its default value.
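
A minimal my.cnf sketch combining the two options (the values here are illustrative, not recommendations):

[mysqld]
# DML spread across many tables: allow several purge threads (maximum is 32)
innodb_purge_threads = 4
# undo log pages parsed per purge batch; the default of 300 rarely needs changing
innodb_purge_batch_size = 300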

References:
http://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_purge_threads
http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_purge_threads

undo & redo logs

#undo#
- A storage area that holds copies of data modified by active transactions. If another transaction needs to see the original data (as part of a consistent read operation), the unmodified data is retrieved from this storage area.

- By default, this area is physically part of the system tablespace. In MySQL 5.6 and higher, you can use the innodb_undo_tablespaces and innodb_undo_directory configuration options to split it into one or more separate tablespace files, the undo tablespaces, optionally stored on another storage device such as an SSD.

- The undo log is split into separate portions, the insert undo buffer and the update undo buffer.

- undo tablespace : One of a set of files containing the undo log, when the undo log is separated from the system tablespace.
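
A minimal my.cnf sketch of such a split (the path and count are hypothetical; note that innodb_undo_tablespaces must be set when the instance is first initialized):

[mysqld]
# two dedicated undo tablespaces on a separate (e.g. SSD) device
innodb_undo_tablespaces = 2
innodb_undo_directory = /u03/mysql_undo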

#redo#
- A disk-based data structure used during crash recovery, to correct data written by incomplete transactions. During normal operation, it encodes requests to change InnoDB table data, which result from SQL statements or low-level API calls through NoSQL interfaces. Modifications that did not finish updating the data files before an unexpected shutdown are replayed automatically.

- The redo log is physically represented as a set of files, typically named ib_logfile0 and ib_logfile1. The data in the redo log is encoded in terms of records affected; this data is collectively referred to as redo. The passage of data through the redo logs is represented by the ever-increasing LSN value. The original 4GB limit on maximum size for the redo log is raised to 512GB in MySQL 5.6.3.

- The disk layout of the redo log is influenced by the configuration options innodb_log_file_size, innodb_log_group_home_dir, and (rarely) innodb_log_files_in_group. The performance of redo log operations is also affected by the log buffer, which is controlled by the innodb_log_buffer_size configuration option.
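
Putting those options together, an illustrative my.cnf layout (the sizes are examples only, not recommendations; see the resize procedure below before changing innodb_log_file_size on an existing instance):

[mysqld]
# two redo log files of 512MB each, stored in the data directory
innodb_log_file_size = 512M
innodb_log_files_in_group = 2
# in-memory log buffer, flushed to the redo log files
innodb_log_buffer_size = 16M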

References:
http://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_undo_log
http://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_redo_log

Friday, August 12, 2016

Resize ib_logfile*

<Check Existing ib_logfile Size>
[root@ip- ~]# du -sh /u02/mysqldata/practice/data/ib_*
1.1G    /u02/mysqldata/practice/data/ib_logfile0
1.1G    /u02/mysqldata/practice/data/ib_logfile1

<Check innodb_log_file_size parameter value>
[root@ip- ~]# cat /u01/app/mysql/practice/etc/my.cnf | grep -i innodb_log_file_size
innodb_log_file_size=1024M

<Check DB is up and running>
[root@ip- ~]# mysqladmin -uroot -proot status
Warning: Using a password on the command line interface can be insecure.
Uptime: 92  Threads: 2  Questions: 20  Slow queries: 0  Opens: 70  Flush tables: 1  Open tables: 63  Queries per second avg: 0.217

<Stop Database>
[root@ip- ~]# /etc/init.d/CS.mysql-practice stop
/u01/app/mysql/product/5.6.16 3316
Shutting down MySQL.. SUCCESS!

<Edit innodb_log_file_size parameter and change it to 512M>
[root@ip- ~]# vi /u01/app/mysql/practice/etc/my.cnf

<Check innodb_log_file_size parameter value>
[root@ip- ~]# cat /u01/app/mysql/practice/etc/my.cnf | grep -i innodb_log_file_size
innodb_log_file_size=512M

<Start Database>
[root@ip- ~]# /etc/init.d/CS.mysql-practice start
/u01/app/mysql/product/5.6.16 3316
Starting MySQL........... SUCCESS!

<Content of DB Error Log File>
2016-08-12 02:24:59 7620 [Note] InnoDB: Setting log file /u02/mysqldata/practice/data/ib_logfile101 size to 512 MB
InnoDB: Progress in MB: 100 200 300 400 500
2016-08-12 02:25:03 7620 [Note] InnoDB: Setting log file /u02/mysqldata/practice/data/ib_logfile1 size to 512 MB
InnoDB: Progress in MB: 100 200 300 400 500

<Check Updated ib_logfile Size>
[root@ip- ~]# du -sh /u02/mysqldata/practice/data/ib_*
512M    /u02/mysqldata/practice/data/ib_logfile0
512M    /u02/mysqldata/practice/data/ib_logfile1
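
As an optional final step, the new size can also be confirmed from within MySQL (512M = 536870912 bytes):

<Verify innodb_log_file_size from within MySQL>
mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size';
+----------------------+-----------+
| Variable_name        | Value     |
+----------------------+-----------+
| innodb_log_file_size | 536870912 |
+----------------------+-----------+
1 row in set (0.00 sec)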


Wednesday, August 10, 2016

MySQL InnoDB Architecture


References:
https://www.percona.com/blog/2010/04/26/xtradb-innodb-internals-in-drawing/

innochecksum

innochecksum utility :: innochecksum prints checksums for InnoDB files. This tool reads an InnoDB tablespace file, calculates the checksum for each page, compares the calculated checksum to the stored checksum, and reports mismatches, which indicate damaged pages. It was originally developed to speed up verifying the integrity of tablespace files after power outages but can also be used after file copies. Because checksum mismatches will cause InnoDB to deliberately shut down a running server, it can be preferable to use this tool rather than waiting for a server in production usage to encounter the damaged pages. As of MySQL 5.7.2, innochecksum supports files greater than 2GB in size. Previously, innochecksum only supported files up to 2GB in size.

innochecksum cannot be used on tablespace files that the server already has open. For such files, you should use CHECK TABLE to check tables within the tablespace. Attempting to run innochecksum on a tablespace that the server already has open will result in an “Unable to lock file” error.
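
For example, while the server is running (using the practice.test table exercised below):

mysql> CHECK TABLE practice.test;

A healthy table reports Msg_text = OK in the output.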

If checksum mismatches are found, you would normally restore the tablespace from backup or start the server and attempt to use mysqldump to make a backup of the tables within the tablespace.

If the DB and application are running, innochecksum fails with:
Error: Unable to lock file:: test.ibd
fcntl: Resource temporarily unavailable

It is best to shut the database down and then run the check:
[root@ip]# ./5.7.10/bin/innochecksum --page-type-summary --verbose --log=/tmp/log.txt practice/test.ibd

Variables (--variable-name=value)
and boolean options {FALSE|TRUE}  Value (after reading options)
--------------------------------- ----------------------------------------
verbose                           TRUE
count                             FALSE
start-page                        0
end-page                          0
page                              0
strict-check                      crc32
no-check                          FALSE
allow-mismatches                  0
write                             crc32
page-type-summary                 TRUE
page-type-dump                    (No default value)
log                               /tmp/log.txt

File::practice/test.ibd
================PAGE TYPE SUMMARY==============
#PAGE_COUNT     PAGE_TYPE
===============================================
   11930        Index page
       0        Undo log page
       1        Inode page
       0        Insert buffer free list page
    1635        Freshly allocated page
       1        Insert buffer bitmap
       0        System page
       0        Transaction system page
       1        File Space Header
       0        Extent descriptor page
       0        BLOB page
       0        Compressed BLOB page
       0        Other type of page
===============================================
Additional information:
Undo page type: 0 insert, 0 update, 0 other
Undo page state: 0 active, 0 cached, 0 to_free, 0 to_purge, 0 prepared, 0 other

Log File:
[root@ip]# cat /tmp/log.txt
InnoDB File Checksum Utility.
Filename = test.ibd
file test.ibd = 98304 bytes (6 pages)
page::0; log sequence number:first = 2524276; second = 2524276
page::0; old style: calculated = 1691960001; recorded = 1691960001
..
..
Page::13566 is empty and uncorrupted
page::13567; log sequence number:first = 0; second = 0
Page::13567 is empty and uncorrupted

References:
https://dev.mysql.com/doc/refman/5.7/en/innochecksum.html