Group commit for the binary log - Source

Revision
8870
User
knielsen
Date
2012-02-20 09:39
Group commit is an important optimisation in MariaDB 5.3 when the server is run with ##innodb_flush_logs_at_trx_commit=1## and/or ##sync_binlog=1##. These settings are needed to ensure that if a server crashes, any transaction that was reported as committed prior to the time of crash will still be present in the database after crash recovery ("durability", the "D" in ACID). Both settings are also necessary to be able to recover into a consistent state at all after a crash; if both are not set, it is possible after a crash to end up in a state where a transaction in the binary log is missing in InnoDB/XtraDB, or vice versa, which breaks replication for slaves. When these settings are in effect, the server does an ##fsync()## (or ##fdatasync()## or similar) call on the XtraDB transaction log file / binary log file during ##COMMIT## in order to ensure that data is stored durably on the disk. An ##fsync()## is a time-consuming operation, and can easily limit throughput in terms of the number of commits per second which can be sustained. The idea with group commit is to amortise the costs of each ##fsync()## over multiple commits from multiple parallel transactions. If there are say 10 transactions in parallel trying to commit, we can force all of them to disk at once with a single ##fsync()##, rather than do one ##fsync()## for each. This can greatly reduce the need for ##fsync()## calls, and consequently greatly improve the commits-per-second throughput. Group commit works automatically; there are no options needed to enable it, and there are no negative consequences (performance or reliability or otherwise) from it. However, to see the positive effects of group commit, the workload must have sufficient parallelism. Normally, at least three parallel transactions are needed for group commit to be effective: While the first transaction is waiting for ##fsync()## to complete, the other two transactions will queue up waiting for their turn to call ##fsync()##. When the first transaction is done, a single ##fsync()## can be used for the two queued-up transactions, saving in this case one of the three ##fsync()s##. In addition to sufficient parallelism, it is also necessary to have enough transactions per second wanting to commit that the ##fsync()## calls are a bottleneck. If no such bottleneck exists (i.e. transactions never or rarely need to wait for the ##fsync()## of another to complete), group commit will provide little to no improvement. == New Status Variables For the binary log, two new status variables are available for checking how effective group commit is at reducing the number of fsync() calls needed: === ##binlog_commits## * **Variable Name:** ##binlog_commits## * **Scope:** Session * **Data Type:** ##Number## This is the total number of transactions committed to the binary log. === ##binlog_group_commits## * **Variable Name:** ##binlog_group_commits## * **Scope:** Session * **Data Type:** ##Number## This is the total number of group commits done to the binary log (a group commit is when a group of transactions are written into the binary log together, sharing a single ##fsync()## call). When ##sync_binlog=1##, then this is the number of ##fsync()'s## done for transaction commits to the binary log Thus the extent to which group commit is effective at reducing the number of ##fsync()## calls on the binary log can be determined by the ratio between these two status variables. ##binlog_commits## will always be as high or higher than ##binlog_group_commits##; the bigger the difference, the more effective group commit was at reducing ##fsync()## overhead. To query these variables, use for example a statement like this: <<sql>> SHOW STATUS LIKE 'binlog_%commits'; <</sql>> (Information about the effectiveness of group commit inside XtraDB can be obtained from similar status variables like ##innodb_data_fsyncs##; however, the data may be slightly harder to interpret as XtraDB/InnoDB also uses ##fsync()## sometimes for other purposes than (group) commits). For more information about the design and implementation of group commit, see MWL#116. == See Also: * [[what-is-mariadb-53|What is MariaDB 5.3]] * [[http://www.facebook.com/note.php?note_id=10150211546215933|Group commit benchmark]]