MySQL 8.0 source code: an introduction to the lock-free redo log

InnoDB, like most storage engines, uses write-ahead logging (WAL). Every change is first written to the redo log; the dirty pages are flushed from the buffer pool to the data files later. During crash recovery, or when a backup is restored, the redo log is replayed into the buffer pool and the resulting dirty pages are then flushed. The key benefit of WAL is that it turns random writes into sequential writes, and on mechanical disks sequential writes are far faster than random writes, so WAL makes much better use of the disk. It also brings a problem: every write has to take a lock, so that the next write can only start after the previous one has completed. Early versions of InnoDB were implemented this way, but as CPU core counts grew, such frequent locking kept InnoDB from exploiting multiple cores, so in MySQL 8.0 the redo log was changed to a lock-free design. This is the official introduction: https://mysqlserverteam.com/mysql-8-0-new-lock-free-scalable-wal-design

Version 5.6 implementation

In 5.6, two global mutexes have to be acquired on the write path: log_sys_t::mutex and log_sys_t::flush_order_mutex.

Each user connection is served by its own thread. Before writing, a thread must acquire log_sys_t::mutex, which ensures that only one user thread writes to the log buffer at a time. As the number of connections grows, performance inevitably suffers.

Similarly, once the redo has been written, the corresponding dirty pages are added to the flush list. To ensure that only one user thread at a time moves from the log buffer to the flush list, the thread must acquire log_sys_t::flush_order_mutex.

As shown in the picture:

So in 5.6 a write first acquires log_sys_t::mutex, writes to the log buffer, acquires log_sys_t::flush_order_mutex, releases log_sys_t::mutex, and then adds the corresponding pages to the flush list.

The lock-free implementation in 8.0 is mainly about removing these two mutexes.

8.0 lock-free implementation

log_sys_t::mutex

To remove log_sys_t::mutex, each thread reserves its address range before writing and then writes only to that range, so no mutex is needed during the write itself. This raises the next problem: when all threads reserve their LSN ranges, a mutex would seem to be needed to prevent conflicts. InnoDB avoids the lock by using an atomic operation: const sn_t start_sn = log.sn.fetch_add(len);

Once each thread has obtained the LSN position it is going to write to, the writes can proceed concurrently.
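The reservation step can be illustrated with a small sketch. It is not InnoDB's code: MiniLog, append and run_writers are hypothetical names, and the buffer is assumed to be large enough that it never wraps around. It only shows why fetch_add hands every thread a distinct, non-overlapping range.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <thread>
#include <vector>

// Hypothetical miniature of a log buffer; not InnoDB's log_t.
struct MiniLog {
  std::atomic<std::uint64_t> sn{0};  // next free sequence number
  std::vector<char> buf;             // fixed-size buffer, no wrap-around

  explicit MiniLog(std::size_t capacity) : buf(capacity, '\0') {}

  // Reserve data.size() bytes, then copy the data into the reserved range.
  // fetch_add returns the value before the addition, so concurrent callers
  // receive distinct, non-overlapping [start_sn, start_sn + len) ranges.
  std::uint64_t append(const std::string &data) {
    const std::uint64_t len = data.size();
    const std::uint64_t start_sn = sn.fetch_add(len);
    assert(start_sn + len <= buf.size());  // sketch: caller sizes the buffer
    std::memcpy(buf.data() + start_sn, data.data(), len);
    return start_sn;
  }
};

// Start `threads` writers; each appends `per_thread` records made of one
// repeated letter. Returns the final value of the sequence number.
inline std::uint64_t run_writers(MiniLog &log, int threads, int per_thread,
                                 std::size_t record_len) {
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([&log, t, per_thread, record_len] {
      const std::string rec(record_len, static_cast<char>('a' + t));
      for (int i = 0; i < per_thread; ++i) log.append(rec);
    });
  }
  for (auto &th : pool) th.join();
  return log.sn.load();
}
```

With four writers appending 100 records of 8 bytes each, the sequence number ends at 3200 and every 8-byte record in the buffer consists of a single letter, which would not hold if two reservations had overlapped.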

But what if a thread holding an earlier position has not finished writing while a thread holding a later position already has? How should the contents of the log buffer be written to the redo log then? The data written out must never contain holes.

8.0 introduces the log_writer thread, which checks whether the log buffer has holes. Concretely, a structure named recent_written records which parts of the log buffer are contiguous. recent_written is implemented as a link_buf, which resembles a union-find (disjoint-set) structure. The maximum amount of data that can be written concurrently is therefore bounded by the size of recent_written.

link_buf is implemented as shown in the figure:

This background thread is woken up when user threads write data into the recent_written buffer. It checks whether the contiguous position of recent_written can be advanced; if so, it advances it and writes the now-contiguous content to the redo log.
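The hole-tracking idea can be sketched as follows. This is a simplified illustration, not InnoDB's link_buf: MiniLinkBuf and its methods are hypothetical names, and the sketch assumes that a single thread advances the tail, as the log_writer does in the description above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplification of a link buffer; not InnoDB's link_buf.
class MiniLinkBuf {
 public:
  explicit MiniLinkBuf(std::size_t capacity) : links_(capacity), tail_(0) {
    for (auto &slot : links_) slot.store(0);
  }

  // Record that the range [from, to) has been completely written.
  // Returns false when `from` is too far ahead of the tail to own a slot,
  // which is what bounds the amount of concurrent writing.
  bool add_link(std::uint64_t from, std::uint64_t to) {
    assert(to > from);
    if (from - tail_.load() >= links_.size()) return false;
    links_[from % links_.size()].store(to - from);
    return true;
  }

  // Follow links from the tail for as long as they are contiguous.
  // Each visited slot is cleared so that it can be reused later.
  std::uint64_t advance_tail() {
    std::uint64_t pos = tail_.load();
    for (;;) {
      auto &slot = links_[pos % links_.size()];
      const std::uint64_t len = slot.load();
      if (len == 0) break;  // hole: this range has not been written yet
      slot.store(0);
      pos += len;
    }
    tail_.store(pos);
    return pos;
  }

  std::uint64_t tail() const { return tail_.load(); }

 private:
  std::vector<std::atomic<std::uint64_t>> links_;  // slot = length of range
  std::atomic<std::uint64_t> tail_;  // everything before this is contiguous
};
```

If the range [5, 9) is linked before [0, 5), advance_tail() stays at 0 because of the hole; once [0, 5) is linked as well, a single call moves the tail to 9.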

log_sys_t::flush_order_mutex

If flush_order_mutex is not removed as well, user threads still cannot run concurrently: after writing its redo, a user thread must add the corresponding pages to the flush list before it can return, and adding to the flush list requires flush_order_mutex so that pages are added in order. So flush_order_mutex has to be removed too.

The approach is to allow the dirty pages corresponding to log buffer writes to be added to the flush list out of order. After a user thread finishes writing to the log buffer, it can add the corresponding dirty pages to the flush list without taking flush_order_mutex. As a result, the page LSNs in the flush list may be out of order, so at checkpoint time there is no guarantee that the page at the oldest end of each flush list has the smallest LSN.

InnoDB uses recent_closed to record whether the log buffer ranges whose pages have been added to the flush list are contiguous. This makes a safe value easy to obtain: the page LSN at the oldest end of the flush list minus recent_closed.size() can be used as the checkpoint LSN, and it is guaranteed to be safe.

Similarly, InnoDB has a background log_closer thread that periodically checks whether recent_closed is contiguous. If it is, the thread advances the recent_closed buffer, and the checkpoint information can then advance as well.

So in 8.0, writing to the redo log is split into several stages, each handled by a particular thread:

Reserve the write position: user thread

Write the data into the log buffer: user thread

Write the data in the log buffer to the redo log file: log_writer

Flush the redo log from the page cache to disk: log_flusher

Add the pages corresponding to the log buffer writes to the flush list: user thread

Advance recent_closed, which bounds what can be checkpointed: log_closer

Write the checkpoint information according to recent_closed: log_checkpointer

Code implementation

The main structures in the redo log

Log files: the familiar ib_logfile files.

Log buffer: usually 64 MB in size. Users in […]

During the advance_tail_until operation, the space behind the tail is reclaimed at the same time, so once advance_tail_until has run, the contiguous range has been released. There is also a validate_no_links function that checks whether the release was done correctly.

In this way the recent_closed buffer keeps being cleaned up, which ensures that it always has free space.

The log_closer thread keeps calling log_advance_dirty_pages_added_up_to_lsn(), which continuously advances the value returned by log_buffer_dirty_pages_added_up_to_lsn() for the recent_closed buffer. The checkpointer always consults log_buffer_dirty_pages_added_up_to_lsn(): any LSN used as a checkpoint must be smaller than it, because log_buffer_dirty_pages_added_up_to_lsn marks the current position in the recent_closed buffer. All LSNs before that position have been filled in and are contiguous; LSNs after it carry no such guarantee.

So who is responsible for updating recent_closed? The log_closer thread.

When is a dirty page added to the buffer pool's flush list?

At mtr->commit(), the pages modified by the mtr are added to the flush list. Before they are added, we make sure that the mtr's changes have been written to the redo log and that this redo has already been flushed.

log_checkpointer

This thread waits on log.checkpointer_event with a timeout of 10*1000 microseconds, i.e. 10 ms (the timeout argument of os_event_wait_time_low is given in microseconds, not milliseconds):

os_event_wait_time_low(log.checkpointer_event, 10 * 1000, sig_count);

os_event_wait_time_low returns either when checkpointer_event is signaled or when the timeout expires; underneath, it is pthread_cond_timedwait().
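The wait-or-timeout pattern can be sketched with the C++ standard library. MiniEvent is a hypothetical stand-in, not InnoDB's os_event; the signal counter is an assumption modelled on the sig_count argument in the call above, and it lets a waiter notice a signal that arrived between taking the snapshot and starting to wait.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for an event object; not InnoDB's os_event.
class MiniEvent {
 public:
  // Signal the event and wake up every waiter.
  void set() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      ++signal_count_;
    }
    cond_.notify_all();
  }

  // Snapshot of the signal counter, taken before deciding to wait.
  std::int64_t signal_count() {
    std::lock_guard<std::mutex> guard(mutex_);
    return signal_count_;
  }

  // Wait until the event has been signaled since `seen_count` was taken,
  // or until `timeout` expires. Returns true if signaled, false on timeout.
  bool wait_for(std::chrono::milliseconds timeout, std::int64_t seen_count) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cond_.wait_for(lock, timeout,
                          [&] { return signal_count_ != seen_count; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cond_;
  std::int64_t signal_count_ = 0;
};
```

A waiter that is never signaled returns false after the timeout; if set() was called after the snapshot, wait_for() returns true immediately, even though the signal arrived before the wait began.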

Normally log_checkpointer simply wakes up when this timeout expires. So where is checkpointer_event signaled to wake it up early?

It is signaled from the log_writer_write_buffer() function, which first checks:

while (1) {
  const lsn_t lsn_diff = min_next_lsn - checkpoint_lsn;
  if (lsn_diff <= log.lsn_capacity) {
    checkpoint_limited_lsn = checkpoint_lsn + log.lsn_capacity;
    break;
  }
  log_request_checkpoint(log, false);
  …
}

Why does log_writer need this logic? It checks whether lsn_diff (how far this write would extend past the last checkpoint) exceeds log.lsn_capacity (the capacity of the redo log). If it does not, the write can proceed directly and the loop breaks. If it does, then because the redo log files are reused in rotation, going ahead with the write would overwrite redo that is still needed. A checkpoint must therefore be taken first: part of the content covered by the redo log is flushed to the B-tree data files, and the checkpoint is then advanced to make room.

So if checkpoints are not taken in time, the redo log runs out of space, and that directly stalls the online write threads.
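The capacity check can be modelled in isolation. This is a sketch under simplifying assumptions, not InnoDB's implementation: MiniRedo and make_room are hypothetical names, min_next_lsn is assumed to be at least checkpoint_lsn, and the flushed_up_to parameter stands in for however far dirty pages have already been flushed, which a real system would have to wait for.

```cpp
#include <cassert>
#include <cstdint>

using lsn_t = std::uint64_t;

// Hypothetical model of a circular redo log; names are illustrative only.
struct MiniRedo {
  lsn_t checkpoint_lsn;  // redo before this point may be overwritten
  lsn_t lsn_capacity;    // amount of redo the files can hold

  // Highest LSN that may be written without overwriting needed redo.
  lsn_t write_limit() const { return checkpoint_lsn + lsn_capacity; }

  // Same comparison as in the loop above (assumes
  // min_next_lsn >= checkpoint_lsn, since lsn_t is unsigned).
  bool fits(lsn_t min_next_lsn) const {
    return min_next_lsn - checkpoint_lsn <= lsn_capacity;
  }
};

// Keep taking checkpoints until a write ending at min_next_lsn fits.
// Returns the number of checkpoints taken, or -1 if the checkpoint cannot
// advance any further and the write still does not fit.
inline int make_room(MiniRedo &redo, lsn_t min_next_lsn, lsn_t flushed_up_to) {
  int checkpoints = 0;
  while (!redo.fits(min_next_lsn)) {
    if (redo.checkpoint_lsn >= flushed_up_to) return -1;  // nothing to free
    redo.checkpoint_lsn = flushed_up_to;  // "take" a checkpoint
    ++checkpoints;
  }
  return checkpoints;
}
```

With checkpoint_lsn = 100 and lsn_capacity = 1000, a write ending at LSN 1100 fits and one ending at 1101 does not. A write ending at 1500 fits only after the checkpoint has advanced, for example to 600.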

First, note that we do not know when a page modified by a transaction will be flushed. The user only has to write to the redo log and, once the redo log is confirmed to be flushed, the call returns. When the page goes from the buffer pool to the B-tree data files is decided asynchronously in the background and is of no concern to the user. But after a checkpoint, the redo log before the checkpoint can be discarded, so we must guarantee that every page covered by redo before the checkpoint LSN has already been flushed to disk.

So the question is: how is the checkpoint LSN determined?

log.available_for_checkpoint_lsn is updated in the function log_update_available_for_checkpoint_lsn(log).

Specific update process:

log_request_checkpoint executes log_update_available_for_checkpoint_lsn(log) =>

const lsn_t oldest_lsn = log_get_available_for_checkpoint_lsn(log);

Then it executes lsn_t lwm_lsn = buf_pool_get_oldest_modification_lwm() =>

buf_pool_get_oldest_modification_approx()

buf_pool_get_oldest_modification_approx() returns only an approximate position of the oldest LSN. This is a consequence of introducing the recent_closed buffer: pages are no longer guaranteed to be added to the buffer pool's flush lists in redo log order, so a flush list may contain, for example, 98 => 85 => 110. The function can therefore only return an approximate oldest_modification LSN.

It works by visiting every flush list in the buffer pool and taking only the last element of each, which is treated as that list's oldest LSN (although, because of recent_closed, it is not guaranteed to be the true oldest). The smallest of these values, for example across 8 flush lists, is the current approximate oldest LSN.

Then, inside buf_pool_get_oldest_modification_lwm(), the size of the recent_closed buffer is subtracted from the LSN returned by buf_pool_get_oldest_modification_approx(). The result is guaranteed to be safe to checkpoint, but it is not guaranteed to be the largest LSN that could be checkpointed. Nor does it necessarily point to the beginning of a record; more often it points into the middle of one, because the size of the recent_closed buffer is subtracted unconditionally. In 5.6, by contrast, this LSN was guaranteed to be the start of a redo log record.
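The two steps, an approximate minimum followed by a conservative correction, can be sketched as below. This is an illustration with hypothetical names and a simplified flush list (a deque of LSNs whose back is the entry that was added first); it is not the buffer pool's real data structure, and the return value 0 for "no dirty pages" is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical flush list: front = page added most recently, back = page
// added first. Each entry is the page's oldest modification LSN.
using FlushList = std::deque<lsn_t>;

// Look only at the back of each list and take the minimum. Cheap, but only
// approximate when lists are not strictly ordered. Returns 0 if all empty.
inline lsn_t oldest_modification_approx(const std::vector<FlushList> &lists) {
  lsn_t oldest = 0;
  for (const auto &list : lists) {
    if (list.empty()) continue;
    if (oldest == 0 || list.back() < oldest) oldest = list.back();
  }
  return oldest;
}

// Subtract the capacity of recent_closed to obtain a conservative
// low-water mark (clamped at 0 in this sketch).
inline lsn_t oldest_modification_lwm(const std::vector<FlushList> &lists,
                                     lsn_t recent_closed_capacity) {
  const lsn_t approx = oldest_modification_approx(lists);
  return approx > recent_closed_capacity ? approx - recent_closed_capacity : 0;
}

// Exact minimum by scanning every entry; for comparison only.
inline lsn_t oldest_modification_exact(const std::vector<FlushList> &lists) {
  lsn_t oldest = 0;
  for (const auto &list : lists) {
    if (list.empty()) continue;
    const lsn_t m = *std::min_element(list.begin(), list.end());
    if (oldest == 0 || m < oldest) oldest = m;
  }
  return oldest;
}
```

For a list whose pages were added in the order 98 => 85 => 110, the approximation returns 98 although the true minimum is 85. Assuming a recent_closed capacity of 20, the low-water mark is 78, which stays below the true minimum because the amount of disorder is bounded by that capacity.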

Finally, log_consider_checkpoint(log); decides whether to write the checkpoint information this time.

log_should_checkpoint() applies three specific conditions to decide whether to take a checkpoint.

Once the decision is made, log_checkpoint(log); writes the checkpoint information.

In the log_checkpoint() function:

log_determine_checkpoint_lsn() decides whether the checkpoint is written at dict_lsn or at available_for_checkpoint_lsn. dict_lsn refers to the last DDL-related operation: up to dict_lsn, all related metadata has been written to disk. Why should DDL-related operations be separated from non-DDL operations here?

Finally, log_files_write_checkpoint writes the checkpoint information into the ib_logfile0 file.

author: admin
