Preface: This article is organized by the editor of Programming Notes#. It mainly introduces the simplest way to understand why MongoDB index chooses B-tree, while Mysql chooses B+ tree. I hope it will be of certain reference value to you. .
This problem was mentioned by the teacher when I was watching the video. Although I knew their respective index structures before, I have not studied them yet. reason. There are many answers when searching online. But they are all very wordy. So that concludes this article.
1. The difference between B-tree and B+ tree
Obviously, if we want to understand the reason, we need to know the difference between B-tree and B+ tree. In order not to go into too much detail. We directly give their form to summarize their characteristics.
1. B-tree
B-tree is a self-balancing search tree with a very simple form:
This is a B-tree. The core features of our problem are as follows:
(1) Multi-way, non-binary tree
(2) Each node saves both the index and the data
(3) When searching Equivalent to binary search
Here we assume that we all already understand the structure related to the B-tree.
2. B+ tree
B+ tree is a variant of B-tree
The core features are as follows:
(1) Multi-path non-binary
(2) Only leaf nodes save data
(3) The search is equivalent to binary search
(4) Added pointers to adjacent points.
From the above we can see that there are two main core differences, one is the storage location of the data, and the other is the pointing of adjacent nodes. It is these two that cause the difference between MongoDB and mysql. why?
3. The difference between B-tree and B+ tree
(1) The B+ tree query time complexity is fixed at logn, and the B-tree query complexity is preferably O(1).
(2) The pointers of adjacent points of the B+ tree can greatly increase the interval accessibility and can be used in range queries, etc., while the B-tree has If the node key and data are together, interval search cannot be performed.
(3) B+ tree is more suitable for external storage, that is, disk storage. Since there is no data field in the internal nodes, the range that each node can index is larger and more precise
( 4) It is very important to note that this difference is based on (1) (2) (3). Each node of the B-tree saves both data and indexes, so the number of disk IOs is very small. The B+ tree only saves leaf nodes, and the disk There is a lot of IO, but interval access is better.
A sequential access pointer is added to the B+ tree, that is, each leaf node adds a pointer to the adjacent leaf node, so that Trees have become the preferred data structure for database systems to implement indexes.
There are many reasons, the most important one is that this tree is short and fat, haha. Generally speaking, the index is very large and is often stored on the disk in the form of an index file. Disk I/O consumption is incurred during index search. Compared with memory access, the consumption of I/O access is several orders of magnitude higher, so the evaluation The most important indicator of the quality of a data structure as an index is the time complexity of the number of disk I/O operations during the search process. The smaller the tree height, the fewer I/O times.
Then why is it a B+ tree instead of a B-tree? Because the nodes in it do not store data, so one node can store more keys.
2. Explanation of reasons
If you want to explain the reasons, We must also understand the basic concepts of MongoDB and Mysql.
1. MongoDB
MongoDB is a document-type database, a kind of nosql, which uses Json-like format to save data. For example, our previous tables may have user tables, order tables, shopping basket tables, etc., and we need to establish foreign key relationships between them. But Json class is different.
We can see that this form is simpler and easier to understand. So why does MongoDB use B-trees?
MongoDB uses B-tree, all nodes have Data fields, as long as the specified index is found For access, there is no doubt that a single query is faster than Mysql on average.
2. Mysql
Mysql as a relationship Type database, data correlation is very strong, interval access is a common situation, B+ tree because all data are stored in leaf nodes, and are strung together through pointers, so it is easy to perform interval traversal or even complete traversal.
The core of the difference between these two is easy to understand if you can understand the difference between B-tree and B+ tree.