1024programmer Mongodb Index optimization for MongoDB range queries

Index optimization for MongoDB range queries

We know that MongoDB’s index is a B-Tree structure, which is very similar to MySQL’s index. So you should have heard this suggestion: When creating an index, you should consider the sort operation, and try to put the fields used by the sort operation behind your index. But in some cases, doing so will make your query performance lower.

Question

For example, we perform the following query:

db.collection.find({“country”: “A”}).sort({“carsOwned”: 1})

The query condition is {“country”: “A”}, sorted by the positive order of the carsOwned field. So the index is easy to build, just create it directly
country , carsOwned The joint index of the two fields is enough. like this:

db.collection.ensureIndex({“country”: 1, “carsOwned”: 1})

Let’s look at a slightly more complex query:

db.collection.find({“country”: {“$in”: [“A”, “G”]}}).sort({“carsOwned”: 1})

This time we want to query the data entries whose country is A or G, and the results are also sorted by the carsOwned field.

If we also use the above index, and use explain() to analyze this query, we will find that in the output there is a “scanAndOrder”
: true field, and the value of nscanned may be much larger than expected, even specifying limit has no effect.

reason

What is the reason for this? Let’s first look at the picture below:

As shown in the figure above, the one on the left is an index built in the order of {“country”: 1, “carsOwned”: 1}. and the right is to press
{“carsOwned”: 1, “country”: 1} index built in order.

If we execute the above query, through the index on the left, we need to put all the child nodes and country with the value of country A (the left branch of the left figure)
All child nodes whose value is G (the one on the right side of the left figure) are fetched. Then perform a sorting operation on the retrieved data according to the value of carsOwned.

So the above explain outputs a “scanAndOrder”: true
The prompt, that is to say, for this query, the scan is performed first to obtain the data, and then an independent sorting operation is performed.

Then if we use the index on the right to do the query, the result will be different. Instead of putting the sort field at the end, we put it at the front, and instead put the filter field at the back. The result of this is: we will start traversing from the node with a value of 1 (the left branch of the right figure), when we find that there is
If the value of country is A or G, it will be directly put into the result set. When the specified amount is completed (specify limit
number) after the search. We can return the results directly, because at this time, all the results themselves are arranged in the positive order of carsOwned.

For the above dataset, if we need 2 results. We need to scan 4 records through the index on the left, and then sort the 4 records to return the result. On the right side, we only need to scan 2 results to return directly (because the query process is to traverse the index in the required order).

So, in a range query (including $in, $gt, $lt
etc.), in fact, deliberately appending the sorting index behind is usually ineffective. Because in the process of range query, the result set we get is not arranged according to the added field, and an additional sorting is required. In this case, it may be a better choice to build an index in reverse order (the sort field comes first, and the range query field follows). Of course, whether it is better is also related to the specific data set.

Summary

To sum up, give two chestnuts.

When the query is:

db.test.find({a:1,b:2}).sort({c:1})

Then directly create a joint index of {a:1, b:1, c:1} or {b:1, a:1, c:1}.

If the query is:

db.test.find({a:1,b:{$in:[1,2]}}).sort({c:1})

Then it is possible to build {a:1, c:1, b:1}
The joint index of the will be more appropriate. Of course, here is just another way of thinking, and whether to use it or not depends on your data situation.

This article is from the internet and does not represent1024programmerPosition, please indicate the source when reprinting:https://www.1024programmer.com/index-optimization-for-mongodb-range-queries/

author: admin

Previous article
Next article

Leave a Reply

Your email address will not be published. Required fields are marked *

Contact Us

Contact us

181-3619-1160

Online consultation: QQ交谈

E-mail: [email protected]

Working hours: Monday to Friday, 9:00-17:30, holidays off

Follow wechat
Scan wechat and follow us

Scan wechat and follow us

Follow Weibo
Back to top
首页
微信
电话
搜索