Research on MongoDB query performance

In the previous post, MongoDB vs MySQL query performance, I compared the query performance of MongoDB and MySQL. The conclusion was that MongoDB's performance is acceptable and it could be used in place of MySQL.

But that test was at the million-record level, while my scenario is at the tens-of-millions (kw) level, so it is necessary to test how MongoDB behaves at that scale.

My test environment has 4 GB of memory (much of it occupied by other programs) and 20 million (2kw) records; each query looks up 20 randomly generated ids at a time.
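
For reference, here is a minimal sketch of what one such query might look like with the 2.x-era Java driver (the driver family implied by the mongo.jar notes at the end of this post). The host, database and collection names, the numeric _id type, and the id range are my assumptions for illustration, not details from the test itself.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;
    import com.mongodb.Mongo;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Sketch of a single test query: fetch 20 documents by randomly generated ids
    // with one $in query. Names and the id range (0 .. 2kw) are placeholders.
    public class RandomIdQuery {
        public static void main(String[] args) throws Exception {
            Mongo mongo = new Mongo("localhost", 27017);
            DBCollection coll = mongo.getDB("test").getCollection("mycoll");

            Random rnd = new Random();
            List<Long> ids = new ArrayList<Long>();
            for (int i = 0; i < 20; i++) {
                ids.add((long) rnd.nextInt(20000000)); // 2kw = 20 million records
            }

            // { _id : { $in : [ ...20 ids... ] } }
            DBObject query = new BasicDBObject("_id", new BasicDBObject("$in", ids));
            DBCursor cursor = coll.find(query);
            while (cursor.hasNext()) {
                cursor.next(); // drain the result set; the test only measures latency
            }
            cursor.close();
            mongo.close();
        }
    }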

The results in this environment are not ideal, and rather disappointing: the average query time is 500 ms (not bad compared to MySQL, but under concurrent queries the performance is poor and the throughput is very low). Checking the index size with db.mycoll.stats(): the 20 million records carry about 1.1 GB of indexes, and the stored data is about 11 GB.
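
For completeness, a rough Java-side equivalent of the db.mycoll.stats() check, using the collStats command through the 2.x driver; the connection details and names are placeholders.

    import com.mongodb.BasicDBObject;
    import com.mongodb.CommandResult;
    import com.mongodb.DB;
    import com.mongodb.Mongo;

    // Java-side equivalent of db.mycoll.stats(): run the collStats command and
    // print the size figures quoted above.
    public class CollectionStats {
        public static void main(String[] args) throws Exception {
            Mongo mongo = new Mongo("localhost", 27017);
            DB db = mongo.getDB("test");
            CommandResult stats = db.command(new BasicDBObject("collStats", "mycoll"));
            System.out.println("documents:        " + stats.get("count"));
            System.out.println("data size:        " + stats.get("size"));           // bytes
            System.out.println("total index size: " + stats.get("totalIndexSize")); // bytes
            mongo.close();
        }
    }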

During the test, iowait was around 50%, so I/O appears to be the bottleneck. MongoDB was also not using much memory (less than the size of the indexes; it seems the machine is too small for this test).

Switching to a machine with 6 GB of available memory, the average query time drops to about 100 ms under 50 concurrent threads. That is relatively satisfactory, although the concurrency still does not seem strong. This performance is not something I can control, however; it depends on the memory available on the machine. The reason is that MongoDB does not let you specify how much memory it may occupy: it uses all free memory as a cache, which is both an advantage and a disadvantage. The advantage is that it can maximize performance; the disadvantage is that it is easily disturbed by other programs taking over its cache, and in my test its ability to grab that memory back was not strong. MongoDB uses memory-mapped files managed by the operating system's virtual memory manager (VMM); the official description:

Memory Mapped Storage Engine

This is the current storage engine for MongoDB, and it uses
memory-mapped files for all disk I/O. Using this strategy,
the operating system’s virtual memory manager is in charge of
caching. This has several implications:

There is no redundancy between file system cache and database
cache: they are one and the same.

MongoDB can use all free memory on the server for cache space
automatically without any configuration of a cache size.

Virtual memory size and resident size will appear to be very
large for the mongod process. This is benign: virtual memory
space will be just larger than the size of the datafiles open and
mapped; resident size will vary depending on the amount of memory
not used by other processes on the machine.

Caching behavior (such as LRU’ing out of pages, and laziness of
page writes) is controlled by the operating system: quality of the
VMM implementation will vary by OS.

Seen this way, I think that not being able to specify a memory size to guarantee normal caching is a disadvantage of MongoDB: it should at least ensure that all indexes can be kept in memory. But this behavior is determined by the environment rather than by startup options, which is a fly in the ointment.

The official documentation also has a paragraph about keeping indexes in memory:

If your queries seem sluggish, you should verify that your
indexes are small enough to fit in RAM. For instance, if you’re
running on 4GB RAM and you have 3GB of indexes, then your indexes
probably aren’t fitting in RAM. You may need to add RAM and/or
verify that all the indexes you’ve created are actually being
used.

I still hope MongoDB will allow the memory size to be specified, to ensure it has enough memory to hold the indexes.

Summary: with a large amount of data (tens of millions of records), MongoDB's concurrent query performance is not ideal (100-200/s). Writing data is fast: in my environment, remote inserts reach nearly 10,000/s (1w/s), reaching 15,000/s should be no problem, and write speed is basically unaffected by the data volume.

Here is a set of test data:

                   1 id (memory usage <1.5 GB)      10 ids (memory usage 2-3 GB)     20 ids (memory usage >4 GB)
                   run 1     run 2     run 3        run 1     run 2     run 3        run 1     run 2     run 3
1 thread
  total time (s)   17.136    25.508    17.387       37.138    33.788    25.143       44.75     31.167    30.678
  thruput (q/s)    583.5668  392.0339  575.1423     269.266   295.9631  397.725      223.4637  320.8522  325.9665
5 threads
  total time (s)   24.405    22.664    24.115       41.454    41.889    39.749       56.138    53.713    54.666
  thruput (q/s)    2048.76   2206.142  2073.398     1206.156  1193.631  1257.893     890.6623  930.8733  914.6453
10 threads
  total time (s)   27.567    26.867    28.349       55.672    54.347    50.93        72.978    81.857    75.925
  thruput (q/s)    3627.526  3722.038  3527.461     1796.235  1840.028  1963.479     1370.276  1221.643  1317.089
20 threads
  total time (s)   51.397    57.446    53.81        119.386   118.015   76.405       188.962   188.034   138.839
  thruput (q/s)    3891.278  3481.53   3716.781     1675.238  1694.7    2617.63      1058.414  1063.637  1440.517
50 threads
  total time (s)   160.038   160.808   160.346      343.559   352.732   460.678      610.907   609.986   1411.306
  thruput (q/s)    3124.258  3109.298  3118.257     1455.354  1417.507  1085.357     818.4552  819.6909  354.2818
100 threads
  total time (s)   2165.408  635.887   592.958      1090.264  1034.057  1060.266     1432.296  1466.971  1475.061
  thruput (q/s)    461.8067  1572.606  1686.46      917.209   967.0647  943.1595     698.1797  681.6767  677.9381

The test above uses three kinds of queries (1, 10, or 20 ids per query), runs each 3 times at every concurrency level, and issues 10,000 (1w) queries per run. The first row of each pair is the cumulative time of all threads (in seconds), and the second row is the throughput, computed as 1w / (total time / thread num). Memory usage grows slowly over the course of the test, so the later runs may have benefited from a warmer cache (a more efficient environment).
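
As a small sanity check of that formula against one cell of the table (5 threads, "1 id" column, first run):

    // throughput = 1w / (total time / thread num)
    public class ThroughputCheck {
        public static void main(String[] args) {
            int queries = 10000;           // "1w" = 10,000 queries per run
            int threads = 5;
            double totalTimeSec = 24.405;  // cumulative time of all threads, from the table
            double throughput = queries / (totalTimeSec / threads);
            System.out.printf("%.2f queries/s%n", throughput); // 2048.76, matching the table
        }
    }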

From the table, throughput is highest at around 10-20 threads. Judging by the memory usage, this assumes the indexes are loaded into memory and some additional memory is available as a cache.

Below is a PDF on indexing and query optimization:

Indexing and Query Optimizer

PS:

By default the MongoDB server only allows 10 concurrent connections; to accept more concurrent requests, start it with --maxConns <num>.

The MongoDB Java driver's connection pool also defaults to a maximum of 10 connections. To raise it, set the MONGO.POOLSIZE system property when running against mongo.jar, for example: java -DMONGO.POOLSIZE=50 …
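
If you prefer not to rely on the system property, the 2.x Java driver also lets you set the pool size programmatically via MongoOptions; a minimal sketch (the value 50 simply mirrors the example above, and the connection details are placeholders):

    import com.mongodb.Mongo;
    import com.mongodb.MongoOptions;
    import com.mongodb.ServerAddress;

    // Setting the connection pool size in code, as an alternative to -DMONGO.POOLSIZE.
    public class PooledClient {
        public static void main(String[] args) throws Exception {
            MongoOptions options = new MongoOptions();
            options.connectionsPerHost = 50; // default pool size is small
            Mongo mongo = new Mongo(new ServerAddress("localhost", 27017), options);
            System.out.println("pool size: " + options.connectionsPerHost);
            mongo.close();
        }
    }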

