Research on MongoDB query performance

In the previous post, MongoDB vs MySQL query performance, I compared the query performance of MongoDB and MySQL. The conclusion was that MongoDB's performance is acceptable and it could be used in place of MySQL.

But that test was at the million-record level, while my scenario is at the tens-of-millions (kw) level, so it is necessary to test how MongoDB behaves at that scale.

My test environment has 4 GB of memory (much of it occupied by other programs) and 20 million (2kw) records; each query looks up 20 randomly generated ids at a time.
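
For reference, here is a minimal sketch of what one such query might look like with the 2.x-era Java driver (the driver family implied by the mongo.jar notes at the end of this post). The host, database and collection names, the numeric _id type, and the id range are my assumptions for illustration, not details from the test itself.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;
    import com.mongodb.Mongo;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Sketch of a single test query: fetch 20 documents by randomly generated ids
    // with one $in query. Names and the id range (0 .. 2kw) are placeholders.
    public class RandomIdQuery {
        public static void main(String[] args) throws Exception {
            Mongo mongo = new Mongo("localhost", 27017);
            DBCollection coll = mongo.getDB("test").getCollection("mycoll");

            Random rnd = new Random();
            List<Long> ids = new ArrayList<Long>();
            for (int i = 0; i < 20; i++) {
                ids.add((long) rnd.nextInt(20000000)); // 2kw = 20 million records
            }

            // { _id : { $in : [ ...20 ids... ] } }
            DBObject query = new BasicDBObject("_id", new BasicDBObject("$in", ids));
            DBCursor cursor = coll.find(query);
            while (cursor.hasNext()) {
                cursor.next(); // drain the result set; the test only measures latency
            }
            cursor.close();
            mongo.close();
        }
    }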

The results in this environment are not ideal, and rather disappointing: the average query time is 500 ms (not bad compared to MySQL, but under concurrent queries the performance is poor and the throughput is very low). Checking the index size with db.mycoll.stats(): the 20 million records carry about 1.1 GB of indexes, and the stored data is about 11 GB.
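
For completeness, a rough Java-side equivalent of the db.mycoll.stats() check, using the collStats command through the 2.x driver; the connection details and names are placeholders.

    import com.mongodb.BasicDBObject;
    import com.mongodb.CommandResult;
    import com.mongodb.DB;
    import com.mongodb.Mongo;

    // Java-side equivalent of db.mycoll.stats(): run the collStats command and
    // print the size figures quoted above.
    public class CollectionStats {
        public static void main(String[] args) throws Exception {
            Mongo mongo = new Mongo("localhost", 27017);
            DB db = mongo.getDB("test");
            CommandResult stats = db.command(new BasicDBObject("collStats", "mycoll"));
            System.out.println("documents:        " + stats.get("count"));
            System.out.println("data size:        " + stats.get("size"));           // bytes
            System.out.println("total index size: " + stats.get("totalIndexSize")); // bytes
            mongo.close();
        }
    }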

During the test, iowait was around 50%, so I/O appears to be the bottleneck. MongoDB was also not using much memory (less than the size of the indexes; it seems the machine is too small for this test).

Switching to a machine with 6 GB of available memory, the average query time drops to about 100 ms under 50 concurrent threads. That is relatively satisfactory, although the concurrency still does not seem strong. This performance is not something I can control, however; it depends on the memory available on the machine. The reason is that MongoDB does not let you specify how much memory it may occupy: it uses all free memory as a cache, which is both an advantage and a disadvantage. The advantage is that it can maximize performance; the disadvantage is that it is easily disturbed by other programs taking over its cache, and in my test its ability to grab that memory back was not strong. MongoDB uses memory-mapped files managed by the operating system's virtual memory manager (VMM); the official description:

Memory Mapped Storage Engine

This is the current storage engine for MongoDB, and it uses
memory-mapped files for all disk I/O. Using this strategy,
the operating system’s virtual memory manager is in charge of
caching. This has several implications:

There is no redundancy between file system cache and database
cache: they are one and the same.

MongoDB can use all free memory on the server for cache space
automatically without any configuration of a cache size.

Virtual memory size and resident size will appear to be very
large for the mongod process. This is benign: virtual memory
space will be just larger than the size of the datafiles open and
mapped; resident size will vary depending on the amount of memory
not used by other processes on the machine.

Caching behavior (such as LRU’ing out of pages, and laziness of
page writes) is controlled by the operating system: quality of the
VMM implementation will vary by OS.

Seen this way, I think that not being able to specify a memory size to guarantee normal caching is a disadvantage of MongoDB: it should at least ensure that all indexes can be kept in memory. But this behavior is determined by the environment rather than by startup options, which is a fly in the ointment.

The official documentation also has a paragraph about keeping indexes in memory:

If your queries seem sluggish, you should verify that your
indexes are small enough to fit in RAM. For instance, if you’re
running on 4GB RAM and you have 3GB of indexes, then your indexes
probably aren’t fitting in RAM. You may need to add RAM and/or
verify that all the indexes you’ve created are actually being
used.

I still hope MongoDB will allow the memory size to be specified, to ensure it has enough memory to hold the indexes.

Summary: with a large amount of data (tens of millions of records), MongoDB's concurrent query performance is not ideal (100-200/s). Writing data is fast: in my environment, remote inserts reach nearly 10,000/s (1w/s), reaching 15,000/s should be no problem, and write speed is basically unaffected by the data volume.

Here is a set of test data:

                   1 id (memory usage <1.5 GB)      10 ids (memory usage 2-3 GB)     20 ids (memory usage >4 GB)
                   run 1     run 2     run 3        run 1     run 2     run 3        run 1     run 2     run 3
1 thread
  total time (s)   17.136    25.508    17.387       37.138    33.788    25.143       44.75     31.167    30.678
  thruput (q/s)    583.5668  392.0339  575.1423     269.266   295.9631  397.725      223.4637  320.8522  325.9665
5 threads
  total time (s)   24.405    22.664    24.115       41.454    41.889    39.749       56.138    53.713    54.666
  thruput (q/s)    2048.76   2206.142  2073.398     1206.156  1193.631  1257.893     890.6623  930.8733  914.6453
10 threads
  total time (s)   27.567    26.867    28.349       55.672    54.347    50.93        72.978    81.857    75.925
  thruput (q/s)    3627.526  3722.038  3527.461     1796.235  1840.028  1963.479     1370.276  1221.643  1317.089
20 threads
  total time (s)   51.397    57.446    53.81        119.386   118.015   76.405       188.962   188.034   138.839
  thruput (q/s)    3891.278  3481.53   3716.781     1675.238  1694.7    2617.63      1058.414  1063.637  1440.517
50 threads
  total time (s)   160.038   160.808   160.346      343.559   352.732   460.678      610.907   609.986   1411.306
  thruput (q/s)    3124.258  3109.298  3118.257     1455.354  1417.507  1085.357     818.4552  819.6909  354.2818
100 threads
  total time (s)   2165.408  635.887   592.958      1090.264  1034.057  1060.266     1432.296  1466.971  1475.061
  thruput (q/s)    461.8067  1572.606  1686.46      917.209   967.0647  943.1595     698.1797  681.6767  677.9381

The test above uses three kinds of queries (1, 10, or 20 ids per query), runs each 3 times at every concurrency level, and issues 10,000 (1w) queries per run. The first row of each pair is the cumulative time of all threads (in seconds), and the second row is the throughput, computed as 1w / (total time / thread num). Memory usage grows slowly over the course of the test, so the later runs may have benefited from a warmer cache (a more efficient environment).
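
As a small sanity check of that formula against one cell of the table (5 threads, "1 id" column, first run):

    // throughput = 1w / (total time / thread num)
    public class ThroughputCheck {
        public static void main(String[] args) {
            int queries = 10000;           // "1w" = 10,000 queries per run
            int threads = 5;
            double totalTimeSec = 24.405;  // cumulative time of all threads, from the table
            double throughput = queries / (totalTimeSec / threads);
            System.out.printf("%.2f queries/s%n", throughput); // 2048.76, matching the table
        }
    }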

From the table, throughput is highest at around 10-20 threads. Judging by the memory usage, this assumes the indexes are loaded into memory and some additional memory is available as a cache.

Below is a PDF on indexing and query optimization:

Indexing and Query Optimizer

PS:

By default the MongoDB server only allows 10 concurrent connections; to accept more concurrent requests, start it with --maxConns <num>.

The MongoDB Java driver's connection pool also defaults to a maximum of 10 connections. To raise it, set the MONGO.POOLSIZE system property when running against mongo.jar, for example: java -DMONGO.POOLSIZE=50 …
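
If you prefer not to rely on the system property, the 2.x Java driver also lets you set the pool size programmatically via MongoOptions; a minimal sketch (the value 50 simply mirrors the example above, and the connection details are placeholders):

    import com.mongodb.Mongo;
    import com.mongodb.MongoOptions;
    import com.mongodb.ServerAddress;

    // Setting the connection pool size in code, as an alternative to -DMONGO.POOLSIZE.
    public class PooledClient {
        public static void main(String[] args) throws Exception {
            MongoOptions options = new MongoOptions();
            options.connectionsPerHost = 50; // default pool size is small
            Mongo mongo = new Mongo(new ServerAddress("localhost", 27017), options);
            System.out.println("pool size: " + options.connectionsPerHost);
            mongo.close();
        }
    }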

