Source: http://www.cnblogs.com/uttu/p/6513918.html
This article is drawn from the high-concurrency series of the Caoz Mengbai public account. Using diagrams and a loosely coupled structure, it interprets and analyzes the high-concurrency problems of Internet services in detail. “Technology is overestimated in the short term and underestimated in the long term.” Because scenarios and staffing costs differ, the solutions used by the giants may not suit start-ups, so making sure that high concurrency does not become an obstacle on the road to building a company is required knowledge for every full-stack engineer, senior systems engineer, and programmer with ambitions. I hope this article helps you find your own “road to gold”.
Table of contents:
Interpretation of scenarios and solutions
Know the load
Data tracking
Mind map and material shared on caoz's public account
References
In the spirit of knowing not only what works but also why it works, the usage scenario of each technique is examined below, layer by layer:
a. Network channel + front-end control
Reason: Users are impatient. If a click gets no response within 3 seconds, the user will usually refresh. When the network channel is slow, the data would still have arrived normally, but the delay now doubles the number of back-end requests. And when users refresh frantically because no data appears, the front end should control this bad behavior, for example by disabling the button for 3 seconds after a click, or by letting the user keep clicking without actually sending any request (360 appears to have done this before).
Solution: The back end must be reachable over both carrier networks, with deployments on both sides: China Telecom in the south and China Netcom in the north. Anyone who has played online battle games will remember servers being split into Telecom and Netcom zones. In addition, use a CDN for acceleration as far as the budget allows.
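The front-end control described in the Reason paragraph above can be sketched as a small throttle. This is only an illustration under assumptions: the class name `ClickThrottle`, the method `try_send`, and the injectable `clock` parameter are hypothetical, and a real front end would implement the same idea in its own UI code.

```python
import time


class ClickThrottle:
    """Drop repeated clicks that arrive within `cooldown` seconds of an accepted one."""

    def __init__(self, cooldown=3.0, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock        # injectable so the behaviour can be tested
        self._last_sent = None    # time of the last click that was sent on

    def try_send(self):
        """Return True if a request may be sent now, False if the click is swallowed."""
        now = self.clock()
        if self._last_sent is not None and now - self._last_sent < self.cooldown:
            return False          # too soon: the user may click, but nothing is sent
        self._last_sent = now
        return True
```

The back end then sees at most one request per button per cooldown window, however fast the user clicks.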
b. Load balancing
This needs little introduction, but for the sake of explaining the principle: the main difficulties are distributing nodes evenly on the ring and balancing the number of requests each node handles.
Consistent hashing is used. The full hash space forms a ring of length 2^32 (determined by the return type of the hash function). Server nodes are placed on the ring according to the hash of their names. Each web request is hashed by IP or URL and routed to the nearest server on the ring (conventionally the next one clockwise), which handles the request.
1. The ring of server hash values is stored in a red-black tree. Use a hash function such as CRC32_HASH, FNV1_32_HASH, or KETAMA_HASH so that server hash values are evenly distributed between 0 and 2^32; the hashCode() of java.lang.String is not suitable.
2. To balance the number of requests handled by each server, virtualize each physical server into n virtual nodes (172.16.6.1:1, 172.16.6.1:2, …) and build the ring from those. Splitting one real node into many evenly distributed proxy points keeps the request distribution balanced.
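The ring with virtual nodes described above can be sketched as follows. This is a minimal illustration, not the article's implementation: the class name `ConsistentHashRing` and the `vnodes` parameter are hypothetical, `zlib.crc32` stands in for the CRC32 option, and a sorted list searched with `bisect` replaces the red-black tree, since the Python standard library has no tree map.

```python
import bisect
import zlib


class ConsistentHashRing:
    """Consistent-hash ring in which each server owns `vnodes` virtual points."""

    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []   # sorted hash positions on the 0..2^32-1 ring
        self._owner = {}    # hash position -> physical server
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key):
        # CRC32 yields an unsigned 32-bit value, matching the 2^32 ring.
        return zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF

    def add(self, server):
        # Virtual node names follow the "ip:index" pattern from the text.
        for i in range(self.vnodes):
            point = self._hash(f"{server}:{i}")
            if point in self._owner:
                continue    # rare collision: the earlier owner keeps the point
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        """Return the server owning the first point at or after hash(key)."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._points, self._hash(key))
        if i == len(self._points):
            i = 0           # wrap around the ring
        return self._owner[self._points[i]]
```

A property worth noting: when a server is removed, only the keys it owned move to other servers; every other key keeps its mapping, which is the reason for using a ring rather than `hash(key) % n`.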
c. Synchronous and asynchronous processing for the cache, database, and data bus:
1. Cache
The idea originates from the fast data path between the CPU and memory: hot data is kept in an LRU queue to speed up CPU processing.
Here, the cache holds small, frequently accessed database fields; a hit rate of 50% is already enough to justify the cache's IO overhead.
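To make the LRU queue and the hit rate concrete, here is a minimal sketch of a read-through LRU cache that counts hits and misses. The class name `LRUCache` and the `load` callback (standing in for the database read) are assumptions made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Read-through LRU cache that also tracks its hit rate."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()   # oldest entry first, most recent last
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        """Return the cached value, calling load(key) on a miss."""
        if key in self._data:
            self._data.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = load(key)                    # e.g. a database read
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Comparing `hit_rate()` against the 50% figure mentioned above is one way to check whether a given cache is paying for itself.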
2. Database
i. Single table
Slow queries are usually caused by too many filter conditions; use a composite index to speed up filtering. An index is a tree structure, so lookup time is roughly log N. Taking base 10, log(1 billion) = 9, so about 9 unit operations are enough to search 1 billion rows. If no index can be used, all the rows must be read first and held in memory; whatever does not fit in memory spills to disk, and only then is the filter applied.
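The effect of a composite index on a multi-condition filter can be observed with a query plan. The sketch below uses SQLite from the Python standard library purely because it is self-contained; the article is about MySQL, where `EXPLAIN` plays the same role. The table `orders`, its columns, and the index name `idx_user_status` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (user_id, status, amount) VALUES (?, ?, ?)",
    [(i % 1000, "paid" if i % 3 else "new", float(i)) for i in range(10000)],
)

QUERY = "SELECT amount FROM orders WHERE user_id = ? AND status = ?"


def plan(sql):
    """Return SQLite's query-plan description for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42, "paid")).fetchall()
    return " ".join(row[-1] for row in rows)   # last column is the detail text


plan_before = plan(QUERY)   # no usable index: a full table scan
conn.execute("CREATE INDEX idx_user_status ON orders (user_id, status)")
plan_after = plan(QUERY)    # both filter columns are resolved through the index
```

Printing `plan_before` and `plan_after` shows the switch from a scan of the whole table to a search through the composite index.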
In addition, limit the number of rows a single query returns, controlling the flow at the source in the same way a subway limits passenger flow. There is also no need to support very deep pagination: the search results of Google, Baidu, and Taobao do not go beyond 100 pages.
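Capping the page number is easy to enforce before a query is built. The sketch below is only an illustration: the function name `page_window`, the page size of 20, and the constant names are assumptions; the 100-page cap comes from the text.

```python
MAX_PAGE = 100    # cap taken from the text
PAGE_SIZE = 20    # hypothetical page size


def page_window(page, page_size=PAGE_SIZE, max_page=MAX_PAGE):
    """Clamp `page` into 1..max_page and return (limit, offset) for the query."""
    page = max(1, min(int(page), max_page))
    offset = (page - 1) * page_size
    return page_size, offset
```

With these defaults the largest offset ever sent to the database is 1980, no matter what page number the client requests.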
ii. Multi-table join queries are too slow
Refer to “MySQL Multi-Table Association SQL Statement Tuning for Million-level and Ten-Million-level Data” for a detailed analysis of index usage in multi-table joins.
iii. Massive data
Some business logic can only be handled with a single table, such as the user table, the login-status table, and the game operation record table.
Beyond that, tables and databases can also be split. This is more complicated; see reference a.
3. Synchronous and asynchronous processing on the data bus:
The data bus is mentioned here because data processing today is mostly loosely coupled and message-driven, for example Kafka at Jingdong, the very flexible RabbitMQ, and Taobao's MetaQ.
For non-core, non-real-time data such as rankings, PageView counts, and lastact, the cache queue can be flushed on a timer: for rankings and PageView counts, sum all the messages in the queue and apply them in one update; for lastact, take the latest state and write only that.
For synchronous, real-time processing, merge the operation logic as much as possible and cover several operations with a single SQL update (this pays off when a high proportion of queries and updates hit the same primary key).
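The timed flush for PageView counts and lastact can be sketched as a merge step that runs over the queued messages before anything is written. The message shape (dictionaries with `type`, `page_id`, `user_id`, and `ts` keys) and the function name `merge_messages` are assumptions made for illustration; a real system would read these from its message queue.

```python
from collections import Counter


def merge_messages(messages):
    """Collapse queued messages into one update per page and per user."""
    page_views = Counter()   # page_id -> number of views to add
    last_act = {}            # user_id -> latest activity timestamp
    for msg in messages:
        if msg["type"] == "pageview":
            page_views[msg["page_id"]] += 1          # summed, written once
        elif msg["type"] == "lastact":
            uid = msg["user_id"]
            if uid not in last_act or msg["ts"] >= last_act[uid]:
                last_act[uid] = msg["ts"]            # only the latest state survives
    return dict(page_views), last_act
```

Each key in the two results then becomes a single update, so a thousand queued page views for one page cost one write instead of a thousand.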
d. Trimming at the requirements level
A good product makes some people scream with delight and lets the rest walk away, so trimming at the requirements level, to meet the needs of most people at lower cost, is entirely appropriate.
1. Deep search pagination: Baidu, Taobao, and Google limit query results to 100 pages, which avoids using count(1) to compute the total number of rows.
2. Handling the avalanche effect: when the cache cannot absorb the load and passes it to the DB, the DB becomes overloaded. The service can be degraded: features or data that are requested rarely and have low value but are not cheap for the system are temporarily blocked and stop responding, to protect the stability of the overall system. For example, when Weibo is overloaded, unpopular subscriptions are suspended to avoid a global collapse.
3. Master-slave synchronization: tell the user that processing is delayed; a slightly worse experience causes no great trouble. Writes go to the master and reads come from the slave, so the design must consider how to avoid showing intermediate results.
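The degradation described in item 2 can be sketched as a priority table consulted under load. Everything here is hypothetical: the feature names, the priority values, the load thresholds of 0.7 and 0.9, and the function name `allowed_features` are assumptions chosen only to illustrate shedding low-value work first.

```python
# Hypothetical features with a priority each; higher means more important.
FEATURE_PRIORITY = {
    "timeline": 3,
    "post": 3,
    "hot_subscriptions": 2,
    "unpopular_subscriptions": 1,
}


def allowed_features(db_load, priorities=FEATURE_PRIORITY):
    """Return the features that stay enabled at a DB load between 0.0 and 1.0."""
    if db_load < 0.7:
        min_priority = 1     # normal operation: everything is served
    elif db_load < 0.9:
        min_priority = 2     # first stage: drop the lowest-value features
    else:
        min_priority = 3     # near overload: keep only the core features
    return {name for name, p in priorities.items() if p >= min_priority}
```

A request handler would check membership in the returned set and answer with a "temporarily unavailable" message for anything that has been shed.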
Solving high concurrency requires breadth: you need to reason about features, usage, design, the database, the cache, the OS, and their respective solutions, and to analyze each scenario in depth. It also requires a certain technical depth, such as nio, epoll, and the efficient locks in the java.util.concurrent package. But two important points still stand between you and the “road to gold”: how to define high load and how to track it.
a. Definition:
1. Composition: CPU/memory overhead (which processes and services are consuming it; when SWAP usage is large, IO performance is bound to be low); IO overhead (the read and write frequency of each service).
2. Growth trend: linear growth, exponential growth (for example, traversal without an index), or converging growth (the easiest to support).
3. System thresholds: CPU, IO, and memory are not high, but requests exceed a system or OS limit. Examples: connections exhausted by a SYN flood; an https timeout set too long, so that https connections exceed the maximum; MySQL connections over the limit.
4. Peak and valley patterns and their prediction: analyze the causes.
5. Anomaly monitoring and tracking: do not ignore anomalies; even a rate of a few per thousand should be investigated.
b. Tracking
1. Data server:
1.1 Use cron to record CPU monitoring data every minute, and to log when the number of connections exceeds the threshold of 256. Do not connect as the root user (root gets one more connection than ordinary users; keep that connection for troubleshooting when connections are full);
1.2 binlog analysis: write the update log, copy it to an offline machine, and analyze it with mysqldump: update requests per second, the tables with the most update requests, the SQL patterns with the most update requests, and large numbers of repeated updates to the same primary key within a short period;
1.3 Slow query log analysis, using explain.
2. Web server:
2.1 Web log: turn on execution-time logging, analyze the execution frequency and time distribution of the different dynamic scripts, and find those that are both slow and frequent;
2.2 Add instrumentation (tracking points) to the programs that are slow and frequent;
2.3 SQL query output: call a summary function to analyze query requests per second, the most-queried tables and SQL statements, and whether there are many repeated queries for the same primary key;
2.4 Error and exception log analysis: stay highly vigilant; this is where SQL injection probing is discovered;
2.5 Connection status monitoring: track current web connections and the resources they consume, to avoid an avalanche caused by requests that call into complex frameworks.
3. Memory and cache servers:
3.1 Connection status and resource monitoring;
3.2 Hit rate monitoring: a low hit rate is a design problem and a waste of resources.
4. General monitoring: usage of memory, CPU, disk, SWAP, and system resources (maximum number of open files, maximum number of file handles, number of SYN connections).
5. Self-recovery system: This is a particularly important approach for platforms with immature technology and a fast-growing business, because it delivers a reliable service at low cost. A crontab job cleans up blocked processes, but there must be a follow-up plan to find out why the processes blocked (too many database connections, too many web server connections).
6. Monitoring system resource usage: under high load, avoid netstat -an; for instrumentation, sample a random subset of requests; place the data under /dev/shm so that memory is used instead of physical IO.
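The web log analysis in item 2.1 above (finding scripts that are both slow and frequent) can be sketched as a small aggregation. The log format assumed here, one line per request containing a script path and an execution time in seconds, is hypothetical, as is the function name `rank_scripts`; real web server logs need their own parsing.

```python
from collections import defaultdict


def rank_scripts(log_lines):
    """Rank scripts by total time spent; returns (script, count, total, average)."""
    stats = defaultdict(lambda: [0, 0.0])   # script -> [request count, total seconds]
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue                        # skip lines that do not match the format
        script, raw = parts
        try:
            elapsed = float(raw)
        except ValueError:
            continue                        # skip lines with a non-numeric time
        stats[script][0] += 1
        stats[script][1] += elapsed
    ranked = [(s, c, t, t / c) for s, (c, t) in stats.items()]
    ranked.sort(key=lambda row: row[2], reverse=True)   # largest total time first
    return ranked
```

Sorting by total time surfaces the scripts where frequency multiplied by duration is largest, which are the candidates for the instrumentation in item 2.2.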
A summary picture is attached to present the knowledge points graphically and deepen understanding. I wish you all success on your own “road to gold”.
High-concurrency handling skills every programmer should know; how start-ups solve high-concurrency problems; approaches to solving high-concurrency problems on the Internet; a summary of caoz's years of experience (repost)