Upsync: Weibo open source dynamic traffic management solution based on Nginx container-1024programmer

Editor’s note: High-availability architecture sharing and dissemination of articles with typical significance in the field of architecture, this article is shared by Yao Sifang in the high-availability architecture group. For reprinting, please indicate that it is from the high-availability architecture public account “ArchNotes”.

Yao Sifang, senior technical expert of Sina Weibo, technical director of Weibo platform architecture group. Joined Sina Weibo in 2012 and participated in several key projects such as Weibo Feed architecture upgrade, platform service transformation, and hybrid cloud. He is currently the technical leader of the Weibo platform architecture group and is responsible for the research and development of the platform’s public infrastructure. He once shared the technology of “Sina Weibo High Performance Architecture” at QCon, focusing on the direction of high-performance architecture and service middleware.

Business background and problems used by Nginx

Nginx With its ultra-high performance and stability, it has been widely used in the industry, and Nginx is widely used on the seventh floor of Weibo. Combined with the health check module of Nginx and the dynamic reload mechanism, the upgrade and expansion of the service can be almost lossless. At this time, the frequency of expansion is relatively low, and in most cases it is planned expansion.

Weibo business scenarios have very significant peak characteristics. There are both routine evening peaks and expected extreme traffic peaks such as New Year’s Day, Spring Festival Gala, and Red Envelope Flying. There are also occasional peaks caused by #周周见# #我们# and other celebrities/social events. The usual way before is buffer + downgrade. When the downgrade is not considered (it will affect the user experience), the buffer is too small and the peak value is too large to bear the cost. Therefore, since 2014, we have been trying to use containerization to realize the dynamic adjustment of the buffer, so as to realize the on-demand expansion/contraction of the buffer according to the traffic, so as to save costs.

In this scenario, there will be a large number of continuous expansion/contraction operations. There are two commonly used solutions for Nginx-based backend changes in the industry. One is DNS-based provided by Tengine, and the other is consul-template-based backend service discovery. The following table briefly compares the characteristics of the two schemes.

Based on DNS: This module is developed by the Tengine team, which can dynamically resolve domain names under upstream conf. This method is easy to operate, just modify the list of servers mounted under dns.

shortcoming :

DNS periodically polls for resolution (30s). If the configured time is too short, such as 1s, it will put pressure on the DNS server. If the configured time is too long, the timeliness will be affected.
Do not hang too many servers under the DNS-based service, it will be truncated (UDP protocol), and it will also put pressure on the bandwidth.

Based on consul-template and consul: as a combination, consul is used as a db, and consul-template is deployed on the Nginx server. Consul-template regularly initiates a request to consul. If the value changes, it will update the local Nginx related configuration files and initiate reload command. However, in the case of heavy traffic, initiating reload will affect performance. At the same time, reload will trigger the creation of a new work process. For a period of time, the old and new work processes will exist at the same time, and the old work process will frequently traverse the connection list to check whether the request has been processed. If it is over, it will exit the process; another reload It will also cause the long connection between Nginx and client and backend to be closed, and a new work process needs to create a new connection.

Performance impact caused by reload:

<img src="https://www.php1.cn/detail/data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAIAAACQd1PeAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAyBpVFh0WE1MOmNvbS5hZG9iZS54bXAAAA AAADw/eHBhY2tldCBiZWdpbj0i77u/IiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+IDx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IkFk b2JlIFhNUCBDb3JlIDUuMC1jMDYwIDYxLjEzNDc3NywgMjAxMC8wMi8xMi0xNzozMjowMCAgICAgICAgICAgIj4gPHJkZjpSREYgeG1sbnM6cmRmPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJ kZi1zeW50YXgtbnMjIj4gPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIiB4bWxuczp4bXBNTT0ia HR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL21tLyIgeG1sbnM6c3RSZWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20veGFwLzEuMC9zVHlwZS9SZXNvdXJjZVJlZiMiIHhtcDpDcmVh dG9yVG9vbD0iQWRvYmUgUGhvdG9zaG9wIENTNSBXaW5kb3dzIiB4bXBNTTpJbnN0YW5jZUlEPSJ4bXAuaWlkOkJDQzA1MTVGNkE2MjExRTRBRjEzODVCM0Q0NEVFMjFBIiB4bXBNTTpEb 2N1bWVudElEPSJ4bXAuZGlkOkJDQzA1MTYwNkE2MjExRTRBRjEzODVCM0Q0NEVFMjFBIj4gPHhtcE1NOkRlcml2ZWRGcm9tIHN0UmVmOmluc3RhbmNlSUQ9InhtcC5paWQ6QkNDMDUxNUQ2QTYyMT FFNEFGMTM4NUIzRDQ0RUUyMUEiIHN0UmVmOmRvY3VtZW50SUQ9InhtcC5kaWQ6QkNDMDUxNUU2QTYyMTFFNEFGMTM4NUI, the impact on performance is limited and negligible.

Applications

The module has been applied in various businesses of Weibo. The chart below compares and analyzes the QPS and time-consuming changes before and after using the module.

It can be concluded from the data that the reload operation causes the request processing capacity of nginx to drop by about 10%, and the time consumption of nginx itself will increase by 50%+. If the capacity is expanded frequently, the overhead caused by the reload operation will be more obvious.

During the New Year’s Day period in 2016, hundreds of times of expansion/reduction were carried out according to the traffic characteristics of different time periods, and the SLA of the overall service during the expansion process was not affected.

The official commercial version supports DNS and push versions of Nginx plus.

Due to data consistency and other issues during use, the extension supports the consul-based pull version

https://github.com/weibocom/nginx-upsync-module is currently improving the wiki and documentation. Click to read the original text to enter.

Q & A

1. Is the registration of machine configuration information in consul automatically adjusted by the Weibo system according to the traffic?

These are two issues. The process of registering backend information with nodes during capacity expansion is automatic and has been integrated into the online system. In addition, Weibo is currently developing and evaluating an online capacity evaluation system, which is currently dealing with semi-automatic adjustments.

2. May I ask why you didn’t consider zk at the beginning? If you switch to zk and don’t use rotation training pull to change to long-term, will there be any difference?

At present, the module is already adding support similar to etcd and zk. Consul was used at the beginning because there were already consul clusters and operation and maintenance personnel in the company. zk is essentially the same as etcd and consul for modules

3. Why not use the Nginx master to pull and then distribute to each work, but use the work process to pull? The former can reduce network interaction and improve the consistency of multiple jobs inside an Nginx

If you use the master to pull, you need to modify the core module. When designing the module, a big principle is to try to ensure that the module has zero dependencies. Overall, it’s a trade off, too.

4. Is the registration of machine configuration information in consul automatically adjusted by the Weibo system according to the traffic?

This question is similar to question 1.

5. Based on what considerations did you choose consul for configuration management?

Upsync: Weibo open source dynamic traffic management solution based on Nginx container

Business background and problems used by Nginx

Applications

Q & A

author: admin

Leave a Reply Cancel reply

Contact us

Scan wechat and follow us

Business background and problems used by Nginx

Applications

Q & A

给这篇文章的作者打赏

author: admin

Related recommendations

Leave a Reply Cancel reply

Contact us

Scan wechat and follow us