1. Background
Because of various external constraints, we often cannot choose the ideal hardware environment, and a single physical machine frequently has more capacity than one instance needs. To use resources sensibly, a server often cannot afford the "luxury" of hosting a single instance; instead, multiple TiDB or TiKV instances are deployed on one machine. The goal is then to build a TiDB cluster that is as highly available and high-performing as the existing environment allows. This article shares how TiDB clusters were deployed in hybrid mode in a real production environment, for your reference.
2. Hardware configuration
10 physical machines, each with 56 CPU cores, 384 GB of memory, and four 2 TB NVMe disks. Monitoring, HA, and other auxiliary nodes can run on virtual machines, so they are not included in the purchase budget.
The configuration meets the standard, but for various reasons the hardware originally planned for one cluster has to host two clusters in a hybrid deployment.
3. Cluster topology planning
Cluster 1
Instance | IP |
---|---|
TiDB & PD | 10.0.0.1 |
TiDB & PD | 10.0.0.2 |
PD | 10.0.0.3 |
(none) | 10.0.0.4 |
TiKV *2 | 10.0.0.5 |
TiKV *2 | 10.0.0.6 |
TiKV *2 | 10.0.0.7 |
TiKV *2 | 10.0.0.8 |
TiKV *2 | 10.0.0.9 |
TiKV *2 | 10.0.0.10 |
Cluster 2
Instance | IP |
---|---|
(none) | 10.0.0.1 |
PD | 10.0.0.2 |
TiDB & PD | 10.0.0.3 |
TiDB & PD | 10.0.0.4 |
TiKV *2 | 10.0.0.5 |
TiKV *2 | 10.0.0.6 |
TiKV *2 | 10.0.0.7 |
TiKV *2 | 10.0.0.8 |
TiKV *2 | 10.0.0.9 |
TiKV *2 | 10.0.0.10 |
Viewed as two separate clusters, each one would have its own architecture:
But because this is actually a hybrid deployment, the combined architecture looks like this:
![Unnamed file (10).jpg](https://tidb-blog.oss-cn-beijing.aliyuncs.com/media/unnamed file (10)-1647272473002.jpg)
4. Cluster label planning
The TiKV labels planned for the Cluster 1 topology are:
The TiKV labels planned for the Cluster 2 topology are:
Set the location-labels configuration of PD:
location_labels = ["zone","rack","host"]
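As an illustration of how a zone/rack/host label plan can be derived from the cabinet layout described in the summary, here is a minimal Python sketch. The host names host1 to host6 and the cabinet grouping come from this article; the zone name `z1`, the rack names `rack1` to `rack3`, and the helper names are assumptions made only for this example, not the values used in the real deployment.

```python
# Sketch only: derive zone/rack/host labels for every TiKV instance of one cluster.
# The zone name "z1" and the rack names are hypothetical placeholders.
RACKS = {
    "rack1": ["10.0.0.5", "10.0.0.6"],
    "rack2": ["10.0.0.7", "10.0.0.8"],
    "rack3": ["10.0.0.9", "10.0.0.10"],
}


def build_labels(zone="z1", instances_per_host=2):
    """Return one entry per TiKV instance, with labels ordered zone, rack, host."""
    plan = []
    host_no = 0
    for rack, ips in RACKS.items():
        for ip in ips:
            host_no += 1
            for slot in range(1, instances_per_host + 1):
                plan.append({
                    "ip": ip,
                    "slot": slot,
                    "labels": {"zone": zone, "rack": rack, "host": f"host{host_no}"},
                })
    return plan


def format_labels(labels):
    """Render labels in comma-separated key=value form."""
    return ",".join(f"{key}={value}" for key, value in labels.items())


label_plan = build_labels()
for entry in label_plan:
    print(entry["ip"], f"instance {entry['slot']}", format_labels(entry["labels"]))
```

Both instances of a cluster on the same server get the same host label, which is what lets PD recognize that they share a machine.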
5. Summary
The goal of this deployment was to achieve as much availability as possible without changing the number of servers. For reasons such as cost, I did not choose a remote disaster recovery solution or a same-city multi-data-center solution, and went with this hybrid deployment instead.
Availability of HA itself:
HAProxy + Keepalived keeps the HA layer itself highly available.
Availability of PD server and TiDB server:
Because PD and TiDB are deployed together, they are discussed together here. Servers 10.0.0.1 to 10.0.0.4 host the TiDB and PD instances of both clusters. As the architecture diagram shows, the failure of any one server affects at most one TiDB node and one PD node in a given cluster. That cluster still has one TiDB node available and two of its three PD nodes, so both TiDB and PD remain highly available.
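The reasoning above can be checked mechanically. The following Python sketch takes the TiDB and PD placement from the topology tables and tests every single-server failure among 10.0.0.1 to 10.0.0.4. The function names are invented for this example, and the availability rule (at least one TiDB node and a majority of PD nodes) is a simplified model rather than a full description of cluster behavior.

```python
# Sketch only: TiDB/PD placement copied from the two topology tables.
CLUSTERS = {
    "cluster1": {
        "tidb": ["10.0.0.1", "10.0.0.2"],
        "pd": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    },
    "cluster2": {
        "tidb": ["10.0.0.3", "10.0.0.4"],
        "pd": ["10.0.0.2", "10.0.0.3", "10.0.0.4"],
    },
}


def survives(cluster, failed_ip):
    """True if the cluster keeps >= 1 TiDB node and a PD majority after one server fails."""
    tidb_left = [ip for ip in cluster["tidb"] if ip != failed_ip]
    pd_left = [ip for ip in cluster["pd"] if ip != failed_ip]
    pd_quorum = len(cluster["pd"]) // 2 + 1
    return len(tidb_left) >= 1 and len(pd_left) >= pd_quorum


def check_all():
    """Evaluate every (cluster, failed server) pair for servers 10.0.0.1 to 10.0.0.4."""
    servers = [f"10.0.0.{i}" for i in range(1, 5)]
    return {
        (name, ip): survives(cluster, ip)
        for name, cluster in CLUSTERS.items()
        for ip in servers
    }


results = check_all()
for (name, ip), ok in results.items():
    print(f"{name}: {ip} down -> {'available' if ok else 'unavailable'}")
```

In this model, losing two of the four servers at once can break a PD majority (for example 10.0.0.2 and 10.0.0.3), which is why the plan only claims tolerance of a single server failure.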
Availability of TiKV server:
If TiKV instances that are physically close to each other hold only one replica of any piece of data, PD can schedule replicas according to the physical location of TiKV and make the TiKV cluster as available as possible. A TiKV cluster whose Raft groups use 3 replicas can tolerate the failure of one node without losing data and while continuing to serve requests. If two TiKV nodes fail at the same time, some data may lose its majority, so availability is improved by planning the layout so that nodes likely to fail together fall in the same isolation zone. This deployment also uses 3 replicas. Servers 10.0.0.5 (host1) and 10.0.0.6 (host2) are in one cabinet, 10.0.0.7 (host3) and 10.0.0.8 (host4) are in a second cabinet, and 10.0.0.9 (host5) and 10.0.0.10 (host6) are in a third. With this plan, although each server hosts two TiKV instances from each of the two clusters, PD knows which TiKV nodes are on the same server and which servers are in the same cabinet. When scheduling replicas, PD spreads the replicas of the same data as widely as the label hierarchy allows, which at minimum ensures that TiKV in both clusters stays available when any one server goes down. You can also set the isolation-level parameter to enforce a stricter topology isolation requirement for the TiKV cluster. If an entire cabinet fails, for example 10.0.0.5 and 10.0.0.6 go down at the same time, those two servers together hold only one replica of any piece of data in each cluster, so the TiDB clusters remain available.
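To make the cabinet argument concrete, the sketch below compares two hypothetical placement policies for the three replicas of one piece of data in one cluster: a rack-aware policy that puts one replica in each cabinet, and a label-blind policy that may pick any three TiKV instances. It then checks whether a majority of replicas survives every single-server and single-cabinet failure. The rack names and function names are assumptions for illustration, and the rack-aware case models the ideal outcome of label-based scheduling, not PD's actual algorithm.

```python
from itertools import combinations, product

# Cabinet layout from the article; rack names are hypothetical placeholders.
RACKS = {
    "rack1": ["10.0.0.5", "10.0.0.6"],
    "rack2": ["10.0.0.7", "10.0.0.8"],
    "rack3": ["10.0.0.9", "10.0.0.10"],
}
REPLICAS = 3
QUORUM = REPLICAS // 2 + 1  # 2 of 3 replicas must survive


def rack_aware_placements():
    """One replica per cabinet: choose one server from each rack."""
    return list(product(*RACKS.values()))


def label_blind_placements():
    """Any 3 distinct TiKV instances (2 per server per cluster), ignoring location."""
    instances = [(ip, slot) for ips in RACKS.values() for ip in ips for slot in (1, 2)]
    return [tuple(ip for ip, _ in combo) for combo in combinations(instances, REPLICAS)]


def failure_scenarios():
    """Every single-server failure plus every whole-cabinet failure."""
    single_servers = [{ip} for ips in RACKS.values() for ip in ips]
    whole_racks = [set(ips) for ips in RACKS.values()]
    return single_servers + whole_racks


def available(placement, failed_hosts):
    """True if a majority of the replicas sits on servers that are still up."""
    alive = sum(1 for ip in placement if ip not in failed_hosts)
    return alive >= QUORUM


def always_available(placements):
    return all(available(p, f) for p in placements for f in failure_scenarios())


print("rack-aware survives all scenarios: ", always_available(rack_aware_placements()))
print("label-blind survives all scenarios:", always_available(label_blind_placements()))
```

The label-blind policy fails because it can place two replicas on the two instances of a single server, so one server failure already removes the majority.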
This is my first post, and I hope it is helpful to you. The actual deployment was done a long time ago, so please forgive any imprecision or omissions.
Reference article: https://tidb.io/blog/8f2a6d62