1. Preface
In a high-traffic web system, caching is almost indispensable, but designing a proper and efficient caching solution is not easy. This article discusses what to pay attention to when designing caching for an application system: choosing a cache type, the characteristics and performance figures of common cache systems, cache object structure design and invalidation strategy, cache object compression, and so on. The aim is to help readers who need it, especially beginners, gain a quick and systematic understanding of the topic.
2. The bottleneck of the database
2.1 Data volume
The amount of data a relational database can handle well is relatively small. Taking the commonly used MySQL as an example, the number of rows in a single table should generally be kept under 20 million, and lower still if the business logic is complex. Even a large commercial database such as Oracle struggles to store enough data for a large Internet system with tens or even hundreds of millions of users.
2.2 TPS
In practice, the TPS limit of a relational database tends to be exposed earlier than its other bottlenecks. Large web systems in particular face heavy concurrent access every day, so they place very high demands on database read and write performance, and a traditional relational database is stretched thin. Taking the commonly used MySQL as an example, TPS under normal circumstances is only about 1,500 (extreme scenarios are a different matter). The figure below shows test data published officially by MySQL:
[Figure: official MySQL test data]
For a large website with tens of millions of page views (PV) per day, each PV may trigger several database reads and writes. The total number of read and write requests per day may then far exceed what a relational database can process, to say nothing of traffic peaks. So we need an efficient caching layer to absorb most of the data requests!
2.3 Response time
Under normal circumstances the response time of a relational database is quite good: generally within 10 ms or even less, especially when it is configured properly. But as mentioned above, our requirements are unusual: with hundreds of millions of rows and 10,000 TPS, the response time must still stay within 10 ms, which is almost impossible for any relational database.
So how do we solve this problem? The simplest and most effective way is, of course, caching!
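To make the idea concrete, here is a minimal sketch of reading through a cache so that repeated requests for the same key do not reach the database. It is only an illustration under stated assumptions: the class name CacheAsideSketch, the in-process map standing in for a real cache system, and the function standing in for the database are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CacheAsideSketch {
    // Hypothetical in-process cache; a real deployment might use a dedicated cache system.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Hypothetical stand-in for a slow database lookup.
    private final Function<String, String> database;
    // Counts how many lookups actually reached the "database".
    private final AtomicInteger databaseReads = new AtomicInteger();

    public CacheAsideSketch(Function<String, String> database) {
        this.database = database;
    }

    // Return the cached value, loading it from the database only on a miss.
    public String get(String key) {
        return cache.computeIfAbsent(key, k -> {
            databaseReads.incrementAndGet();
            return database.apply(k);
        });
    }

    public int databaseReads() {
        return databaseReads.get();
    }

    public static void main(String[] args) {
        CacheAsideSketch users = new CacheAsideSketch(id -> "user-" + id);
        for (int i = 0; i < 1000; i++) {
            users.get("42"); // 1000 requests for the same key
        }
        System.out.println("database reads: " + users.databaseReads()); // prints "database reads: 1"
    }
}
```

The sketch deliberately ignores invalidation and expiry, which a real cache cannot do without.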
3. Cache system selection
3.1 Types of cache
3.1.1 Local cache
Local caching may be the most commonly used caching method, whether it is …
As can be seen, serializing an object is recursive and quite cumbersome work, and it requires recording a large amount of descriptive information. Java native serialization not only does all of the above, it does it thoroughly; it even adds, of its own accord, some information that the JVM needs at execution time.
So it takes little thought to see the consequence: with Java native serialization doing so much for you, how could it not be slow? And with it being so thorough (pedantic, even), how could the result not be large?
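The size overhead is easy to observe. The sketch below serializes a small object with Java native serialization and compares the result with writing only its two int fields. The Point class and the NativeSerializationSize wrapper are hypothetical names used for illustration, and the exact native size depends on the class and field names, so no figure is claimed for it here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class NativeSerializationSize {
    // Hypothetical value object holding 8 bytes of actual data.
    public static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Java native serialization: writes a stream header and class description plus the field values.
    public static byte[] nativeBytes(Point p) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Baseline: only the two field values, 4 bytes each, with no metadata.
    public static byte[] rawBytes(Point p) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println("native: " + nativeBytes(p).length + " bytes");
        System.out.println("raw:    " + rawBytes(p).length + " bytes");
    }
}
```

The raw form is always 8 bytes; the native form is several times larger because it carries the class and field descriptions along with the values.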
The tools below are, for the most part, improvements on these weaknesses of Java native serialization.
Hessian
Hessian's serialization implementation is very similar to Java's native serialization, except that it drops some metadata that serialization and deserialization themselves do not need. Hessian can therefore support objects of any type, just as Java's native serialization does. In terms of storage, however, Hessian makes no corresponding optimization, so the objects it produces are not much smaller than those produced by Java's native serialization.
For example, Hessian still stores numeric types at a fixed length, yet the values used most often in practice are relatively small, so most of that storage space is wasted.
To mark the end of the attribute section, Hessian uses a length field, which also increases the size of the result to some extent.
Because Hessian has little advantage over Java native serialization, its serialization mechanism is rarely used on its own in systems that do not already use Hessian's RPC framework.
Google Protobuf
The most distinctive feature of GPB is that it defines its own set of data types and allows only those types to be used. When using GPB, we therefore have to write a separate description file, or schema file, that maps the basic data types in Java objects to the types defined by GPB.
It is precisely this control over types, however, that lets GPB optimize how they are stored and parsed, and so avoid many of the weaknesses of Java's native serialization.
For object attributes, GPB does not store the attribute name directly; it stores only the attribute's sequence id, according to the mapping in the schema file. Several commonly used data types are compressed to varying degrees, and sections are separated by specific tags, all of which can greatly reduce the storage space required.
For numeric types, common compression methods include variable-length bytes, group bytes, and difference (delta) storage; the compression strategy is generally chosen to suit how the attribute is used.
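As an illustration of the variable-length byte idea, the sketch below encodes a 32-bit integer in base-128 groups: each output byte carries 7 bits of the value, and the high bit signals that another byte follows. Protocol Buffers documents this scheme for its varint types, but the VarintSketch class here is a hypothetical standalone example, not GPB's own code.

```java
import java.io.ByteArrayOutputStream;

public class VarintSketch {
    // Encode the value (treated as unsigned 32-bit) using 7 data bits per byte, low group first.
    public static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // set the high bit: more bytes follow
            value >>>= 7;
        }
        out.write(value); // last group, high bit clear
        return out.toByteArray();
    }

    // Decode a single varint from the start of the array.
    public static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last byte
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encode(1).length);         // prints 1
        System.out.println(encode(300).length);       // prints 2
        System.out.println(encode(1_000_000).length); // prints 3
        System.out.println(decode(encode(300)));      // prints 300
    }
}
```

A fixed-length encoding would spend 4 bytes on each of these values, so small, frequently used numbers are where the saving comes from. The trade-off is that very large values take 5 bytes instead of 4.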
Another advantage of GPB is that it is cross-language, supporting Java, C, PHP, Python and other popular languages. Similar tools include Facebook's Thrift, which also requires description files and additionally provides an RPC framework and richer language support.