Virtual machine technology has matured considerably over time. For compute-intensive workloads, virtual machines such as Java and .NET now perform close to C++ in some cases, and in a few cases even better. This article analyzes several performance test cases in detail and explores the reasons behind the results.
Let’s look at two simple test cases. As shown in the figure below, both loop 5000 times over a contiguous block of memory with len = 1000000 and measure the execution time. The left side is test1 and the right side is test2.
Equivalent programs were tested under .NET Core 3.0 Preview 6.
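The figure with the original code is not reproduced here, so the following Java sketch only illustrates the kind of kernels the description suggests. The loop bodies are assumptions: `test1` updates each element on its own, while `test2` derives each element from its left neighbour. The class name `LoopBench` and the helper `time` are hypothetical.

```java
import java.util.function.Consumer;

final class LoopBench {
    static final int LEN = 1_000_000; // array length stated in the article
    static final int ROUNDS = 5_000;  // number of repetitions stated in the article

    // Assumed shape of test1: each element is updated independently of the others.
    static void test1(int[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] = a[i] + 1;
        }
    }

    // Assumed shape of test2: each element is computed from the previous one.
    static void test2(int[] a) {
        for (int i = 1; i < a.length; i++) {
            a[i] = a[i - 1] + 1;
        }
    }

    // Runs the kernel `rounds` times over `a` and returns the elapsed seconds.
    static double time(Consumer<int[]> kernel, int[] a, int rounds) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            kernel.accept(a);
        }
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        int[] a = new int[LEN];
        System.out.printf("test1: %.3f s%n", time(LoopBench::test1, a, ROUNDS));
        System.out.printf("test2: %.3f s%n", time(LoopBench::test2, a, ROUNDS));
    }
}
```

Timings from a harness this simple include JIT warm-up, so they are only a rough indication and should not be read as the article's measurements.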
The test results are compared as follows:
We can see that for test1, the C++ version is much faster. For test2, the C# version performs on par with the C++ version, or is even slightly faster.
Why does this happen? Let’s analyze it in detail:
The assignment in the test1 loop is position-independent, so the compiler can optimize it with parallel instructions such as SIMD. The assignment in the test2 loop is position-dependent, which makes it hard for the compiler to apply SIMD or similar parallel instructions. From the results above, we can guess that the VC compiler vectorized test1, while .NET Core 3.0 Preview 6 did not.
Let’s verify this guess. .NET Core 3.0 provides support for SIMD instructions, so let’s manually vectorize test1 and measure the performance:
The result is 0.633s, which is close to the 0.441s of the C++ version. Compared with 2.289s before optimization, that is roughly 3.6 times faster (2.289 / 0.633 ≈ 3.6).
I tested the same program using Java 8 and the result was surprising:
test1 takes 0.654s, which is similar to .NET Core after manual vectorization, so the JVM evidently vectorized the loop on its own. test2 takes 1.755s, which is much faster than both the C++ version and the .NET Core version!
Clearly, the JVM handles the test2 case specially. To understand why, you need some knowledge of how the Java virtual machine works. The HotSpot virtual machine has two built-in JIT compilers: the Client Compiler and the Server Compiler, known as C1 and C2. C1 compiles bytecode into native code, performs simple and reliable optimizations, and adds performance-monitoring logic where necessary. C2 enables optimizations that take longer to compile, including some aggressive ones.
According to the literature, C2 optimization is triggered by default when three conditions hold: the number of method invocations plus the number of loop back-edges exceeds 10,000, the loop counter is a simple type such as int, and the step increment is a constant. test2 satisfies all three conditions!
Let’s design another experiment: change the step increment to a variable and see the test results:
After changing the step increment to a variable, the test takes 6.163s, which is similar to the C++ and .NET Core results.
For this test case, we can guess that C2 performed loop unrolling. Next, we manually unroll the loop under .NET Core to test performance and verify this conjecture:
The test result is 1.983s, which is close to the 1.755s of Java 8. The conjecture is confirmed.
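To check the threshold mentioned above on a particular JVM, one option is to read the `CompileThreshold` VM option at runtime. The following is a minimal sketch that assumes a HotSpot-based JVM exposing `com.sun.management.HotSpotDiagnosticMXBean`; the class name `JitThreshold` is hypothetical. On JVMs with tiered compilation enabled, additional tier-specific thresholds may also apply, so the value printed here is not necessarily the only trigger.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class JitThreshold {
    // Returns the current value of -XX:CompileThreshold as reported by HotSpot.
    // Assumes a HotSpot JVM; other JVMs may not provide this bean or option.
    static String compileThreshold() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("CompileThreshold").getValue();
    }

    public static void main(String[] args) {
        System.out.println("CompileThreshold = " + compileThreshold());
    }
}
```

Running the benchmark with `-XX:+PrintCompilation` is another way to observe when a method gets compiled.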
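The code for this experiment is not shown in the text. As a rough sketch of what "changing the step increment to a variable" could mean, the hypothetical method below takes the step as a parameter instead of hard-coding `i++`. The loop body (`a[i] = a[i - 1] + 1`) and the name `VariableStep` are assumptions, not the article's actual code.

```java
final class VariableStep {
    // Same assumed kernel as test2, but the loop increment is supplied at
    // runtime rather than written as a constant in the loop header.
    static void test2WithStep(int[] a, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        for (int i = 1; i < a.length; i += step) {
            a[i] = a[i - 1] + 1;
        }
    }

    public static void main(String[] args) {
        // Take the step from the command line so it is not a compile-time constant.
        int step = args.length > 0 ? Integer.parseInt(args[0]) : 1;
        int[] a = new int[1_000_000];
        long start = System.nanoTime();
        for (int r = 0; r < 5_000; r++) {
            test2WithStep(a, step);
        }
        System.out.printf("variable step: %.3f s%n", (System.nanoTime() - start) / 1e9);
    }
}
```

With `step` equal to 1 the method computes the same values as the constant-step loop, so any timing difference would come from how the JIT treats the loop rather than from the amount of work.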
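The manual unrolling in that experiment was done on .NET Core and its code is not shown. Purely to illustrate the technique, here is a Java sketch that unrolls an assumed test2-style loop (`a[i] = a[i - 1] + 1`) by a factor of four. The unroll factor, the loop body, and the names are all assumptions, and what C2 actually generates may look quite different.

```java
final class UnrolledLoop {
    // Straightforward version of the assumed kernel, kept for comparison.
    static void reference(int[] a) {
        for (int i = 1; i < a.length; i++) {
            a[i] = a[i - 1] + 1;
        }
    }

    // Same result, unrolled by four. The running value is kept in a local
    // variable so each store no longer has to wait on the previous load.
    static void unrolled(int[] a) {
        int n = a.length;
        if (n == 0) {
            return;
        }
        int prev = a[0];
        int i = 1;
        // Main body: four elements per iteration.
        for (; i + 3 < n; i += 4) {
            a[i] = prev + 1;
            a[i + 1] = prev + 2;
            a[i + 2] = prev + 3;
            a[i + 3] = prev + 4;
            prev += 4;
        }
        // Tail: handle the remaining zero to three elements.
        for (; i < n; i++) {
            prev += 1;
            a[i] = prev;
        }
    }
}
```

The design choice worth noting is the `prev` local: unrolling alone reduces loop overhead, but carrying the value in a register is what shortens the dependency chain between iterations.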
---
Summary: As virtual machine technologies such as the JVM and .NET mature, the choice of language has less and less impact on high-performance computing. An understanding of computer architecture, compiler principles, and the virtual machine's compilation mechanisms matters increasingly for performance. The JVM's automatic optimization is very powerful, and .NET Core still lags well behind in this respect. However, .NET Core can close the gap through manual optimization.