In-depth understanding of floating-point numbers: implementation principles, range and precision, and the problem of large numbers eating small numbers
After reading this article, you will have a solid understanding of how floating-point numbers are implemented: normalized and denormalized representation, +/-0, INF, NaN, and the range and precision of floating-point numbers. You will also understand why large numbers "eat" small numbers and how to work around it.
A few days ago, a colleague and I discussed floating-point precision and the problem of large numbers eating small numbers. I took the opportunity to write this article on how floating-point numbers are implemented and why large numbers eat small numbers, in the hope that it helps others who run into similar questions.
Floating point number implementation principle
Here we take IEEE 754 single-precision floating-point numbers as an example; double-precision floating-point numbers work the same way.
Single-precision floating-point numbers are stored in the computer as 32 bits. These 32 bits are divided into three parts: the sign bit (Sign), the exponent bits (Exponent), and the mantissa bits (Mantissa/Fraction):
- Sign bit: indicates the sign of the value, i.e., positive or negative. Occupies 1 bit.
- Exponent bits: represent the exponent part of the value. Occupies 8 bits.
- Mantissa bits: represent the significand (fraction) part of the value. Occupies 23 bits.
V = (-1)^S * M * 2^E, where S is the sign bit, M is the mantissa, and E is the exponent.
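To make the layout concrete, here is a minimal sketch (not from the original article) that copies the bits of a float into a 32-bit integer and extracts the three fields. It assumes the platform stores float as a 32-bit IEEE 754 value; the example value -6.0 and the variable names are only for illustration.
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = -6.0f;                 /* -6.0 = (-1)^1 * 1.5 * 2^2 */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the 32 bits of the float */

    unsigned sign     = bits >> 31;           /* 1 sign bit      */
    unsigned exponent = (bits >> 23) & 0xFFu; /* 8 exponent bits */
    unsigned mantissa = bits & 0x7FFFFFu;     /* 23 mantissa bits */

    /* For normalized numbers the stored exponent is E + 127 (the IEEE 754 bias). */
    printf("sign=%u exponent=%u (E=%d) mantissa=0x%06X\n",
           sign, exponent, (int)exponent - 127, mantissa);
    return 0;
}
For -6.0 this prints sign=1, a stored exponent of 129 (E = 2), and a mantissa field of 0x400000, i.e. the fraction 0.5, so the significand M is 1.5.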
Normalized representation
Normalized (Normal) representation means that the value must first be normalized, that is, written in the form 1.xx * 2^exp. For example:
0.09375 in binary is 0.00011, and its normalized representation is 1.1 * 2^(-4).
Here you can also see why normalized representation is needed: the mantissa is fixed at 23 bits, and if the mantissa contained many leading zeros, they would waste significant bits. Normalized representation removes these leading zeros, effectively gaining more significant bits and thus improving the precision with which floating-point numbers are represented.
In addition, a property of the normalized representation is that the first digit of the significand is always 1, so this 1 can be omitted and does not need to be stored.
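As an illustration of this implicit leading 1 (a sketch added here, not part of the original article, and again assuming 32-bit IEEE 754 floats), the following code takes the 0.09375 example apart and rebuilds the value by adding the hidden 1 back onto the stored fraction:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(void) {
    float f = 0.09375f;              /* 1.1 (binary) * 2^(-4), as in the example above */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    unsigned sign     = bits >> 31;
    int      E        = (int)((bits >> 23) & 0xFFu) - 127;  /* remove the 127 bias */
    unsigned fraction = bits & 0x7FFFFFu;

    /* The leading 1 of the significand is not stored, so add it back:
       M = 1 + fraction / 2^23, then V = (-1)^S * M * 2^E. */
    double M = 1.0 + fraction / 8388608.0;   /* 8388608 = 2^23 */
    double V = (sign ? -1.0 : 1.0) * ldexp(M, E);

    printf("stored exponent = %d, fraction = 0x%06X\n", E + 127, fraction);
    printf("reconstructed   = %.10f\n", V);
    return 0;
}
For 0.09375 this prints a stored exponent of 123 (that is, -4 + 127) and a fraction field of 0x400000, and the reconstructed value is 0.0937500000 again.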
return 0;
}
The output result is: 30000000.0000000000
Scheme 2: Segmented summation
The essence of segmented summation is to avoid adding a much larger number and a much smaller number directly: each segment is accumulated in its own partial sum, so every addition involves numbers of comparable magnitude, and the partial sums are combined only at the end.
#include <stdio.h>

int main() {
    /* Accumulate the 30,000,000 ones in three segments of 10,000,000 each.
       Each partial sum stays well below 2^24, the range in which a float
       still represents every integer exactly, so no addend gets eaten. */
    float sum1 = 0.0f, sum2 = 0.0f, sum3 = 0.0f;
    float x = 1.0f;
    for (int i = 0; i < 10000000; i++) {
        sum1 += x;
    }
    for (int i = 0; i < 10000000; i++) {
        sum2 += x;
    }
    for (int i = 0; i < 10000000; i++) {
        sum3 += x;
    }
    /* Only numbers of comparable magnitude are added together here. */
    float sum = sum1 + sum2 + sum3;
    printf("%1.10f\n", sum);
    return 0;
}
The output result is: 30000000.0000000000
Scheme 3: Kahan summation
The idea of the Kahan summation algorithm is to capture the low-order digits that the smaller number loses when it is added to the larger number, and to compensate for that loss in the next addition. Kahan summation is described in more detail on Wikipedia [2].
#include <stdio.h>

int main() {
    float sum = 0.0f;   /* float accumulator: naive float summation would stall at 2^24 = 16777216 */
    float x = 1.0f;     /* value added on every iteration */
    float eps = 0.0f;   /* compensation: the low-order part lost so far */
    float t = 0.0f;
    for (int i = 0; i < 30000000; i++) {
        float y = x - eps;    /* correct the addend by the previously lost part */
        t = sum + y;          /* big + small: the low bits of y may be lost here */
        eps = (t - sum) - y;  /* recover exactly what was lost in this addition */
        sum = t;
    }
    printf("%1.10f\n", sum);
    return 0;
}
The output result is: 30000000.0000000000
At this point, the implementation principles of floating-point numbers and the problem of large numbers eating small numbers should both be clear.
References
- [1] Computer Systems: A Programmer's Perspective (In-depth Understanding of Computer Systems)
- [2] https://en.wikipedia.org/wiki/Kahan_summation_algorithm