Dear programmers, you have surely all used printf. But did you know that the seemingly trivial printf actually hides a problem that stumped many computer scientists? It was not until 1971 that Jon White (yes, a real-life "Mr. White", like our meth teacher from Breaking Bad) worked out a solution, and not until 1990 that White, together with Guy Steele, formally published the final version of the algorithm: Dragon4.
For decades afterwards, converting a floating-point number to a string relied on Dragon4 in essentially every language; other research merely patched around it, until the Grisu algorithm appeared. Grisu, published by Florian Loitsch, can print most 64-bit floating-point numbers directly, but about 0.6% of them still have to fall back to Dragon4.
So if you ever want to build this wheel yourself, you must understand Dragon4!!! ———— Introduction
Original: http://yonghaowu.github.io/Blog/print_floating_point_number/
I started studying how floating-point numbers are printed as strings in order to fix a bug I found in Wine: sprintf behaves differently under VS and GCC. GCC's printf/sprintf rounds half to even (banker's rounding), while VS rounds half away from zero.
The core of the task is to extract each digit after the decimal point. I came up with two methods. For a double a:
1. Take the fraction with a - (int)a, multiply it by the desired precision, and apply round() to get an integer; from there the rest is easy. You can refer to my implementation; with modest precision this works fine.
The problem appears once the scale factor gets large, because floating-point numbers cannot represent most decimals exactly. For example, 2.34 printed to 35 places is not 2.3400000...000: beyond a certain precision, the digits after the decimal point are no longer zeros but leftover noise from the binary representation.
2. Take the fraction with a - (int)a and peel digits off with fmod(a, 0.1). The problem: as 0.789 is repeatedly reduced by 0.1, the error accumulates:
```cpp
double temp = float_val;
while (temp != 0) {
    cout << "int " << (int)(temp * 10) << endl;     // current digit
    temp = fmod(temp, 0.1);
    cout << "float_val " << temp << endl;
    temp *= 10.0;
    cout << "float_val*10= " << temp << endl;
}
```
Output:
```
cin val
0.789
float_val 0.789
int 7
float_val 0.089
float_val*10= 0.89
int 8
float_val 0.09
float_val*10= 0.9
int 8
float_val 0.1
float_val*10= 1
int 9
float_val 0.1
float_val*10= 1
```
The problem is right at the end: after 0.09 becomes 0.9, taking the remainder modulo 0.1 again yields 0.1 instead of 0. I tried several workarounds and could not fix it.
In the end I had to search online, and discovered that this has been a hard problem for decades. Had I accidentally solved it perfectly, I would have been like Gauss XD
Now, let's go through the background for Dragon4.
The following is translated from http://www.ryanjuckett.com/programming/printing-floating-point-numbers/
Before we study the real algorithm, let's look together at a simple (and inaccurate) way to convert a positive floating-point number. If floating-point numbers could store unlimited precision, this algorithm would work perfectly. Unfortunately, our computers cannot represent decimals exactly; this is just an appetizer before Dragon4.
A number can be written as a series of digits multiplied by powers of ten (if you're a programmer and didn't know this, go stand outside the classroom). For example, 123.456 equals 1*10^2 + 2*10^1 + 3*10^0 + 4*10^-1 + 5*10^-2 + 6*10^-3.
```cpp
const int c_maxDigits = 256;

// input binary floating-point number to convert from
double value = 122.5;            // the number to convert

// output decimal representation to convert to
char decimalDigits[c_maxDigits]; // buffer to put the decimal representation in
int numDigits = 0;               // this will be set to the number of digits in the buffer
int firstDigitExponent = 0;      // this will be set to the base 10 exponent of the first
                                 // digit in the buffer

// Compute the first digit's exponent in the equation digit*10^exponent
// (e.g. 122.5 would compute a 2 because its first digit is in the hundreds place)
firstDigitExponent = (int)floor(log10(value));

// Scale the input value such that the first digit is in the ones place
// (e.g. 122.5 would become 1.225).
value = value / pow(10, firstDigitExponent);

// while there is a non-zero value to print and we have room in the buffer
while (value > 0.0 && numDigits < c_maxDigits)
{
    // Output the current digit.
    double digit = floor(value);
    decimalDigits[numDigits] = '0' + (char)digit; // convert to an ASCII character
    ++numDigits;

    // Compute the remainder by subtracting the current digit
    // (e.g. 1.225 would become 0.225)
    value -= digit;

    // Scale the next digit into the ones place.
    // (e.g. 0.225 would become 2.25)
    value *= 10.0;
}
```
So why is it inaccurate? The problem comes mainly from the floating-point arithmetic we perform inside the loop. If we keep running this program on more inputs, we find that many numbers are not printed correctly.
This algorithm is also missing some necessary features. To understand what it takes to add them, we first need to know how an IEEE floating-point number is stored: a sign, an exponent, and a mantissa.
| type | sign bits | exponent bits | mantissa bits |
| --- | --- | --- | --- |
| 32-bit float | 1 | 8 | 23 |
| 64-bit double | 1 | 11 | 52 |
Once again, our current algorithm is just a toy version; it does not need to support negative numbers yet, so we will ignore the sign bit. Before we do some jaw-dropping mathematical deduction with logarithms (middle-school material, really), we need to understand how a floating-point value is reconstructed from those three fields.
There are four kinds of floating-point values, distinguished by the exponent and mantissa.
| type | exponent | mantissa | usage |
| --- | --- | --- | --- |
| denormalized | 0 | any | Denormalized numbers are values too small to fit in the normalized range. Their actual value is computed as: value = (mantissa / 2^mantissaBits) * 2^(1 - exponentBias) |
| normalized | greater than 0, less than the maximum | any | The majority of floating-point values are normalized. Their actual value is computed as: value = (1 + mantissa / 2^mantissaBits) * 2^(exponent - exponentBias) |
| Infinity | maximum (0xFF for a 32-bit float, 0x7FF for a 64-bit double) | 0 | Inf results from ill-formed computations such as dividing by zero. |
| NaN | maximum (0xFF for a 32-bit float, 0x7FF for a 64-bit double) | greater than 0 | NaN means "not a number"; it usually appears when a computation has no numeric result, such as taking the square root of a negative number. |

Here exponentBias is 127 for 32-bit floats and 1023 for 64-bit doubles.
Our digit-generation algorithm only needs to handle normalized and denormalized numbers; as with the sign bit, handling Inf and NaN is left to the calling function, so we won't deal with them here.
Whether a number is normalized or denormalized, we now know how to evaluate a 32-bit or 64-bit floating-point number.
Using an unsigned integer valueMantissa to represent the mantissa and a signed integer valueExponent to represent the exponent, every input floating-point value equals valueMantissa * 2^valueExponent.
| representation | conversion | valueMantissa | valueExponent |
| --- | --- | --- | --- |
| 32-bit denormalized | (mantissa / 2^23) * 2^(1-127): factor out the power of two to get mantissa * 2^(-149) | mantissa | -149 |
| 32-bit normalized | (1 + mantissa / 2^23) * 2^(exponent-127): factor out the power of two to get (2^23 + mantissa) * 2^(exponent-150) | 2^23 + mantissa | exponent - 150 |
| 64-bit denormalized | (mantissa / 2^52) * 2^(1-1023): factor out the power of two to get mantissa * 2^(-1074) | mantissa | -1074 |
| 64-bit normalized | (1 + mantissa / 2^52) * 2^(exponent-1023): factor out the power of two to get (2^52 + mantissa) * 2^(exponent-1075) | 2^52 + mantissa | exponent - 1075 |
Now we can evaluate every floating-point value as valueMantissa * 2^valueExponent, so let's rebuild our algorithm using big-number arithmetic.
Representing an integer like 12 with an integer type is trivial, but representing a decimal like 12.3 is not. To handle the fractional part, we use two integers: a numerator and a denominator.
12.3 can then be represented by the numerator 123 and the denominator 10, because 12.3 = 123/10.
With this approach we can convert a binary floating-point number into digits exactly, but we need very large integers to guarantee the precision; 32-bit and 64-bit integers would truncate. In fact, to handle some 64-bit double values we need integers wider than 1000 bits.
To handle numbers that big, we create a tBigInt struct and write some arithmetic functions for it. Assuming such a struct exists, we can rewrite the algorithm as below.
One more reminder: this version is still written for clarity, not speed. (This is where it gets fun.) Why? Because the great mathematical performance is about to begin!!
```cpp
const int c_maxDigits = 256;

// input number to convert from: value = valueMantissa * 2^valueExponent
double value = 122.5; // in memory: 0x405EA00000000000
                      // (sign=0x0, exponent=0x405, mantissa=0xEA00000000000)
uint64_t valueMantissa = 8620171161763840ull;
int32_t  valueExponent = -46;

// output decimal representation to convert to
char decimalDigits[c_maxDigits]; // buffer to put the decimal representation in
int numDigits = 0;               // this will be set to the number of digits in the buffer
int firstDigitExponent = 0;      // this will be set to the base 10 exponent of the first
                                 // digit in the buffer

// Compute the first digit's exponent in the equation digit*10^exponent
// (e.g. 122.5 would compute a 2 because its first digit is in the hundreds place)
firstDigitExponent = (int)floor(log10(value));

// We represent value as (valueNumerator / valueDenominator)
tBigInt valueNumerator;
tBigInt valueDenominator;
if (valueExponent > 0)
{
    // value = (mantissa * 2^exponent) / (1)
    valueNumerator = BigInt_ShiftLeft(valueMantissa, valueExponent);
    valueDenominator = 1;
}
else
{
    // value = (mantissa) / (2^(-exponent))
    valueNumerator = valueMantissa;
    valueDenominator = BigInt_ShiftLeft(1, -valueExponent);
}

// Scale the input value such that the first digit is in the ones place
// (e.g. 122.5 would become 1.225).
// value = value / pow(10, firstDigitExponent)
if (firstDigitExponent > 0)
{
    // The exponent is positive creating a division so we multiply up the denominator.
    valueDenominator = BigInt_Multiply(valueDenominator, BigInt_Pow(10, firstDigitExponent));
}
else if (firstDigitExponent < 0)
{
    // The exponent is negative creating a multiplication so we multiply up the numerator.
    valueNumerator = BigInt_Multiply(valueNumerator, BigInt_Pow(10, -firstDigitExponent));
}

// while there is a non-zero value to print and we have room in the buffer
while (BigInt_IsNonZero(valueNumerator) && numDigits < c_maxDigits)
{
    // Output the current digit.
    uint32_t digit = BigInt_DivideRoundDown(valueNumerator, valueDenominator);
    decimalDigits[numDigits] = '0' + (char)digit; // convert to an ASCII character
    ++numDigits;

    // Compute the remainder by subtracting the current digit
    // (e.g. 1.225 would become 0.225)
    valueNumerator = BigInt_Subtract(valueNumerator, BigInt_Multiply(digit, valueDenominator));

    // Scale the next digit into the ones place.
    // (e.g. 0.225 would become 2.25)
    valueNumerator = BigInt_Multiply(valueNumerator, 10);
}
```
In the example above, the only remaining floating-point computation is the log10() call used to find the exponent of the first digit (for scientific-notation-style output). This is a potential source of inaccuracy in the conversion.
In their implementation of Dragon4, Steele and White computed this value by repeatedly dividing a high-precision integer by 10. That is simple and obviously correct but, unsurprisingly, very slow. In the follow-up paper, Burger and Dybvig proposed two better ways to estimate it, where the estimate differs from the correct value by at most 1.
The first way still uses log10(), but adds a small offset so that floor() rounds to the correct value.
The second uses some wonderful mathematics to compute it; my implementation is based on this second method, with a small optimization of my own.
(Sorry — CSDN makes formulas hard to typeset, so my translation stops here.)
To compute the exponent of the first digit, we solve:

firstDigitExponent = floor( log10(value) )

Using the change-of-base formula, we can rewrite the base-10 logarithm in terms of base 2:

log10(value) = log2(value) / log2(10)

Applying the change-of-base formula once more gives:

log2(10) = log10(10) / log10(2) = 1 / log10(2)

Substituting this back into the equation removes the division:

log10(value) = log2(value) * log10(2)

So the equation to evaluate becomes:

firstDigitExponent = floor( log2(value) * log10(2) )

Remember that value = valueMantissa * 2^valueExponent? Substituting it in, we get:

log2(value) = log2(valueMantissa) + valueExponent
Because our mantissa is represented as a binary integer, the base-2 logarithm is close to the index of its highest set bit. Explicitly, the highest bit index is:

mantissaHighBitIdx = floor( log2(valueMantissa) )

Given that 2^mantissaHighBitIdx <= valueMantissa < 2^(mantissaHighBitIdx + 1), taking base-2 logarithms gives:

mantissaHighBitIdx <= log2(valueMantissa) < mantissaHighBitIdx + 1

Given that log2(value) = log2(valueMantissa) + valueExponent, adding valueExponent to every term gives:

mantissaHighBitIdx + valueExponent <= log2(value) < mantissaHighBitIdx + valueExponent + 1

Scaling by the positive value log10(2) preserves the inequalities:

(mantissaHighBitIdx + valueExponent) * log10(2) <= log10(value) < (mantissaHighBitIdx + valueExponent + 1) * log10(2)

Because we are using floating-point numbers, we need to account for drift when we multiply by an inexact representation of log10(2), which is why a small error term is subtracted below.

Let's now consider the following estimate:

estimate = floor( (mantissaHighBitIdx + valueExponent) * log10(2) )

The inequality implies that our estimate will either be within an error of one below the true firstDigitExponent or exactly correct, because the two bounds differ by less than one (log10(2) is about 0.301).

We can detect and account for the incorrect estimate case after we divide our value by 10^estimate: if the scaled result is still ten or greater, the estimate was one too low, and we bump it up.

When we get to writing the actual Dragon4 algorithm, we'll find that to simplify the number of operations, we actually compute the exponent one past the first digit, using a ceiling with a small safety offset instead of a floor. The final inequality, as used in the reference implementation, is:

digitExponent = ceil( (mantissaHighBitIdx + valueExponent) * log10(2) - 0.69 )
As it stands, our prototype algorithm will print a number until all precision is exhausted (i.e. when the remainder reaches zero). Because there are discrete deltas between each representable number, we don't need to print that many digits to uniquely identify a number. Thus, we are going to add support for printing the minimal number of digits needed to identify a number from its peers. For example, let's consider the following three adjacent 32-bit floating-point values.
| binary | exact decimal representation |
| --- | --- |
| 0x4123c28e | 10.2349987030029296875 |
| 0x4123c28f | 10.23499965667724609375 |
| 0x4123c290 | 10.2350006103515625 |
There is a margin of 0.00000095367431640625 between each of these numbers. For any value, all numbers within the exclusive range reaching half the margin below it and half the margin above it will convert back to that same binary value, so we only need to print enough digits to land inside that range.
Solving this problem is what lets the Dragon4 algorithm output "pretty" representations of floating point numbers. To do it, we can track the margin as a big integer with the same denominator as the value. Every time we advance to the next digit by multiplying the numerator by 10, we will also multiply the margin by 10. Once the next digit to print can be safely rounded up or down without leaving the bounds of the margin, we are far enough from the neighboring values to print a rounded digit and stop.
In this image, we can see how the algorithm decides to terminate on the shortest decimal representation (the chosen numbers don't correspond to any real-world floating-point format and are just for illustration).
Unfortunately, the IEEE floating-point format adds an extra wrinkle we need to deal with. As the exponent increases, the margin between adjacent numbers also increases. This is why floating-point values get less accurate as they get farther from zero. For normalized values, every time the mantissa rolls over to zero, the exponent increases by one. When this happens, the higher margin is twice the size of the lower margin and we need to account for this special case when testing the margin bounds in code.
To understand this better, I find it helpful to make a small floating-point format where we can examine every value. The tables below show every positive floating-point value for a 6-bit format that contains 1 sign bit, 3 exponent bits and 2 mantissa bits.
Here we can see our format's minimal margin of 0.0625 between each number. No smaller delta can be represented.
| binary | exponent | mantissa | equation | exact value | minimal value |
| --- | --- | --- | --- | --- | --- |
| 0 000 00 | 0 | 0 | (0/4)*2^(-2) | 0 | 0 |
| 0 000 01 | 0 | 1 | (1/4)*2^(-2) | 0.0625 | 0.06 |
| 0 000 10 | 0 | 2 | (2/4)*2^(-2) | 0.125 | 0.1 |
| 0 000 11 | 0 | 3 | (3/4)*2^(-2) | 0.1875 | 0.2 |
Here we can see that our format's lowest normalized exponent keeps the minimal margin of 0.0625 between numbers. Each time the exponent increases, the margin doubles; for our largest values the margin is 2.0.
| binary | exponent | mantissa | equation | exact value | minimal value |
| --- | --- | --- | --- | --- | --- |
| 0 001 00 | 1 | 0 | (1 + 0/4)*2^(-2) | 0.25 | 0.25 |
| 0 001 01 | 1 | 1 | (1 + 1/4)*2^(-2) | 0.3125 | 0.3 |
| 0 001 10 | 1 | 2 | (1 + 2/4)*2^(-2) | 0.375 | 0.4 |
| 0 001 11 | 1 | 3 | (1 + 3/4)*2^(-2) | 0.4375 | 0.44 |
| 0 010 00 | 2 | 0 | (1 + 0/4)*2^(-1) | 0.5 | 0.5 |
| 0 010 01 | 2 | 1 | (1 + 1/4)*2^(-1) | 0.625 | 0.6 |
| 0 010 10 | 2 | 2 | (1 + 2/4)*2^(-1) | 0.75 | 0.8 |
| 0 010 11 | 2 | 3 | (1 + 3/4)*2^(-1) | 0.875 | 0.9 |
| 0 011 00 | 3 | 0 | (1 + 0/4)*2^(0) | 1 | 1 |
| 0 011 01 | 3 | 1 | (1 + 1/4)*2^(0) | 1.25 | 1.2 |
| 0 011 10 | 3 | 2 | (1 + 2/4)*2^(0) | 1.5 | 1.5 |
| 0 011 11 | 3 | 3 | (1 + 3/4)*2^(0) | 1.75 | 1.8 |
| 0 100 00 | 4 | 0 | (1 + 0/4)*2^(1) | 2 | 2 |
| 0 100 01 | 4 | 1 | (1 + 1/4)*2^(1) | 2.5 | 2.5 |
| 0 100 10 | 4 | 2 | (1 + 2/4)*2^(1) | 3 | 3 |
| 0 100 11 | 4 | 3 | (1 + 3/4)*2^(1) | 3.5 | 3.5 |
| 0 101 00 | 5 | 0 | (1 + 0/4)*2^(2) | 4 | 4 |
| 0 101 01 | 5 | 1 | (1 + 1/4)*2^(2) | 5 | 5 |
| 0 101 10 | 5 | 2 | (1 + 2/4)*2^(2) | 6 | 6 |
| 0 101 11 | 5 | 3 | (1 + 3/4)*2^(2) | 7 | 7 |
| 0 110 00 | 6 | 0 | (1 + 0/4)*2^(3) | 8 | 8 |
| 0 110 01 | 6 | 1 | (1 + 1/4)*2^(3) | 10 | 10 |
| 0 110 10 | 6 | 2 | (1 + 2/4)*2^(3) | 12 | 11 |
| 0 110 11 | 6 | 3 | (1 + 3/4)*2^(3) | 14 | 12 |
| binary | exponent | mantissa | value |
| --- | --- | --- | --- |
| 0 111 00 | 7 | 0 | Inf |

| binary | exponent | mantissa | value |
| --- | --- | --- | --- |
| 0 111 01 | 7 | 1 | NaN |
| 0 111 10 | 7 | 2 | NaN |
| 0 111 11 | 7 | 3 | NaN |
As far as I can tell, the original Dragon4 paper by Steele and White actually computes the low margin wrong for the lowest normalized value because it divides it in half. It should actually remain the same because the transition from denormalized to normalized numbers does not change the exponent. This error is not present in the follow-up paper by Burger and Dybvig.
When deciding how to round-off the final printed digit, it is possible for the exact value to lie directly between the lower digit and the higher digit. In this case, a rounding rule needs to be decided. You could round towards zero, away from zero, towards negative infinity, etc. In my implementation, I always round to the even digit because it is similar to the rule used by IEEE floating-point operations.
In the reverse algorithm of converting decimal strings into binary floating-point, you can hit a similar case where the string representation lies exactly between two representable numbers. Once again, a rounding rule must be chosen. If you know this rule in both algorithms, there are times when you can print out a unique representation of a number with one fewer digit. For example, consider how the digit-printing algorithm tests if half the high margin encompasses a digit to round up to. I tested if the remaining value was greater than one. If we knew how the reverse algorithm would handle ties, we might be able to test greater than or equal to one. David Gay's dtoa implementation seems to do this; therefore it should always be run in conjunction with his strtod implementation.
Personally, I prefer printing extra digits until the rounded number is inclusively contained within the margin. This way the output is guaranteed to uniquely identify the number regardless of the tie-breaking algorithm chosen by a strtod parser. Unfortunately, this is only guaranteed for a proper strtod implementation. The implementation that ships with Microsoft Visual C++ 2010 Express sometimes has an off-by-one error when rounding to the closest value. For example, the input "6.439804741657803e-031" should output the 64-bit double 0x39AA1F79C0000000, but instead outputs 0x39AA1F79BFFFFFFF. I can't say for certain where Microsoft's bug comes from, but I can say that the problem of reading a decimal string is more difficult than printing one, so it's more likely to have bugs. Hopefully I can find the time to try my hand at it in the near future.
Original post: http://blog.csdn.net/christopherwu/article/details/43837239