Floating-Point Parsing and Formatting improvements in .NET Core 3.0

Tanner

Starting with the .NET Core 2.1 release, we have been making iterative improvements to the floating-point parsing and formatting code in .NET Core. Now, in .NET Core 3.0 Preview 3, we are nearing completion of this work and would like to share more details about these changes and some of the differences you might see in your applications.

The primary goals of this work were to ensure correctness and standards compliance with IEEE 754-2008. For those unfamiliar with the standard, it defines the underlying format, base operations, and behaviors for binary floating-point types such as System.Single (float) and System.Double (double). The majority of modern processors and programming languages support some version of this standard, so it is important to ensure it is implemented correctly. The standard does not impact the integer types, such as System.Int32 (int), nor does it impact the other floating-point types, such as System.Decimal (decimal).

Initial Work

We started with parsing changes, as part of the .NET Core 2.1 release. Initially, this was just an attempt to fix a perf difference between Windows and Unix and was done by @mazong1123 in dotnet/coreclr#12894, which implements the Dragon4 algorithm. @mazong1123 also made a follow-up PR which improved perf even more by implementing the Grisu3 algorithm in dotnet/coreclr#14646. However, in reviewing the code we determined that the existing infrastructure had a number of issues that prevented us from always doing the right thing and that it would require significantly more work to make it correct.

Porting to C#

The first step in fixing these underlying infrastructure issues was porting the code from native to managed. We did this work in dotnet/coreclr#19999 and dotnet/coreclr#20080. The result was we made the code more portable, allowed it to be shared with the other frameworks/runtimes (such as Mono and CoreRT), made it possible to easily debug the code with the .NET Debugger, and made it available through SourceLink.

Making the parser IEEE compliant

We did some additional cleanup in dotnet/coreclr#20619 by removing various bits of duplicated code that were shared between the different parsers. Finally, we made the double and float parsing logic mostly IEEE compliant in dotnet/coreclr#20707, and this was made available in the first .NET Core 3.0 Preview.

These changes fixed three primary issues:

The fixes ensured that double.Parse/float.Parse would return the same result as the C#, VB, or F# compiler for the corresponding literal value. Producing the same result as the language compilers is important for determinism of runtime and compile time expressions. Up until the change, this was not the case.

To elaborate, every floating-point value requires between 1 and X significant digits (i.e. all digits that are not leading or trailing zeros) in order to roundtrip the value (that is, in order for double.Parse(value.ToString()) to return exactly value). This is at most 17 digits for double and at most 9 digits for float. However, this only applies to strings that are first formatted from an existing floating-point value. When parsing from an arbitrary string, you may have to instead consider up to Y digits to ensure that you produce the “nearest” representable value. This is 768 digits for double and 113 digits for float. We have tests validating such strings parse correctly in RealParserTestsBase.netcoreapp.cs and dotnet/corefx#35701. More details on this can be found on Rick Regan’s Exploring Binary blog.
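
The two digit counts above can be illustrated with a small sketch. It is written in Python rather than C# purely for illustration, on the assumption that Python's float is the same IEEE 754 binary64 type as System.Double and that its string parsing is correctly rounded; the variable names are made up for this example. It checks that 17 significant digits are enough to roundtrip a few doubles, and then shows that a string sitting on a halfway point between two doubles needs every one of its digits examined.

```python
import math
from decimal import Decimal, localcontext

# Seventeen significant digits are always enough to roundtrip a double.
for value in (math.pi, 0.1, 1e23, 5e-324):
    assert float(format(value, ".17g")) == value

# Build the exact decimal midpoint between 1.0 and the next double, 1 + 2**-52.
with localcontext() as ctx:
    ctx.prec = 100                       # plenty of room, so the sum is exact
    midpoint = Decimal(1) + Decimal(2.0 ** -53)

midpoint_text = str(midpoint)            # "1." followed by 53 digits
on_tie = float(midpoint_text)            # exactly halfway: ties go to the even neighbor
above_tie = float(midpoint_text + "1")   # one extra trailing digit tips the rounding up

print(len(midpoint_text) - 2, "digits after the decimal point")
print(on_tie, above_tie)
```

A parser that stopped reading after 17 digits would see the same prefix for both strings and could not tell them apart, which is why a correct parser may need to look much further into the input.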

An example of such a string would be for double.Epsilon (which is the smallest value that is greater than zero). The shortest roundtrippable string for this value is only 5e-324, but the exact string (i.e. the string that contains all significant digits available in the underlying value) for this value is exactly 1074 digits long, which is comprised of 323 leading zeros and 751 significant digits. You then need one additional digit to ensure that the string is rounded in the correct direction (should it be exactly double.Epsilon or the smallest value that is greater than double.Epsilon).
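
The digit counts quoted for double.Epsilon can be checked with a few lines. This is a sketch in Python (chosen only for illustration, assuming its float is IEEE 754 binary64); it relies on the fact that constructing a Decimal from a float converts the value exactly, without rounding.

```python
from decimal import Decimal

epsilon = 5e-324   # smallest positive double, the value .NET calls double.Epsilon
sign, digits, exponent = Decimal(epsilon).as_tuple()  # exact, no rounding

significant = len(digits)             # digits that are not leading zeros
decimal_places = -exponent            # total digits after the decimal point
leading_zeros = decimal_places - significant

print(repr(epsilon))                  # shortest roundtrippable string
print(decimal_places, leading_zeros, significant)
```

The second line of output gives the total length of the exact string after the decimal point, the number of leading zeros, and the number of significant digits, in that order.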

Some additional minor cleanup was done in dotnet/coreclr#21036 to ensure that the remaining compliance issues were resolved. These were mostly about ensuring that we handle Infinity and NaN case-insensitively and that we allow an optional preceding sign.

Making the formatter IEEE 754-2008 compliant

The formatting code required more significant changes and was primarily done in dotnet/coreclr#22040 with some followup work fixing some remaining issues in dotnet/coreclr#22522.

These changes fixed 5 primary issues:

These changes are expected to have the largest potential impact to existing code.

The summary of these changes is that (for double/float):

  • ToString(), ToString("G"), and ToString("R") will now return the shortest roundtrippable string. This ensures that users end up with something that just works by default. An example of where it was problematic was Math.PI.ToString(), where the string that was previously being returned (for ToString() and ToString("G")) was 3.14159265358979; a string that roundtrips needs more digits, such as 3.1415926535897931 (the shortest such string is 3.141592653589793). The previous result, when parsed, returned a value which was internally off by 7 ULP (units in last place) from the actual value of Math.PI. This meant that it was very easy for users to get into a scenario where they would accidentally lose some precision on a floating-point value when they needed to serialize/deserialize it.
  • For the "G" format specifier that takes a precision (e.g. G3), the precision specifier is now always respected. For double with precisions of 15 or less and for float with precisions of 6 or less, this means you get the same string as before. For precisions greater than that, you will get up to that many significant digits, provided those digits are available (e.g. (1.0).ToString("G17") will still return 1 since the exact string only has one significant digit; but Math.PI.ToString("G20") will now return 3.141592653589793116, since the exact string contains at least 20 significant digits).
  • For the "C", "E", "F", "N", and "P" format specifiers the changes are similar. The difference is that these format specifiers treat the precision as the number of digits after the decimal point, in contrast to "G" which treats it as the number of significant digits. The previous implementation had a bug where, for strings that contained more than 15 significant digits, it would actually fill in the remaining digits with zero, regardless of whether they appeared before or after the decimal point. As an example, (1844674407370955.25).ToString("F4") would previously return 1844674407370960.0000. The exact string, however, actually contains enough information to fill all the integral digits. With the changes made we instead fill out the available integral digits while still respecting the request for the 4 digits after the decimal point and instead return 1844674407370955.2500.
  • Custom format strings have the same behavior as before and will only print up to 15 significant digits, regardless of how many are requested. Supporting an arbitrary number of digits would require more work and hasn’t been done at this time.
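
The numbers in the list above can be reproduced outside of .NET. The following is a sketch in Python, used here only as an illustration and assuming its float is IEEE 754 binary64. Python's ".15g", ".17g", and ".4f" specifiers stand in for "G15", "G17", and "F4"; they are analogous rather than identical to the .NET specifiers, and to_bits is a hypothetical helper playing the role of BitConverter.DoubleToInt64Bits.

```python
import math
import struct

def to_bits(value: float) -> int:
    """Raw 64-bit pattern of a double, similar to BitConverter.DoubleToInt64Bits."""
    return struct.unpack("<q", struct.pack("<d", value))[0]

old_default = format(math.pi, ".15g")    # 15 significant digits, like "G15"
shortest = repr(math.pi)                 # shortest roundtrippable string
ulps_lost = to_bits(math.pi) - to_bits(float(old_default))

print(old_default, shortest, ulps_lost)
print(format(1.0, ".17g"))                 # only one significant digit exists
print(format(math.pi, ".20g"))             # more digits are available on request
print(format(1844674407370955.25, ".4f"))  # integral digits are kept, not zero-filled
```

The value of ulps_lost is the distance, in representable doubles, between Math.PI and what the 15-digit string parses back to.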

Potential impact to existing code

When picking up .NET Core 3.0, it is expected that you may encounter some of the differences described in this post in your application or library code. The general recommendation is that the code be updated to handle these changes. However, this may not be possible in all cases and a workaround may be required. Focused testing for floating-point specific code is recommended.

For differences in parsing, there is no mechanism to fall back to the old behavior. There were already differences across various operating systems (e.g. Linux, Windows, macOS) and architectures (e.g. x86, x64, ARM, ARM64). The new logic makes all of these consistent and ensures that the result returned is consistent with the corresponding language literal.

For differences in formatting, you can get the equivalent behavior by:

  • For ToString() and ToString("G") you can use G15 as the format specifier as this is what the previous logic would do internally.
  • For ToString("R"), there is no mechanism to fall back to the old behavior. The previous behavior would first try “G15” and then, using the internal buffer, check whether the result roundtrips; if that failed, it would instead return “G17”.
  • For the "G" format-specifier that takes a precision, you can force precisions greater than 15 (exclusive) to be exactly 17. For example, if your code is doing ToString("G20") you can instead change this to ToString("G17").
  • For the remaining format-specifiers that take a precision ("C", "E", "F", "N", and "P"), there is no mechanism to fall back to the old behavior. The previous behavior would clamp precisions greater than 14 (exclusive) to be 17 for "E" and 15 for the others. However, this only impacted the significant digits that would be displayed; the remaining digits (even if available) would be filled in as zero.
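
For code that depended on the old "R" output, the two-step behavior described above (try 15 digits, then fall back to 17) is simple to emulate by hand. The following is a minimal sketch in Python, not the actual .NET implementation: legacy_roundtrip_format is a hypothetical helper, and Python's ".15g" and ".17g" are used as stand-ins for "G15" and "G17" on the assumption that its float is IEEE 754 binary64.

```python
import math

def legacy_roundtrip_format(value: float) -> str:
    """Hypothetical helper: try 15 digits, fall back to 17 if that does not roundtrip."""
    candidate = format(value, ".15g")
    if float(candidate) == value:
        return candidate
    return format(value, ".17g")

print(legacy_roundtrip_format(0.1))      # 15 digits already roundtrip
print(legacy_roundtrip_format(math.pi))  # needs the 17-digit fallback
print(repr(math.pi))                     # shortest roundtrippable string, for comparison
```

The same two-step check could be written against double.ToString("G15"), double.Parse, and double.ToString("G17") in application code that needs to keep producing the older strings.
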
Tanner Gooding

Software Engineer, .NET Team

21 comments

  • Nicolas Musset

    Interesting article.
    Just a quick question. How can the following quote from the article be correct: “Math.PI.ToString("G20") will now return 3.141592653589793116“. I thought double were supposed to have 17 significant digits. That should be 3.1415926535897932xx (don’t know how the last two would be rounded up), right?

    • Tanner Gooding

      Once you have a floating-point value, you need to produce a string that contains at most 17 significant digits for double and 9 significant digits for float to ensure the value roundtrips (assuming your parser and formatter are correct). The exact number of digits required to roundtrip a given floating-point value depends on the value. For example, Math.PI needs 16 digits, but 1 only needs a single digit.

       

      However, even though these are the upper bounds for roundtripping strings, the true value represented by the underlying floating-point value may contain more digits. For example, the exact string for Math.PI contains 49 digits: “3.141592653589793115997963468544185161590576171875” and the exact string for double.Epsilon has 751 significant digits.

      • Nicolas Musset

        I understand that. But the value is just incorrect. Pi is supposed to be equal to (if you want to use 49 digits) 3.14159265358979323846264338327950288419716939937, not the one you posted.
        And if 17 digits are supposed to be significant then the value is just wrong. It should at least end with 89793238 not with 89793116 (which comes from nowhere). Hence my question.

        • Tanner Gooding

          float and double are binary-based floating-point numbers and aren’t actually able to represent any value that has 17 digits exactly, instead they get an approximation that is “close enough” for most purposes.

           

          Under the hood, binary-based floating point numbers are constructed out of a sign, exponent, and significand. You can then compute the actual underlying value using the formula: (-1)^sign * 2^exponent * significand. This blogpost by Fabien Sanglard actually does a good job of explaining this in terms of the exponent being a window and the significand being an offset into that window: http://fabiensanglard.net/floating_point_visually_explained/

           

          Each “window” covers the next power of 2 (since these are binary-based), so one window is [0.5,1], the next window is [1,2], then [2,4], etc. Each window is then evenly divided by the number of available offsets. This means that values that fall into the [1,2] window are more precise than values that fall into the [2,4] window.

           

          PI falls into the [2,4] window and since double has a 52-bit significand you have 2^52 (4,503,599,627,370,496) evenly spaced values that have a delta of 4.4408920985006261616945266723633e-16. For float, you have a 23-bit significand, so you have 2^23 (8,388,608) evenly spaced values that have a delta of 2.384185791015625e-7.
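
The window and offset description can be made concrete by pulling the three fields out of the raw bits. This is a sketch in Python (used only for illustration, assuming its float is IEEE 754 binary64); decompose is a hypothetical helper, and the reconstruction shown only applies to normal, non-zero values.

```python
import math
import struct

def decompose(value: float):
    """Split a double into its sign, unbiased exponent, and 52-bit fraction."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    sign = bits >> 63
    exponent = ((bits >> 52) & 0x7FF) - 1023   # remove the exponent bias
    fraction = bits & ((1 << 52) - 1)          # offset into the window
    return sign, exponent, fraction

sign, exponent, fraction = decompose(math.pi)
window_start = 2.0 ** exponent                 # 2.0, so the window starts at 2
spacing = window_start / 2 ** 52               # gap between neighboring doubles

# Rebuild the value: (-1)^sign * 2^exponent * (1 + fraction / 2^52).
rebuilt = (-1) ** sign * window_start * (1 + fraction / 2 ** 52)

print(sign, exponent, hex(fraction))
print(spacing, rebuilt == math.pi)
print(2.0 / 2 ** 23)                           # the corresponding spacing for float
```

The printed spacing values are the same deltas quoted above for double and float in the window that contains PI.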

           

          Given the above, you would find that PI is not exactly representable and the value chosen is the closest representable value.

          * Wikipedia says PI to 50 digits is: 3.14159265358979323846264338327950288419716939937510

          * We say the closest representable value is: 3.141592653589793115997963468544185161590576171875

            * The raw bit representation is: 0x400921FB54442D18

            * This is 1.224646799147353177226065932275001 × 10^-16 less than 50 digit PI

          * The next highest representable value is: 3.141592653589793560087173318606801331043243408203125

            * The raw bit representation is: 0x400921FB54442D19

            * This is 3.21624529935327298446846074008828025 × 10^-16 greater than 50 digit PI

          * The next lowest representable value is: 3.141592653589792671908753618481568992137908935546875

            * The raw bit representation is: 0x400921FB54442D17

            * This is 5.66553889764797933892059260463828225 × 10^-16 less than 50 digit PI
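
The three values listed above can be regenerated by stepping the raw bit pattern up and down by one. A minimal sketch in Python, again assuming its float is IEEE 754 binary64; to_bits and from_bits are hypothetical helpers that mirror BitConverter.DoubleToInt64Bits and BitConverter.Int64BitsToDouble.

```python
import math
import struct
from decimal import Decimal

def to_bits(value: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", value))[0]

def from_bits(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

closest = math.pi
lower = from_bits(to_bits(closest) - 1)   # next lowest representable value
upper = from_bits(to_bits(closest) + 1)   # next highest representable value

for value in (lower, closest, upper):
    # Decimal(float) converts exactly, so this prints every digit the double holds.
    print(f"0x{to_bits(value):016X}  {Decimal(value)}")
```

Each output line pairs a raw bit representation with the exact decimal expansion of that double.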

           

  • Nicholas Petersen

    Amazing work! Would this not be a prime case example of something we could **never** have done, not even *dreamed* of doing(!), with the full-framework, because of that last paragraph: Backwards compat?

    • Tanner Gooding

      I wouldn’t say never. There are always interesting (but generally less ideal) ways of exposing “betterness” without regressing performance. Those alternative mechanisms also come at a cost, however, and generally don’t make the end product as cohesive as you end up having to maintain two separate code paths and either making it not the default case or having some “quirk” mode to allow users to fallback to the old behavior.

       

      But yes, backwards compatibility has always been and still is a high priority for full-framework, which makes changes like this much less likely to happen.

  • Pavel Šavara

    I see that this post is about FP formatting. But I would like to ask about calculations. I understand that CLR is always using underlying HW like FPU to calculate floating point numbers. On x64 the calculations would be done on 80bit registers but it could be 64bit registers on other hardware. That makes the calculations not repeatable, not HW independent. Such things sometimes matter. Java has Strictfp feature, which forces JVM to use only 64bits. https://en.wikipedia.org/wiki/Strictfp. Do you have any such plans too? Here is relevant article from the past https://blogs.msdn.microsoft.com/davidnotario/2005/08/08/clr-and-floating-point-some-answers-to-common-questions/

    • Tanner Gooding

      RyuJIT (which is the current JIT used by CoreCLR for both x86 and x64) uses the SIMD instructions that operate directly on 32-bits (floats) and 64-bits (doubles), so this should no longer be a concern.

       

      The legacy JIT (which is the JIT used by the x86 full-framework runtime) uses the x87 FPU instructions that operate on 80-bits (extended-doubles). However, just because the underlying registers are 80-bits does not mean that the instructions always operate on 80-bits of data. Many of the x87 FPU instructions actually have separate encodings for loading or storing any of 32-bits, 64-bits, or 80-bits. There is also a global FPU “control word” that allows you to set the rounding mode to ensure that operations are correctly rounded to 32-bits, 64-bits, or 80-bits (depending on what your current needs are). So, it is entirely possible to ensure that you get the same result even if you are reliant on the 80-bit FPU stack.

      • Kuan Bartel

        In our application this is currently a big issue. We need to use x64 due to large memory requirements however using x64 we are not getting deterministic results on different hardware (different CPU generations). In fact, in our testing using x86 does give us consistent results, but we can only use this for some small tests and can’t run the larger simulations.
        We are at the stage where we are wondering if we need to consider rewriting in C++ due to the lack of control of floating point in C# / .Net.

        • Tanner Gooding

          Could you please provide some more details about what versions you are testing with (these changes are only available in .NET Core 3.0 Preview 3, which was released today) and the values (either floating-point that you are formatting, or strings you are parsing) that are showing non-deterministic behavior?

          * To get the raw-bits for a floating-point value, you can use `System.BitConverter.DoubleToInt64Bits` or `System.BitConverter.SingleToInt32Bits`.

           

          The algorithms used for parsing and formatting are now the same in .NET Core 3.0 for both x86 and x64; this was not necessarily the case in past previews or in past releases.

          • Kuan Bartel

            So finally worked out what it is which is causing the result inconsistencies. Math.Pow()! For example on two machines we get the following results for Math.Pow(x, y) (approximate decimal values output using G30 format):
            x  = 0x3FE3750B37D67D7C (0.608037575777344851957195714931)
            y = 0x4000000000000000 (2.0)
            Machine1: 0x3FD7A952D8B5E87B (0.369709693557190355317487728826)
            Machine2: 0x3FD7A952D8B5E87C (0.369709693557190410828638960083)
            So just one bit difference…   but it makes a difference when you do multiple operations and differences can quickly compound.

          • Kuan Bartel

            Obviously this difference is not in .Net Core though. It is from the x64 implementation of pow in the CRT.

      • Kuan Bartel

        I tested .Net Framework 4.7 in x86 and x64 and .Net Core 3 in x86 and x64 and .Net Core 3 x64 compiled to native. In all cases x86 gave consistent results while x64 did not.
         

    • Tanner Gooding

      No. This stack-overflow post does a good job of explaining it: https://stackoverflow.com/questions/7524838/fixed-point-vs-floating-point-number

      A fixed-point number has its decimal point “fixed” in-place. That is, it only has metadata for a sign and significand and might only ever support 2 digits after the decimal point and all other bits are used for the integral part.

      Whereas a floating-point number (like `decimal`, `float`, and `double`) allows its decimal point to “float” about. That is, it has metadata for the sign and significand, but also has metadata for something like an exponent, which says where the decimal point should fall in the significand.

  • John Alexiou

    Of course, there were cases where the previous default behavior resulted in nice compact results for expressions like `606.79002` and with the new behavior the result would be `606.79001999999991` due to roundoff errors in calculations. And since a lot of unit testing is done on formatted outputs there is going to be grief over implementing significant figure rounding before the conversion to string.
    In many cases `.ToString("g4")` is not adequate because the result is an exponential instead of trailing zeros. For example, `x=606790.02` and the test `Assert.AreEqual("606800", x.ToString("g4"))` fails because the value converted into a string with 4 significant digits is `"6.068E+05"` instead of `"606800"` which is correct and more compact.
     

    • Tanner Gooding

      Yes, there are certainly cases where the previous result of double.ToString() was “nicer”. There are likewise cases where the previous result returned was “not nicer”. The important thing, however, is that double.ToString() now returns a string that is “correct” by default.

      For example, Math.PI (which is arguably a fairly important value) used to only return 15 digits (“3.14159265358979”). This 15 digit string, when parsed, would return a value that was not equal to Math.PI. This meant that users, by default, could easily end up with subtle bugs due to serialization/deserialization dropping small bits of precision from their values.

      • Having tested the entire range of finite float values, only 32% of float.Parse(floatValue.ToString()) calls (prior to these changes) succeeded in returning floatValue. Testing a similar range of double values showed only 8% of doubles would succeed in returning doubleValue.

      In all cases, double.ToString() will now return the “shortest roundtrippable” string. In the example you gave, 606.79002 and 606.79001999999991 are different numbers, and hence format to different values. The former has an internal bit representation of 0x4082F651F601797D and will return exactly 606.79002 when formatted (both prior to these changes and after them). The latter, however, has an internal bit representation of 0x4082F651F601797C (one less than the former) and would previously print 606.79002 (making it impossible to distinguish these two unique values by default) and will now print 606.7900199999999.
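
The two neighboring values discussed above can be compared side by side with a short sketch. It is written in Python for illustration only, assuming its float is IEEE 754 binary64; ".15g" stands in for the old 15-digit default, repr for the shortest roundtrippable string, and to_bits and from_bits are hypothetical helpers.

```python
import struct

def to_bits(value: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", value))[0]

def from_bits(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

upper = float("606.79002")
lower = from_bits(to_bits(upper) - 1)      # the double immediately below it

for value in (lower, upper):
    fifteen_digits = format(value, ".15g")  # analogous to the old default
    shortest = repr(value)                  # shortest roundtrippable string
    print(f"0x{to_bits(value):016X}  {fifteen_digits}  {shortest}")
```

With 15 digits both doubles print identically, while the shortest roundtrippable strings differ and each parses back to the value it came from.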

       

      *Edit:* Fixing formatting

      • Tanner Gooding

        To elaborate just a bit more, I checked 32 values starting at raw bits: 0x4082F651F6017970

        * Raw: 0x4082F651F6017970  Old: 606.790019999999  New: 606.7900199999985
        * Raw: 0x4082F651F6017971  Old: 606.790019999999  New: 606.7900199999987
        * Raw: 0x4082F651F6017972  Old: 606.790019999999  New: 606.7900199999988
        * Raw: 0x4082F651F6017973  Old: 606.790019999999  New: 606.7900199999989
        * Raw: 0x4082F651F6017974  Old: 606.790019999999  New: 606.790019999999
        * Raw: 0x4082F651F6017975  Old: 606.790019999999  New: 606.7900199999991
        * Raw: 0x4082F651F6017976  Old: 606.790019999999  New: 606.7900199999992
        * Raw: 0x4082F651F6017977  Old: 606.790019999999  New: 606.7900199999993
        * Raw: 0x4082F651F6017978  Old: 606.790019999999  New: 606.7900199999995
        * Raw: 0x4082F651F6017979  Old: 606.79002         New: 606.7900199999996
        * Raw: 0x4082F651F601797A  Old: 606.79002         New: 606.7900199999997
        * Raw: 0x4082F651F601797B  Old: 606.79002         New: 606.7900199999998
        * Raw: 0x4082F651F601797C  Old: 606.79002         New: 606.7900199999999
        * Raw: 0x4082F651F601797D  Old: 606.79002         New: 606.79002
        * Raw: 0x4082F651F601797E  Old: 606.79002         New: 606.7900200000001
        * Raw: 0x4082F651F601797F  Old: 606.79002         New: 606.7900200000003
        * Raw: 0x4082F651F6017980  Old: 606.79002         New: 606.7900200000004
        * Raw: 0x4082F651F6017981  Old: 606.79002         New: 606.7900200000005
        * Raw: 0x4082F651F6017982  Old: 606.790020000001  New: 606.7900200000006
        * Raw: 0x4082F651F6017983  Old: 606.790020000001  New: 606.7900200000007
        * Raw: 0x4082F651F6017984  Old: 606.790020000001  New: 606.7900200000008
        * Raw: 0x4082F651F6017985  Old: 606.790020000001  New: 606.7900200000009
        * Raw: 0x4082F651F6017986  Old: 606.790020000001  New: 606.790020000001
        * Raw: 0x4082F651F6017987  Old: 606.790020000001  New: 606.7900200000012
        * Raw: 0x4082F651F6017988  Old: 606.790020000001  New: 606.7900200000013
        * Raw: 0x4082F651F6017989  Old: 606.790020000001  New: 606.7900200000014
        * Raw: 0x4082F651F601798A  Old: 606.790020000002  New: 606.7900200000015
        * Raw: 0x4082F651F601798B  Old: 606.790020000002  New: 606.7900200000016
        * Raw: 0x4082F651F601798C  Old: 606.790020000002  New: 606.7900200000017
        * Raw: 0x4082F651F601798D  Old: 606.790020000002  New: 606.7900200000018
        * Raw: 0x4082F651F601798E  Old: 606.790020000002  New: 606.790020000002
        * Raw: 0x4082F651F601798F  Old: 606.790020000002  New: 606.7900200000021
        

        As you can see, the previous algorithm would make many of these values indistinguishable from one another, but the new one ensures that each one is properly unique. These will definitely be impactful to some users, but there should always be a possible format specifier for any given specific scenario. For example, if you don’t want the exponential format, you could use a custom format specifier or “F4” and if you want something that is consistent with the previous ToString() call you can use “G15”.

        You can read more about the available standard format strings here: https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings

        You can read more about the available custom format strings here: https://docs.microsoft.com/en-us/dotnet/standard/base-types/custom-numeric-format-strings

        I listed several workarounds for common scenarios at the end of the blog post as well.

        *Edit:* Fixing formatting
