Rust version: N/A lexical version: 0.8.1 lexical compilation f

This line is not correct: <div class="Box-

This line is not correct: <div class="Box Box--cond

[BUG] Fix a bug, and improve comments on Dragonbox,about alexhuszagh/rust-lexical

jk-jeon commented on May 21, 2024 1

This line is not correct:

rust-lexical/lexical-write-float/src/algorithm.rs

Line 683 in 8cd8e86

if shorter && F::compute_mul_parity(two_fc * 2 - 1, &pow5, beta_minus_1 - 1) {

Please see: jk-jeon/dragonbox#22

By the way, I'm a bit confused of what you wrote in the doc/comments about the rounding mode. Could you elaborate on, for example, this?

    // According to the main Dragonbox algorithm, this is the left-closed
    // directed rounding case, however, that is the case of I = [w,w+), which
    // does not round-down. Rather, we want the right-closed directed rounding
    // case, or I = (w−,w]. Unfortunately, this doesn't work, since it truncates
    // **too** much, so we just round-nearest, tie-even, then direct the rounding.

from rust-lexical.

Alexhuszagh commented on May 21, 2024 1

Here are some comments in the code I find misleading to the readers.

1. https://github.com/Alexhuszagh/rust-lexical/blob/8cd8e86ac031369cc2181d595e4958ad7ab5cd8a/lexical-write-float/src/algorithm.rs#L826
    I'm not sure what you mean here. The ones provided by dragonbox (if you mean the reference implementation) eventually boil down to code mostly identical to what you wrote. There are small differences on the details but they should not really make any differences on the performance.

Ok this is actually misleading on my part. I ported everything manually by hand, and thought I went back and re-documented everything properly and checked to ensure my comments during initial porting were correct. This is clearly misleading after checking the resulting assembly, however.

2. https://github.com/Alexhuszagh/rust-lexical/blob/8cd8e86ac031369cc2181d595e4958ad7ab5cd8a/lexical-write-float/src/algorithm.rs#L861
    This one is not correct. What's being computed is `floor(x * log10(2) - log10(4/3))`; please refer to the [paper](https://github.com/jk-jeon/dragonbox/blob/master/other_files/Dragonbox.pdf) (Section 5.4).

Ah, I ported this by hand (with a bit of guidance from the Rust auto-generated port) and may have been confused. Thanks for the correction.

3. https://github.com/Alexhuszagh/rust-lexical/blob/8cd8e86ac031369cc2181d595e4958ad7ab5cd8a/lexical-write-float/src/algorithm.rs#L355
    and similar lines. There should be no `floor` on the LHS's (the RHS's are not integers). This was my mistake and I corrected them in my repo recently.

Thank you.

from rust-lexical.

jk-jeon commented on May 21, 2024 1

I'm afraid if my explanation on the "decimal-to-binary rounding policy" confused you. Let me add some more explanation to help your understanding, if that's the case. Note that what the decimal-to-binary rounding mode policy determines is the assumption on what a correct parser will do when it has to round, not what Dragonbox will do when it has to round. In other words, it dictates the search interval where Dragonbox finds the shortest decimal number from. How Dragonbox rounds its own results is determined by the "binary-to-decimal rounding policy", on the other hand. And that policy has no effect on compute_left_closed_directed and compute_right_closed_directed, because there can be no rounding tie for those cases.

If I'm correct, the Rust compiler will parse what's written in the source code according to the round-to-nearest, tie-to-even rounding mode, so seemingly contradictory results are actually due to incompatible assumptions on the rounding mode. Let's look at an example.

compute_left_closed_directed
1.23456 => (12345601, -7) for binary32

Relevant exact values of IEEE-754 binary32 format are:

1.23455989360809326171875
1.2345600128173828125
1.23456013202667236328125

Hence, the distance from the first one to 1.23456 is 0.00000010639190673828125 while that from the second one is 0.0000000128173828125, so the closest one to 1.23456 is the second one. Since the Rust compiler uses the round-to-nearest, tie-to-even rounding mode, the second one is the one 1.23456 ends up being parsed into. Now, if you invoke compute_left_closed_directed, it assumes that a parser will round-down; that is, a parser will return w for every number inside the interval [w,w+). So compute_left_closed_directed will try to find the one inside [w,w+) that has the smallest number of digits. In the above case, w=1.2345600128173828125 and w+=1.23456013202667236328125, so 1.23456 is actually not in the search interval. Instead, the one with the smallest number of digits is 1.2345601, which compute_left_closed_directed correctly found out.

On the other hand, compute_right_closed_directed will try to find the one inside (w-,w], so in this case 1.23456 is in the search interval, which is why 1.23456 is the output in this case.

If you indeed misunderstood things, then I hope this clarifies.

If you actually knew this already, then I'm sorry for misunderstanding you, but I'm still not so sure what you were trying to do, so could you elaborate more? For example, why this is the case?

 // Neither of these is an expected result when truncating digits (we expect 
 // `13.999999999999998` when truncating for the first case, and `1.23456` for 
 // the second case).

from rust-lexical.

Alexhuszagh commented on May 21, 2024 1

Ok I believe I've implemented the final corrections. Thanks once again for all your feedback:

rust-lexical/lexical-write-float/src/algorithm.rs

Lines 373 to 400 in d70229d

    
           // Toward zero case: 
        
           // 
        
           // What we need is a compute-nearest, but with truncated digits in the 
        
           // truncated case. Note that we don't need the left-closed direct 
        
           // rounding case of I = [w,w+), or right-closed directed rounding 
        
           // case of I = (w−,w], since these produce the shortest intervals for 
        
           // a **float parser** assuming the rounding of the float-parser. 
        
           // The left-directed case assumes the float parser will round-down, 
        
           // while the right-directed case assumed the float parser will round-up. 
        
           // 
        
           // A few examples of this behavior is described here: 
        
           //    **compute_nearest_normal** 
        
           // 
        
           //    - `1.23456 => (123456, -5)` for binary32. 
        
           //    - `1.23456 => (123456, -5)` for binary64. 
        
           //    - `13.9999999999999982236431606 => (13999999999999998, -15)` for binary64. 
        
           // 
        
           //     **compute_left_closed_directed** 
        
           // 
        
           //    - `1.23456 => (12345601, -7)` for binary32. 
        
           //    - `1.23456 => (12345600000000002, -16)` for binary64. 
        
           //    - `13.9999999999999982236431606 => (13999999999999999, -15)` for binary64. 
        
           // 
        
           //     **compute_right_closed_directed** 
        
           // 
        
           //    - `1.23456 => (123456, -5)` for binary32. 
        
           //    - `1.23456 => (123456, -5)` for binary64. 
        
           //    - `13.9999999999999982236431606 => (13999999999999982, -15)` for binary64.

from rust-lexical.

Alexhuszagh commented on May 21, 2024

This line is not correct:

rust-lexical/lexical-write-float/src/algorithm.rs

Line 683 in 8cd8e86

if shorter && F::compute_mul_parity(two_fc * 2 - 1, &pow5, beta_minus_1 - 1) {

Please see: jk-jeon/dragonbox#22

Thank you.

By the way, I'm a bit confused of what you wrote in the doc/comments about the rounding mode. Could you elaborate on, for example, this?

    // According to the main Dragonbox algorithm, this is the left-closed
    // directed rounding case, however, that is the case of I = [w,w+), which
    // does not round-down. Rather, we want the right-closed directed rounding
    // case, or I = (w−,w]. Unfortunately, this doesn't work, since it truncates
    // **too** much, so we just round-nearest, tie-even, then direct the rounding.

Let me re-enable this code path, run my test suite and get back to you, and the comment is slightly misleading so I'll have to update that. What I was getting is where the interval would be rounded below the desired rounding point, eg., 14.0 going to 139999999 (this is not a real example, but since I don't have one on the top of my head, this is just merely for illustration). When truncating the digits later, I would then get 13.999, which wasn't expected in the context. I'll get a working example (if it still exists, it may have been a bug during implementation) and get back to you.

The other approach, the left-closed directed rounding case, would sometimes produce significant digits above my the desired result, while the right-closed directed rounding case would produce significant digits below the desired result, which when truncated could seem decently misleading.

This might be an implementation bug on my end or fixed in the bug fixes above you provided, in which it would not reproduce.

from rust-lexical.

Alexhuszagh commented on May 21, 2024

This is going to take some time: I'm first going to pore over the changes in master to ensure that all patches to the dragonbox algorithm are fixed in lexical, and then test to ensure 3). still reproduces, and if so, then provide meaningful examples of what I mean. In the meantime, I've correct the comments above.

from rust-lexical.

Alexhuszagh commented on May 21, 2024

Ok for the first case (prior to any bug fixes), for my implementation, I was getting the following results for the various cases. The last example in each case is "{:.25f}".format(13.999999999999998).

compute_nearest_normal

1.23456 => (123456, -5) for binary32.
1.23456 => (123456, -5) for binary64.
13.9999999999999982236431606 => (13999999999999998, -15) for binary64.

compute_left_closed_directed

1.23456 => (12345601, -7) for binary32.
1.23456 => (12345600000000002, -16) for binary64.
13.9999999999999982236431606 => (13999999999999999, -15) for binary64.

compute_right_closed_directed

1.23456 => (123456, -5) for binary32.
1.23456 => (123456, -5) for binary64.
13.9999999999999982236431606 => (13999999999999982, -15) for binary64.

That is, each interval does what they advertise (left-closed does in fact round-up in the interval, or compute [w,w+), and right-closed does generally compute the shorter interval but can sometimes round-down, or compute (w−,w], but this can lead to some paradoxical results. This is likely not an issue with the algorithm, but for my specific use-case, this isn't desirable. I can update the comment to make this clearer.

from rust-lexical.

Alexhuszagh commented on May 21, 2024

Ok I've applied the bug fix and added the additional comments documenting the algorithm choices internally:

rust-lexical/lexical-write-float/src/algorithm.rs

Lines 373 to 412 in cc3ba27

    
           // Toward zero case: 
        
           // 
        
           // According to the main Dragonbox algorithm, this is the left-closed 
        
           // directed rounding case, however, that is the case of I = [w,w+), which 
        
           // does not round-down. Rather, we want the right-closed directed rounding 
        
           // case, or I = (w−,w]. Unfortunately, this doesn't work, since if it 
        
           // goes toward `w-` it can round-down which produces misleading results, 
        
           // within the context of our float formatter, so we just round-nearest, \ 
        
           // tie-even, then direct the rounding. 
        
           // 
        
           // An example of cases where this matters is: 
        
           //    **compute_nearest_normal** 
        
           // 
        
           //    - `1.23456 => (123456, -5)` for binary32. 
        
           //    - `1.23456 => (123456, -5)` for binary64. 
        
           //    - `13.9999999999999982236431606 => (13999999999999998, -15)` for binary64. 
        
           // 
        
           //     **compute_left_closed_directed** 
        
           // 
        
           //    - `1.23456 => (12345601, -7)` for binary32. 
        
           //    - `1.23456 => (12345600000000002, -16)` for binary64. 
        
           //    - `13.9999999999999982236431606 => (13999999999999999, -15)` for binary64. 
        
           // 
        
           //     **compute_right_closed_directed** 
        
           // 
        
           //    - `1.23456 => (123456, -5)` for binary32. 
        
           //    - `1.23456 => (123456, -5)` for binary64. 
        
           //    - `13.9999999999999982236431606 => (13999999999999982, -15)` for binary64. 
        
           // 
        
           // In the case of `13.9999999999999982236431606`, we get `14.0` for 
        
           // binary64 using `compute_nearest_normal` and `13.99999999999998` for 
        
           // `compute_right_closed_directed`. 
        
           // 
        
           // In the case of `1.2345600000000001017497198`, we get `1.23456` for 
        
           // binary64 using `compute_nearest_normal` `1.2345601` using 
        
           // `compute_left_closed_directed`. 
        
           // 
        
           // Neither of these is an expected result when truncating digits (we expect 
        
           // `13.999999999999998` when truncating for the first case, and `1.23456` for 
        
           // the second case).

The new comments better describe the desired behavior, and why although the algorithms work as described, for this library's specific purposes, the compute_nearest interval has the desired properties. The truncation might produce less significant digits, but a significant digit being different from the compute_nearest case isn't exactly expected behavior within this specific context.

The bug fix is here:

rust-lexical/lexical-write-float/src/algorithm.rs

Lines 709 to 718 in cc3ba27

    
               let two_f = two_fc 
        
                   - if shorter { 
        
                       1 
        
                   } else { 
        
                       2 
        
                   }; 
        
               if F::compute_mul_parity(two_f, &pow5, beta_minus_1) { 
        
                   return short_circuit(); 
        
               } 
        
           }

I believe this addresses all the above concerns. If there's still any remaining issues or misleading comments, I'd be happy to fix them.

from rust-lexical.

Alexhuszagh commented on May 21, 2024

what a correct parser will do when it has to round, not what Dragonbox will do when it has to round

Ah.... this makes a lot more sense. This was the part I was confused about. The annotations were mostly for me to understand to ensure why I wasn't using that algorithm, because re-reading the paper I had at times wondered while reading the paper a few times that a different algorithm would be more suited to my specific application. I'm originally a biology major, not a CS or math major, so sometimes it can take me a bit more time to fully comprehend things. I highlighted both cases just because they're examples of why they don't work for my specific use-case, but I was kind of confused on the exact details of why. This makes a lot more sense.

I can update my comments now that I better understand. These were mostly internally for me from a maintenance perspective for why I chose to use them. I had assumed that round_nearest was, in-fact, the correct algorithm, but was somewhat confused on the exact details of why the others didn't work (hence the notes for a maintenance perspective), and attempted a few and realized they definitely did not work. What I probably wanted in this case was nearest_toward_zero, which makes intuitive sense.

from rust-lexical.

Alexhuszagh commented on May 21, 2024

Also, this wasn't really a case of an unclear explanation, but I am extremely grateful for all the feedback and bug fixes you've provided. Rather, it was me reading the paper, then looking for the source code, checking to see if it might be applicable, trying it and realizing it might not work here (without fully comprehending it) and documented why it didn't do as expected for my particular application. But your explanation puts things into much greater clarity: thank you.

from rust-lexical.

jk-jeon commented on May 21, 2024

I'm truly glad it helped!

And again thanks a lot for turning my work into something really useful😊

from rust-lexical.

[BUG] Fix a bug, and improve comments on Dragonbox about rust-lexical HOT 11 CLOSED

Comments (11)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	// Toward zero case:
	//
	// What we need is a compute-nearest, but with truncated digits in the
	// truncated case. Note that we don't need the left-closed direct
	// rounding case of I = [w,w+), or right-closed directed rounding
	// case of I = (w−,w], since these produce the shortest intervals for
	// a float parser assuming the rounding of the float-parser.
	// The left-directed case assumes the float parser will round-down,
	// while the right-directed case assumed the float parser will round-up.
	//
	// A few examples of this behavior is described here:
	// compute_nearest_normal
	//
	// - `1.23456 => (123456, -5)` for binary32.
	// - `1.23456 => (123456, -5)` for binary64.
	// - `13.9999999999999982236431606 => (13999999999999998, -15)` for binary64.
	//
	// compute_left_closed_directed
	//
	// - `1.23456 => (12345601, -7)` for binary32.
	// - `1.23456 => (12345600000000002, -16)` for binary64.
	// - `13.9999999999999982236431606 => (13999999999999999, -15)` for binary64.
	//
	// compute_right_closed_directed
	//
	// - `1.23456 => (123456, -5)` for binary32.
	// - `1.23456 => (123456, -5)` for binary64.
	// - `13.9999999999999982236431606 => (13999999999999982, -15)` for binary64.

	// Toward zero case:
	//
	// According to the main Dragonbox algorithm, this is the left-closed
	// directed rounding case, however, that is the case of I = [w,w+), which
	// does not round-down. Rather, we want the right-closed directed rounding
	// case, or I = (w−,w]. Unfortunately, this doesn't work, since if it
	// goes toward `w-` it can round-down which produces misleading results,
	// within the context of our float formatter, so we just round-nearest, \
	// tie-even, then direct the rounding.
	//
	// An example of cases where this matters is:
	// compute_nearest_normal
	//
	// - `1.23456 => (123456, -5)` for binary32.
	// - `1.23456 => (123456, -5)` for binary64.
	// - `13.9999999999999982236431606 => (13999999999999998, -15)` for binary64.
	//
	// compute_left_closed_directed
	//
	// - `1.23456 => (12345601, -7)` for binary32.
	// - `1.23456 => (12345600000000002, -16)` for binary64.
	// - `13.9999999999999982236431606 => (13999999999999999, -15)` for binary64.
	//
	// compute_right_closed_directed
	//
	// - `1.23456 => (123456, -5)` for binary32.
	// - `1.23456 => (123456, -5)` for binary64.
	// - `13.9999999999999982236431606 => (13999999999999982, -15)` for binary64.
	//
	// In the case of `13.9999999999999982236431606`, we get `14.0` for
	// binary64 using `compute_nearest_normal` and `13.99999999999998` for
	// `compute_right_closed_directed`.
	//
	// In the case of `1.2345600000000001017497198`, we get `1.23456` for
	// binary64 using `compute_nearest_normal` `1.2345601` using
	// `compute_left_closed_directed`.
	//
	// Neither of these is an expected result when truncating digits (we expect
	// `13.999999999999998` when truncating for the first case, and `1.23456` for
	// the second case).

	let two_f = two_fc
	- if shorter {
	1
	} else {
	2
	};
	if F::compute_mul_parity(two_f, &pow5, beta_minus_1) {
	return short_circuit();
	}
	}