Comments (11)
This line is not correct:
Please see: jk-jeon/dragonbox#22
By the way, I'm a bit confused of what you wrote in the doc/comments about the rounding mode. Could you elaborate on, for example, this?
// According to the main Dragonbox algorithm, this is the left-closed
// directed rounding case, however, that is the case of I = [w,w+), which
// does not round-down. Rather, we want the right-closed directed rounding
// case, or I = (w−,w]. Unfortunately, this doesn't work, since it truncates
// **too** much, so we just round-nearest, tie-even, then direct the rounding.
from rust-lexical.
Here are some comments in the code I find misleading to the readers.
1. https://github.com/Alexhuszagh/rust-lexical/blob/8cd8e86ac031369cc2181d595e4958ad7ab5cd8a/lexical-write-float/src/algorithm.rs#L826 I'm not sure what you mean here. The ones provided by dragonbox (if you mean the reference implementation) eventually boil down to code mostly identical to what you wrote. There are small differences on the details but they should not really make any differences on the performance.
Ok this is actually misleading on my part. I ported everything manually by hand, and thought I went back and re-documented everything properly and checked to ensure my comments during initial porting were correct. This is clearly misleading after checking the resulting assembly, however.
2. https://github.com/Alexhuszagh/rust-lexical/blob/8cd8e86ac031369cc2181d595e4958ad7ab5cd8a/lexical-write-float/src/algorithm.rs#L861 This one is not correct. What's being computed is `floor(x * log10(2) - log10(4/3))`; please refer to the [paper](https://github.com/jk-jeon/dragonbox/blob/master/other_files/Dragonbox.pdf) (Section 5.4).
Ah, I ported this by hand (with a bit of guidance from the Rust auto-generated port) and may have been confused. Thanks for the correction.
3. https://github.com/Alexhuszagh/rust-lexical/blob/8cd8e86ac031369cc2181d595e4958ad7ab5cd8a/lexical-write-float/src/algorithm.rs#L355 and similar lines. There should be no `floor` on the LHS's (the RHS's are not integers). This was my mistake and I corrected them in my repo recently.
Thank you.
from rust-lexical.
I'm afraid if my explanation on the "decimal-to-binary rounding policy" confused you. Let me add some more explanation to help your understanding, if that's the case. Note that what the decimal-to-binary rounding mode policy determines is the assumption on what a correct parser will do when it has to round, not what Dragonbox will do when it has to round. In other words, it dictates the search interval where Dragonbox finds the shortest decimal number from. How Dragonbox rounds its own results is determined by the "binary-to-decimal rounding policy", on the other hand. And that policy has no effect on compute_left_closed_directed
and compute_right_closed_directed
, because there can be no rounding tie for those cases.
If I'm correct, the Rust compiler will parse what's written in the source code according to the round-to-nearest, tie-to-even rounding mode, so seemingly contradictory results are actually due to incompatible assumptions on the rounding mode. Let's look at an example.
compute_left_closed_directed
1.23456 => (12345601, -7)
for binary32
Relevant exact values of IEEE-754 binary32 format are:
1.23455989360809326171875
1.2345600128173828125
1.23456013202667236328125
Hence, the distance from the first one to 1.23456
is 0.00000010639190673828125
while that from the second one is 0.0000000128173828125
, so the closest one to 1.23456
is the second one. Since the Rust compiler uses the round-to-nearest, tie-to-even rounding mode, the second one is the one 1.23456
ends up being parsed into. Now, if you invoke compute_left_closed_directed
, it assumes that a parser will round-down; that is, a parser will return w
for every number inside the interval [w,w+)
. So compute_left_closed_directed
will try to find the one inside [w,w+)
that has the smallest number of digits. In the above case, w=1.2345600128173828125
and w+=1.23456013202667236328125
, so 1.23456
is actually not in the search interval. Instead, the one with the smallest number of digits is 1.2345601
, which compute_left_closed_directed
correctly found out.
On the other hand, compute_right_closed_directed
will try to find the one inside (w-,w]
, so in this case 1.23456
is in the search interval, which is why 1.23456
is the output in this case.
If you indeed misunderstood things, then I hope this clarifies.
If you actually knew this already, then I'm sorry for misunderstanding you, but I'm still not so sure what you were trying to do, so could you elaborate more? For example, why this is the case?
// Neither of these is an expected result when truncating digits (we expect
// `13.999999999999998` when truncating for the first case, and `1.23456` for
// the second case).
from rust-lexical.
Ok I believe I've implemented the final corrections. Thanks once again for all your feedback:
rust-lexical/lexical-write-float/src/algorithm.rs
Lines 373 to 400 in d70229d
from rust-lexical.
This line is not correct:
Please see: jk-jeon/dragonbox#22
Thank you.
By the way, I'm a bit confused of what you wrote in the doc/comments about the rounding mode. Could you elaborate on, for example, this?
// According to the main Dragonbox algorithm, this is the left-closed // directed rounding case, however, that is the case of I = [w,w+), which // does not round-down. Rather, we want the right-closed directed rounding // case, or I = (w−,w]. Unfortunately, this doesn't work, since it truncates // **too** much, so we just round-nearest, tie-even, then direct the rounding.
Let me re-enable this code path, run my test suite and get back to you, and the comment is slightly misleading so I'll have to update that. What I was getting is where the interval would be rounded below the desired rounding point, eg., 14.0 going to 139999999 (this is not a real example, but since I don't have one on the top of my head, this is just merely for illustration). When truncating the digits later, I would then get 13.999, which wasn't expected in the context. I'll get a working example (if it still exists, it may have been a bug during implementation) and get back to you.
The other approach, the left-closed directed rounding case, would sometimes produce significant digits above my the desired result, while the right-closed directed rounding case would produce significant digits below the desired result, which when truncated could seem decently misleading.
This might be an implementation bug on my end or fixed in the bug fixes above you provided, in which it would not reproduce.
from rust-lexical.
This is going to take some time: I'm first going to pore over the changes in master to ensure that all patches to the dragonbox algorithm are fixed in lexical, and then test to ensure 3). still reproduces, and if so, then provide meaningful examples of what I mean. In the meantime, I've correct the comments above.
from rust-lexical.
Ok for the first case (prior to any bug fixes), for my implementation, I was getting the following results for the various cases. The last example in each case is "{:.25f}".format(13.999999999999998)
.
compute_nearest_normal
1.23456 => (123456, -5)
for binary32.1.23456 => (123456, -5)
for binary64.13.9999999999999982236431606 => (13999999999999998, -15)
for binary64.
compute_left_closed_directed
1.23456 => (12345601, -7)
for binary32.1.23456 => (12345600000000002, -16)
for binary64.13.9999999999999982236431606 => (13999999999999999, -15)
for binary64.
compute_right_closed_directed
1.23456 => (123456, -5)
for binary32.1.23456 => (123456, -5)
for binary64.13.9999999999999982236431606 => (13999999999999982, -15)
for binary64.
That is, each interval does what they advertise (left-closed does in fact round-up in the interval, or compute [w,w+)
, and right-closed does generally compute the shorter interval but can sometimes round-down, or compute (w−,w]
, but this can lead to some paradoxical results. This is likely not an issue with the algorithm, but for my specific use-case, this isn't desirable. I can update the comment to make this clearer.
from rust-lexical.
Ok I've applied the bug fix and added the additional comments documenting the algorithm choices internally:
rust-lexical/lexical-write-float/src/algorithm.rs
Lines 373 to 412 in cc3ba27
The new comments better describe the desired behavior, and why although the algorithms work as described, for this library's specific purposes, the compute_nearest
interval has the desired properties. The truncation might produce less significant digits, but a significant digit being different from the compute_nearest
case isn't exactly expected behavior within this specific context.
The bug fix is here:
rust-lexical/lexical-write-float/src/algorithm.rs
Lines 709 to 718 in cc3ba27
I believe this addresses all the above concerns. If there's still any remaining issues or misleading comments, I'd be happy to fix them.
from rust-lexical.
what a correct parser will do when it has to round, not what Dragonbox will do when it has to round
Ah.... this makes a lot more sense. This was the part I was confused about. The annotations were mostly for me to understand to ensure why I wasn't using that algorithm, because re-reading the paper I had at times wondered while reading the paper a few times that a different algorithm would be more suited to my specific application. I'm originally a biology major, not a CS or math major, so sometimes it can take me a bit more time to fully comprehend things. I highlighted both cases just because they're examples of why they don't work for my specific use-case, but I was kind of confused on the exact details of why. This makes a lot more sense.
I can update my comments now that I better understand. These were mostly internally for me from a maintenance perspective for why I chose to use them. I had assumed that round_nearest
was, in-fact, the correct algorithm, but was somewhat confused on the exact details of why the others didn't work (hence the notes for a maintenance perspective), and attempted a few and realized they definitely did not work. What I probably wanted in this case was nearest_toward_zero
, which makes intuitive sense.
from rust-lexical.
Also, this wasn't really a case of an unclear explanation, but I am extremely grateful for all the feedback and bug fixes you've provided. Rather, it was me reading the paper, then looking for the source code, checking to see if it might be applicable, trying it and realizing it might not work here (without fully comprehending it) and documented why it didn't do as expected for my particular application. But your explanation puts things into much greater clarity: thank you.
from rust-lexical.
I'm truly glad it helped!
And again thanks a lot for turning my work into something really useful😊
from rust-lexical.
Related Issues (20)
- [OTHER] Improve internal safety comments and architecture
- [BUG] Unsoundness: `try_parse_{4,8}digits` appear to advance iterators out of bounds
- [BUG] Unsoundness in `Bytes::read()` HOT 1
- [BUG] BytesIter should be an `unsafe trait` or private
- [BUG] Bounds for integer parsing are not correctly checked HOT 1
- [FEATURE] Update Dragonbox Algorithm HOT 2
- [BUG] `umul192_lower128` can error on due to overflow on addition
- [FEATURE] Add thousands digit separator for ```WriteIntegerOptions``` to use with ```to_string_with_options```
- parse_partial ignoring minus sign in presence of trailing characters HOT 3
- [BUG] Assertion failure on parsing hexadecimal floats HOT 1
- [BUG] Tests fail to compile because quickcheck is missing from Cargo.toml in the crates.io tarball. HOT 4
- [QUESTION] Use of min_significant_digits with max_significant digits
- [BUG] Integer overflow checking logic is incorrect
- [FEATURE] `write` APIs should take `buffer: &mut [MaybeUninit<u8>]`
- [QUESTION] support custom types (i256)
- [BUG] -0.0 printed as "0.0" HOT 1
- [BUG] Safety comments for MaybeUninit::assume_init calls are wrong, calls are UB
- [BUG] consecutive_digit_separator has no effect
- [BUG] Inconsistent error between int and float parsing
- [BUG] `no_integer_leading_zeros(true)` incorrectly parses input
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from rust-lexical.