Code Monkey home page Code Monkey logo

Comments (13)

creationix avatar creationix commented on August 17, 2024

I'm afraid this is a limitation of JavaScript itself. The number type only had 53 bits of precision total.

Try to simply type the number in a node repl or the browser console and you'll see the echoed back value is the same thing jsonparse gives you.

I would recommend storing the key as a string (maybe hex encoded to save space) if you wish to keep all the digits exactly.

from jsonparse.

mpnovikova avatar mpnovikova commented on August 17, 2024

I am aware of those limitations in JavaScript. The problem is I am reading this data from the 3-rd party API Citrix to be precise. And this is how they give it to users. Would you be able to track the scenario when a sequence on numbers is longer than JavaScript can correctly parse as a Int and parse that sequence as a string?

from jsonparse.

creationix avatar creationix commented on August 17, 2024

It is possible to check the number after parsing and if it's greater than the highest possible 53-bit integer, I could return a string or something, but that would break for people expecting a number. There are many cases where an imprecise number is better than a precise string.

Do you feel confident enough to add the check to the number parser? We could maybe add it as a flag so that you could opt-in to the new behavior? The full list of characters would need to be recorded like is done for strings and then released if the number is used, or used if the number is dropped.

from jsonparse.

creationix avatar creationix commented on August 17, 2024

Actually, it's probably a good idea to buffer the number and then use parseFloat(str). The memory overhead of the buffered number is minimal, and surely parseFloat is much faster than my manual implementation. It would shrink the code a bit too.

Once the new number parser is in, it would be trivial to store the string in case the parsed version is above the 53-bit threshold.

from jsonparse.

mpnovikova avatar mpnovikova commented on August 17, 2024

We use JSONStream library that relies on this parser. There are many cases where it just uses a number. However in this particular case these long Int IDs are critical for the business functionality. The workaround I had was to omit the JSONStream all together, collect the data in the buffer, then use JSONbig.parse() to parse long int's; but then I run into performance issues and timeouts because instead of streaming the data I am waiting for it.

To answer you question, for the purposes of our project I could fork this repo and add your workaround. I would appreciate if you could point where and how this fix should be done.

Thank you for being so response, highly appreciate it

from jsonparse.

creationix avatar creationix commented on August 17, 2024

So the basic idea is to accumulate the characters in the number as a string. You can re-use the same property string parsing uses https://github.com/creationix/jsonparse/blob/master/jsonparse.js#L176

You don't need to worry about utf-8 decoding in the number code, all the digits are in the ascii 7-bit range.

Then at all the lines that emit NUMBER tokens (look like this.onToken(NUMBER...), you'll need to check if you should store the string or the number (probably factor this out into a common helper function).

from jsonparse.

creationix avatar creationix commented on August 17, 2024

The parseFloat idea is separate, but does depend on the string being buffered. If you add parseFloat in the emitting helper, you can remove all the lines that calculate the value piecemeal. The many number states are still required to know when the number has ended however.

from jsonparse.

mpnovikova avatar mpnovikova commented on August 17, 2024

Thank you, I will give it a try

from jsonparse.

mpnovikova avatar mpnovikova commented on August 17, 2024

This fragment https://github.com/creationix/jsonparse/blob/master/jsonparse.js#L183-L307 can be replaced by this

 } else if (this.tState === NUMBER1 || this.tState == NUMBER2 || this.tState === NUMBER3) {
        n = buffer[i];
        switch (n) {
          case 0x30: // 0
          case 0x31: // 1
          case 0x32: // 2
          case 0x33: // 3
          case 0x34: // 4
          case 0x35: // 5
          case 0x36: // 6
          case 0x37: // 7
          case 0x38: // 8
          case 0x39: // 9
          case 0x2e: // .
          case 0x65: // e
          case 0x45: // E
          case 0x2b: // +
          case 0x2d: // -
            this.magnatude += String.fromCharCode(n);
            this.tState = NUMBER3;
            break;
          default:
            this.tState = START;
            if (this.negative) {
              this.magnatude = '-' + this.magnatude;
            }
            this.onToken(NUMBER, parseFloat(this.magnatude));
            this.magnatude = undefined;
            i--;
            break;
        }
}

without loosing any existing functionality. Is that what you had in mind when you talked about parseFloat?

from jsonparse.

creationix avatar creationix commented on August 17, 2024

This is close, great work. Just a couple suggestions:

  • By merging the 3 states into one, you will be also allowing illegal patterns like --EE31e. It would be best I think to keep the separate states so as to keep the parser a little stricter. If parseFloat weren't so liberal, we could just depend on that to catch illegal patterns, but it will happily parse strings like "123this is not a number".
  • You can store the initial - in the string instead of setting the this.negative, then you don't need to prepend a "-" at the end.
  • I was initially confused by this.magnatude being used to store the string version of the number, I'd rather re-use the this.string property that string parsing uses. The parser is never parsing strings and numbers at the same time so there is no conflict and it makes sense semantically as well since we are storing a string.

You should send a pull request with the proposed changes and my suggested fixes. That way I can discuss inline and once it's good I can merge the PR and you'll get credit for the change and be listed as a collaborator in the repo.

from jsonparse.

mpnovikova avatar mpnovikova commented on August 17, 2024

On point #1
"123this is not a number" won't go through 0-9eE+- mesh all parseFloat has to do after that is to give you NaN on cases like --EE31e; In case of 123Ee it will give you 123 both results are fine with me. Here is the branch fee free to comment inline https://github.com/mpnovikova/jsonparse/blob/bignum/jsonparse.js
Some unit tests are failing not but I am working on them

from jsonparse.

jeromew avatar jeromew commented on August 17, 2024

I understand that this issue can be closed after 044b268

from jsonparse.

mmikhalko avatar mmikhalko commented on August 17, 2024

The problem is still presenting if use a negative big int number like -9223372036854775808.
It had better to add in the RegExp '-' symbol in the line
https://github.com/creationix/jsonparse/blob/master/jsonparse.js#L270

There is a pull request for it case #38

from jsonparse.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.