
Comments (4)

ghing avatar ghing commented on July 17, 2024

After looking through the jsonparse code and reading up on UTF-8 a little bit more, I think I mischaracterized the issue. n = 174 (0xae) seems to be the second byte, the continuation byte, of the 2-byte "registered trademark" character. For some reason, the value of i isn't getting properly incremented when the first byte (0xc2) is read. This should happen at https://github.com/creationix/jsonparse/blob/master/jsonparse.js#L142

I'm working on creating a test that replicates this issue outside of my use case.
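To make the lead-byte versus continuation-byte distinction concrete, here is a minimal Node.js sketch. The helper `utf8ByteKind` is hypothetical and is not part of jsonparse; it only classifies a byte by its high bits, as the UTF-8 encoding defines them.

```javascript
// Hypothetical helper (not part of jsonparse): classify one byte by its UTF-8 role.
function utf8ByteKind(byte) {
  if (byte <= 0x7f) return 'ascii';                  // 0xxxxxxx
  if ((byte & 0xc0) === 0x80) return 'continuation'; // 10xxxxxx
  if ((byte & 0xe0) === 0xc0) return 'lead-2';       // 110xxxxx, starts a 2-byte sequence
  if ((byte & 0xf0) === 0xe0) return 'lead-3';       // 1110xxxx, starts a 3-byte sequence
  if ((byte & 0xf8) === 0xf0) return 'lead-4';       // 11110xxx, starts a 4-byte sequence
  return 'invalid';
}

// 'e' followed by the registered trademark sign, encoded as UTF-8.
const bytes = Array.from(Buffer.from('e®', 'utf8'));
const kinds = bytes.map(utf8ByteKind);

console.log(bytes); // [ 101, 194, 174 ]
console.log(kinds); // [ 'ascii', 'lead-2', 'continuation' ]
```

In well-formed UTF-8, 174 can only appear after a lead byte such as 194 (0xc2), so seeing it directly after an ASCII byte means the input is not valid UTF-8 at that position.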

from jsonparse.

creationix avatar creationix commented on July 17, 2024

Thanks for looking into this. I'm currently super busy with other projects, but I'll happily review a pull request when you get one.


ghing avatar ghing commented on July 17, 2024

The string that's causing the problem in my JSON stream is "At The Learning Experience® (TLE®) ...".

I fired up a debugger and set a breakpoint inside the code block that handles the STRING1 state, and this is what I saw:

When it hits the '®' character, n = buffer[i] = 174, buffer[i-1] = 101 and buffer[i-2] = 99.

So, buffer[i-2] is 99, the 'c' character, buffer[i-1] is 101, the 'e' character, and buffer[i] is 174, the second byte of the two-byte UTF-8 '®' character. It seems like n = buffer[i] should be the first byte of a UTF-8 character, 194 (0xc2). It seems like this byte is missing from the buffer entirely.

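I wasn't really sure how the leading UTF-8 byte got dropped from the buffer until I realized that the JSON is encoded as ISO-8859-1, not UTF-8. In ISO-8859-1, '®' is the single byte 174 (0xae), so the 0xc2 byte was never in the stream in the first place.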
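The byte values from the debugger can be reproduced with a short Node.js sketch. The helper `latin1ToUtf8` is a hypothetical name, and re-encoding before handing data to the parser is one possible workaround under the assumption that the source really is ISO-8859-1; it is not something jsonparse does itself.

```javascript
const text = 'Experience®';

// The same text encoded two ways.
const latin1Bytes = Buffer.from(text, 'latin1'); // ends with 99, 101, 174
const utf8Bytes = Buffer.from(text, 'utf8');     // ends with 99, 101, 194, 174

// Hypothetical helper: re-encode an ISO-8859-1 chunk as UTF-8.
// Every ISO-8859-1 character is exactly one byte, so a chunk boundary
// can never split a character.
function latin1ToUtf8(chunk) {
  return Buffer.from(chunk.toString('latin1'), 'utf8');
}

const fixed = latin1ToUtf8(latin1Bytes);

console.log(Array.from(latin1Bytes.subarray(-3))); // [ 99, 101, 174 ]
console.log(Array.from(fixed.subarray(-4)));       // [ 99, 101, 194, 174 ]
```

The re-encoded chunks could then be written to the parser in place of the raw ones.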


punmechanic avatar punmechanic commented on July 17, 2024

Voting to reopen due to this SO question: the issue is still prevalent, and according to OP it occurs in later versions as well. I'm not sure how much credence to give that, but given that there was no PR, this issue was closed by the issue owner, and there are no referenced commits, it's possible the bug is still present.

A quick fix might be to ensure that no Unicode characters are passed to the stream, but something more permanent would be nice.
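One way to read that quick fix, assuming you control the code that produces the JSON, is to escape every non-ASCII character as a `\uXXXX` sequence so that only ASCII bytes reach the stream. A minimal sketch, where `stringifyAscii` is a hypothetical helper and not a jsonparse API:

```javascript
// Hypothetical helper: serialise a value to JSON that contains only ASCII characters.
// Each non-ASCII UTF-16 code unit becomes a \uXXXX escape, which is still valid JSON
// (characters outside the BMP become two escapes, one per surrogate).
function stringifyAscii(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0'));
}

const json = stringifyAscii({ name: 'TLE®' });

console.log(json);                  // {"name":"TLE\u00ae"}
console.log(JSON.parse(json).name); // TLE®
```

Because the output is pure ASCII, its bytes are identical in ISO-8859-1 and UTF-8, which sidesteps the encoding mismatch rather than fixing it.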

I'm not sure if you've fixed the issue in later releases, @creationix, but OP is using 0.0.5 which predates this issue.

