
Comments (8)

ztellman commented on June 14, 2024

Well, they use different code paths (the String constructor for byte arrays, and CharsetDecoder for streams), but obviously I'd expect them to be equivalent. Can you provide a failing test case?

from byte-streams.

joelittlejohn commented on June 14, 2024

I'm afraid I'm struggling to create a minimal example here. I have 4k of text that demonstrates the problem, but I can't share it. So far I have failed to minimize it further or to find a good random string that demonstrates the same problem.

Could you list the conversion steps from ByteArrayInputStream to String? Step-debugging byte-streams, I'm having a hard time following the high-level conversions that are chosen from the conversion graph. Maybe I can take my private example through these steps manually to see where the errors arise.


ztellman commented on June 14, 2024

The InputStream is turned into a ByteSource [1], then the ByteSource is turned into a CharSequence [2]. Hope that helps, let me know if you have any other questions.

[1] https://github.com/ztellman/byte-streams/blob/master/src/byte_streams.clj#L526
[2] https://github.com/ztellman/byte-streams/blob/master/src/byte_streams/char_sequence.clj#L81


joelittlejohn commented on June 14, 2024

My hunch is that single characters are spanning a chunk-size boundary in byte-streams.char-sequence/lazy-char-buffer-sequence, which causes the decoder to emit a Unicode replacement character.


ztellman commented on June 14, 2024

My understanding is that the CharsetDecoder should handle that properly. If your hunch is right, though, we'd expect the malformed characters to show up at the 4096th byte, since that's the default chunk size. To make a more minimal test case, maybe you can try specifying {:chunk-size 16} or something similar in the convert call?
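
For reference, here is a rough sketch of how a single CharsetDecoder can cope with a character split across two chunks, as long as the undecoded tail bytes are carried into the next call. It is plain Java, not the byte-streams implementation, and the class and method names (StatefulChunkDecode, decodeInChunks) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StatefulChunkDecode {
    // Hypothetical helper: feeds fixed-size chunks to ONE decoder and keeps
    // any incomplete trailing byte sequence for the next round.
    static String decodeInChunks(byte[] bytes, int chunkSize) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // Room for one chunk plus the (at most 3) leftover bytes of a partial UTF-8 sequence.
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 4);
        CharBuffer out = CharBuffer.allocate(chunkSize + 4);
        StringBuilder result = new StringBuilder();
        int pos = 0;
        while (pos < bytes.length) {
            int n = Math.min(chunkSize, bytes.length - pos);
            in.put(bytes, pos, n);
            pos += n;
            in.flip();
            // endOfInput = false: an incomplete sequence is left in the buffer, not replaced.
            decoder.decode(in, out, false);
            in.compact(); // move the undecoded tail to the front for the next chunk
            out.flip();
            result.append(out);
            out.clear();
        }
        // Signal end of input; only now would a dangling partial sequence become U+FFFD.
        in.flip();
        decoder.decode(in, out, true);
        decoder.flush(out);
        out.flip();
        result.append(out);
        return result.toString();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append('\u20AC'); // euro sign, 3 bytes in UTF-8
        }
        String original = sb.toString();
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        for (int chunkSize : new int[] {16, 4096}) {
            System.out.println(chunkSize + ": " + decodeInChunks(bytes, chunkSize).equals(original));
        }
    }
}
```

The important detail is the compact() call: if the tail bytes were dropped, or if each chunk got a fresh decoder, the split character would be lost.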


joelittlejohn commented on June 14, 2024

I need to sleep now, but yes, I can easily create a minimal test case now as I'm pretty certain that the problem is as described above. I confirmed this using the method you described. If I simply create a string of 3-byte chars I can see the error on the 4096th byte boundary. If I reduce the chunk size I see a lot more errors.
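
A minimal sketch of that kind of test in plain Java, assuming the failure mode is that each chunk gets decoded on its own. It does not call byte-streams, and the names (ChunkBoundaryRepro, countReplacements) are hypothetical. It builds a string of 3-byte characters and counts the U+FFFD replacement characters that appear when chunks are decoded independently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ChunkBoundaryRepro {
    // Hypothetical helper: decodes every chunk independently (no state carried
    // between chunks) and counts the U+FFFD characters in the output.
    static int countReplacements(byte[] bytes, int chunkSize) {
        int count = 0;
        for (int start = 0; start < bytes.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, bytes.length);
            byte[] chunk = Arrays.copyOfRange(bytes, start, end);
            // The String constructor substitutes U+FFFD for malformed or truncated input.
            String piece = new String(chunk, StandardCharsets.UTF_8);
            for (int i = 0; i < piece.length(); i++) {
                if (piece.charAt(i) == '\uFFFD') {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 2000 euro signs, each 3 bytes in UTF-8, so 6000 bytes in total.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append('\u20AC');
        }
        byte[] bytes = sb.toString().getBytes(StandardCharsets.UTF_8);

        System.out.println("single chunk: " + countReplacements(bytes, bytes.length));
        System.out.println("chunk 4096:   " + countReplacements(bytes, 4096));
        System.out.println("chunk 16:     " + countReplacements(bytes, 16));
    }
}
```

Since 4096 is not a multiple of 3, the first chunk ends in the middle of a character. With a chunk size of 16 there are far more boundaries, and most of them split a character.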


ztellman commented on June 14, 2024

Okay, I'll see if I can track down what's happening.


joelittlejohn commented on June 14, 2024

Fixed by 29f50f7
