Ok, I want this to be more of a conversational piece where I'm looking for input, but we're seeing a design decision in bufio cause a lot of errors to crop up in our code base.
Let's examine the following code sample, pretty much taken from the documentation:
const bio = require('bufio');

const bw = bio.write();
bw.writeVarString('💪💎🚀', 'binary');
const bytes = bw.render();

const br = bio.read(bytes);
console.log(br.readVarString('binary'));
> Output: "=ª=�=�"
The output is a string that doesn't represent the original input. The reason is clear once you dig in: in Node, 'binary' is not actually a binary encoding; it's an alias for 'latin1' (ISO-8859-1), which is effectively an 8-bit extension of ASCII. From the Node docs:
'binary': Alias for 'latin1'. See binary strings for more background on this topic. The name of this encoding can be very misleading, as all of the encodings listed here convert between strings and binary data. For converting between strings and Buffers, typically 'utf8' is the right choice.
'ascii': For 7-bit ASCII data only. When encoding a string into a Buffer, this is equivalent to using 'latin1'. When decoding a Buffer into a string, using this encoding will additionally unset the highest bit of each byte before decoding as 'latin1'. Generally, there should be no reason to use this encoding, as 'utf8' (or, if the data is known to always be ASCII-only, 'latin1') will be a better choice when encoding or decoding ASCII-only text. It is only provided for legacy compatibility.
See https://nodejs.org/api/buffer.html#buffers-and-character-encodings
This is problematic because bufio's default encoding for varStrings is 'binary'. Most user input, and most text crossing the internet today, is expected to be UTF-8. Developers on our team who use bufio naturally don't realize they should be passing 'utf8' as the second (encoding) parameter of writeVarString, because Node in general defaults to utf8 everywhere.
Is there some reason for this design decision? It seems especially unnecessary given that ASCII characters are already encoded as single bytes in UTF-8, so an ASCII-safe default buys nothing in space. I think changing the default encoding would be a very hairy decision that could break backwards compatibility, but I also don't think anyone using bufio should ever encode strings as Latin-1/ASCII. 99% of the time you are creating a bug in your platform.
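To make the "no space savings" point concrete, here is a quick byte-length comparison. For ASCII-only text, utf8 and latin1 produce identical sizes; they only diverge for non-ASCII text, which is exactly the text that 'binary' corrupts:

```javascript
// ASCII text: one byte per character in both encodings.
console.log(Buffer.byteLength('hello', 'utf8'));   // 5
console.log(Buffer.byteLength('hello', 'latin1')); // 5

// Non-ASCII text: utf8 uses more bytes, but latin1 is lossy for
// anything above U+00FF (like the emoji in the example above).
console.log(Buffer.byteLength('é', 'utf8'));   // 2
console.log(Buffer.byteLength('é', 'latin1')); // 1
```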
We may change our TypeScript type definitions to make the encoding parameter non-optional, which would force developers to pass it in, at the cost of diverging from bufio's actual code interface.
https://github.com/iron-fish/ironfish/blob/master/ironfish/src/typedefs/bufio.d.ts#L50
I'll close this afterwards since it's an open-ended conversation.