
Comments (6)

coleifer commented on June 1, 2024

UnQLite doesn't differentiate between bytes and utf8-encoded unicode strings. unqlite-python exposes a Python interface over the C library, and to ensure the greatest flexibility we treat stuff as bytestrings. For example, I think one of the unqlite examples uses unqlite to store mp3 files. If we treated stuff as unicode strings, there would be no way to store binary file data.

It might make sense to provide an option, however, to decode strings... I'll think about it.

from unqlite-python.

james-carpenter commented on June 1, 2024

If possible, I would like to bump this issue. In using this on a larger scale as a JSON document store, the filter queries become quite convoluted when having to convert all strings used for comparison to bytes.

Here is a trivial example. The complexity compounds with more deeply nested documents.

Python 3.8.0 (v3.8.0:fa919fdf25, Oct 14 2019, 10:23:27) 
[Clang 6.0 (clang-600.0.57)] on darwin
>>> from unqlite import UnQLite
>>> db = UnQLite()
>>> users = db.collection('users')
>>> users.create()
>>> users.store({'name': 'Donald Duck'})
0
>>> users.filter(lambda u: u['name'].startswith('Donald'))
[]
>>> users.filter(lambda u: u['name'].startswith(b'Donald'))
[{'name': b'Donald Duck', '__id': 0}]

Here is an example of the round-tripping issue this presents.

>>> import json
>>> doc = json.loads('{"name": "Fred Flintstone"}')
>>> users.store(doc)
1
>>> doc = users.filter(lambda u: u['name'].startswith(b'Fred'))[0]
>>> json.dumps(doc)
TypeError: Object of type bytes is not JSON serializable

I'm not sure about the implementation implications, but the naive expectation would be that the data comes out the way it went in. Since UnQLite doesn't differentiate between bytes and utf8-encoded unicode strings, perhaps there is some way to apply the same concept during filtering, even if the consumer would have to fix the content of all the documents once retrieved.
Thank you.
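
The "fix the content once retrieved" idea can be sketched in plain Python. This is only an illustration: `decode_bytes` is a hypothetical helper, not part of unqlite-python, and it assumes every bytes value in the document is valid UTF-8 text.

```python
import json


def decode_bytes(value, encoding="utf-8"):
    """Recursively convert bytes to str inside dicts, lists and tuples.

    Hypothetical helper, not part of unqlite-python. Assumes all bytes
    values are text in the given encoding; raises UnicodeDecodeError otherwise.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {decode_bytes(k, encoding): decode_bytes(v, encoding)
                for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [decode_bytes(item, encoding) for item in value]
    return value


# Same shape as the document returned by users.filter(...) in the session above.
doc = {'name': b'Fred Flintstone', '__id': 1}
serialized = json.dumps(decode_bytes(doc))
print(serialized)
```

With the values decoded, `json.dumps` no longer raises the `TypeError` shown above.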


coleifer commented on June 1, 2024

"Since UnQLite doesn't differentiate between bytes and utf8-encoded unicode strings, perhaps there is some way to apply the same concept during filtering, even if the consumer would have to fix the content of all the documents once retrieved."

That's the crux of the issue. UnQLite doesn't have a text type - it's just bytes / int / double / bool / array / object / null. When the filter callback is called by unqlite, we receive an array of unqlite values which have to be converted to Python types. Note that we do convert the keys of dictionaries to unicode where possible (see the final function, unqlite_value_to_dict), but values are left as-is. If you're dead set on this change, I'd suggest just forking and patching that function to decode your values as well.
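
For anyone who would rather not fork, a similar effect can be approximated on the Python side by wrapping the filter callback. The sketch below is an assumption-laden illustration, not the library's code: `to_text` and `text_predicate` are hypothetical names, and the "decode where possible" rule is imitated by falling back to the raw bytes when a value is not valid UTF-8.

```python
def to_text(value):
    """Decode bytes as UTF-8 where possible; leave other bytes untouched.

    Hypothetical helper mirroring the "where possible" rule described above.
    """
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            return value  # genuinely binary data stays as bytes
    if isinstance(value, dict):
        return {to_text(k): to_text(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_text(item) for item in value]
    return value


def text_predicate(predicate):
    """Wrap a filter callback so it is called with decoded values."""
    return lambda doc: predicate(to_text(doc))


# Stand-in for the documents a filter callback receives (values are bytes).
docs = [{'name': b'Donald Duck', '__id': 0},
        {'name': b'Fred Flintstone', '__id': 1}]

starts_with_donald = text_predicate(lambda u: u['name'].startswith('Donald'))
matches = [d for d in docs if starts_with_donald(d)]
```

In the session above this would presumably be used as `users.filter(text_predicate(lambda u: u['name'].startswith('Donald')))`. Only the predicate sees decoded values, so the returned documents would still contain bytes.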

Also, changing this behavior at this point could break existing code.


coleifer commented on June 1, 2024

After reading the unqlite maintainers' comments, though, I think I may actually change this... They suggest storing binary data through the regular kv interface and treating the jx9 stuff as text.

I will reopen this and make the change.


coleifer commented on June 1, 2024

I've changed the behavior so that all Jx9 / VM / Collection interfaces return string data as unicode (python3 str), so it is no longer necessary to mess with encoding/decoding.

For those users who wish to store binary data in the collections, the unqlite developers recommend either:

  • use the kv store APIs directly
  • encode it using base64
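
The base64 option can be sketched with the standard library alone. `pack_binary` and `unpack_binary` are hypothetical helper names, and the document layout is only an example of what might be passed to a collection's `store()`.

```python
import base64
import json


def pack_binary(data: bytes) -> str:
    """Encode raw bytes as an ASCII base64 string that is safe to store as text."""
    return base64.b64encode(data).decode("ascii")


def unpack_binary(text: str) -> bytes:
    """Reverse pack_binary and recover the original bytes."""
    return base64.b64decode(text)


# Bytes that are not valid UTF-8, standing in for real binary file data.
payload = bytes([0, 255, 16, 128])

# The document now contains only text, so it survives a JSON round trip.
doc = {"name": "clip.mp3", "data": pack_binary(payload)}
restored = json.loads(json.dumps(doc))
recovered = unpack_binary(restored["data"])
```

The trade-off is size: base64 output is roughly a third larger than the input, which is one reason the kv store may be the better fit for large files.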

I will be tagging a new release, 0.8.0, which will also be dropping "official" support for Python 2.


james-carpenter commented on June 1, 2024

That's fantastic! Thank you for being so responsive.

