Comments (6)
UnQLite doesn't differentiate between bytes and utf8-encoded unicode strings. unqlite-python exposes a Python interface over the C library, and to ensure the greatest flexibility we treat everything as bytestrings. For example, I think one of the UnQLite examples uses UnQLite to store MP3 files; if we treated everything as unicode strings, there would be no way to store binary file data.
It might make sense to provide an option, however, to decode strings... I'll think about it.
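To illustrate why blanket decoding would be a problem for binary payloads, here is a small pure-Python sketch (no unqlite calls involved). Arbitrary binary data, such as bytes resembling the start of an MP3 frame header, is generally not valid UTF-8, so decoding it either fails or, with a lossy error handler, produces text that can no longer be turned back into the original bytes. The helper name `try_decode` is hypothetical.

```python
# Pure-Python illustration; no unqlite calls involved.
# Arbitrary binary sample (0xFF 0xFB resembles an MP3 frame header);
# 0xFF can never appear in valid UTF-8.
binary_payload = b"\xff\xfb\x90\x64\x00"
text_payload = "Donald Duck".encode("utf-8")


def try_decode(value: bytes):
    """Hypothetical helper: return the UTF-8 decoded str, or None if invalid."""
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(try_decode(text_payload))    # Donald Duck
print(try_decode(binary_payload))  # None

# A lossy decode "succeeds", but the replacement characters cannot be
# re-encoded to the original bytes, so the binary data is corrupted.
lossy = binary_payload.decode("utf-8", errors="replace")
print(lossy.encode("utf-8") == binary_payload)  # False
```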
from unqlite-python.
If possible, I would like to bump this issue. Using this at a larger scale as a JSON document store, the filter queries become quite convoluted because every string used in a comparison has to be converted to bytes.
Here is a trivial example. The complexity compounds with more deeply nested documents.
```
Python 3.8.0 (v3.8.0:fa919fdf25, Oct 14 2019, 10:23:27)
[Clang 6.0 (clang-600.0.57)] on darwin
>>> from unqlite import UnQLite
>>> db = UnQLite()
>>> users = db.collection('users')
>>> users.create()
>>> users.store({'name': 'Donald Duck'})
0
>>> users.filter(lambda u: u['name'].startswith('Donald'))
[]
>>> users.filter(lambda u: u['name'].startswith(b'Donald'))
[{'name': b'Donald Duck', '__id': 0}]
```
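One way to keep predicates readable in the meantime is to normalize the field inside the predicate. The helpers below (`as_text`, `name_starts_with`) are hypothetical names, not part of unqlite-python; they are plain Python and work whether a field arrives as bytes or as str. The sketch assumes, as in the session above, that the filter callback receives each document as a dict.

```python
from typing import Callable, Dict, Union


def as_text(value: Union[bytes, str], encoding: str = "utf-8") -> str:
    """Hypothetical helper: return value as str, decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def name_starts_with(prefix: str) -> Callable[[Dict], bool]:
    """Build a predicate that compares the 'name' field as text."""
    return lambda doc: as_text(doc["name"]).startswith(prefix)


# Plain dicts standing in for the documents handed to a filter callback.
docs = [{"name": b"Donald Duck", "__id": 0}, {"name": "Daisy Duck", "__id": 1}]
matches = [d for d in docs if name_starts_with("Donald")(d)]
print(matches)  # [{'name': b'Donald Duck', '__id': 0}]
```

With a collection like the one above, the predicate would be passed as `users.filter(name_starts_with('Donald'))`, so the comparison string stays a normal `str`.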
An example of the round-tripping issue this presents.
```
>>> import json
>>> doc = json.loads('{"name": "Fred Flintstone"}')
>>> users.store(doc)
1
>>> doc = users.filter(lambda u: u['name'].startswith(b'Fred'))[0]
>>> json.dumps(doc)
TypeError: Object of type bytes is not JSON serializable
```
I'm not sure about the implementation implications, but the naive expectation is that data comes out the way it went in. Since UnQLite doesn't differentiate between bytes and utf8-encoded unicode strings, perhaps there is some way to apply the same concept during filtering, even if the consumer would have to fix the content of all the documents once retrieved.
Thank you.
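Fixing documents after retrieval can be sketched as a recursive pass that converts `bytes` values to `str`. `decode_document` is a hypothetical helper, not an unqlite-python API; it leaves bytes that are not valid UTF-8 untouched so real binary data is not corrupted, and it returns tuples as lists, which matches how JSON would represent them anyway.

```python
import json
from typing import Any


def decode_document(obj: Any, encoding: str = "utf-8") -> Any:
    """Hypothetical helper: recursively convert bytes values to str.

    Dict keys, dict values and list/tuple items are all visited; bytes that
    are not valid UTF-8 are returned unchanged. Tuples come back as lists.
    """
    if isinstance(obj, bytes):
        try:
            return obj.decode(encoding)
        except UnicodeDecodeError:
            return obj
    if isinstance(obj, dict):
        return {decode_document(k, encoding): decode_document(v, encoding)
                for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [decode_document(item, encoding) for item in obj]
    return obj


# A document shaped like one returned by the filter in the session above.
retrieved = {"name": b"Fred Flintstone", "tags": [b"caveman", 1], "__id": 1}
fixed = decode_document(retrieved)
print(json.dumps(fixed))  # {"name": "Fred Flintstone", "tags": ["caveman", 1], "__id": 1}
```

If a document does contain undecodable bytes, `json.dumps` will still raise for that field, which is arguably the right signal that it holds binary data rather than text.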
> Since UnQLite doesn't differentiate between bytes and utf8-encoded unicode strings, perhaps there is some way to apply the same concept during filtering, even if the consumer would have to fix the content of all the documents once retrieved.
That's the crux of the issue. UnQLite doesn't have a text type; it only has bytes / int / double / bool / array / object / null. When UnQLite calls the filter callback, we receive an array of UnQLite values that have to be converted to Python types. Note that we do convert dictionary keys to unicode where possible (see the final function, unqlite_value_to_dict), but values are left as-is. If you're dead set on this change, I'd suggest forking and patching that function to decode your values as well.
Also, changing this behavior at this point could break existing code.
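The conversion described above (keys decoded where possible, values left as-is) and the suggested patch can be modeled in pure Python. This is a hypothetical analogue for illustration only, not the actual `unqlite_value_to_dict` code; the `decode_values` flag stands in for the patch, and nested arrays are not handled, for brevity.

```python
from typing import Any, Dict


def _maybe_decode(value: Any) -> Any:
    """Decode bytes as UTF-8 where possible; otherwise return value unchanged."""
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            pass
    return value


def convert_dict(raw: Dict[Any, Any], decode_values: bool = False) -> Dict[Any, Any]:
    """Hypothetical analogue of the conversion described in this thread.

    Keys are always decoded where possible (the existing behavior); values
    are decoded only when decode_values is True (the suggested patch).
    Nested dicts are converted recursively; lists are not visited.
    """
    result = {}
    for key, value in raw.items():
        if isinstance(value, dict):
            value = convert_dict(value, decode_values)
        elif decode_values:
            value = _maybe_decode(value)
        result[_maybe_decode(key)] = value
    return result


raw = {b"name": b"Donald Duck", b"__id": 0}
print(convert_dict(raw))                      # {'name': b'Donald Duck', '__id': 0}
print(convert_dict(raw, decode_values=True))  # {'name': 'Donald Duck', '__id': 0}
```

Making value decoding opt-in like this would be one way to avoid breaking existing code that expects bytes.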
After reading the UnQLite maintainers' comments, though, I think I may actually change this... They suggest that binary data should be stored using the regular key/value interface, and that the Jx9 side should be treated as text.
I will reopen this and make the change.
I've changed the behavior so that all Jx9 / VM / Collection interfaces return string data as unicode (Python 3 `str`), so it is no longer necessary to mess with encoding/decoding.
For users who wish to store binary data in collections, the UnQLite developers recommend either:
- use the key/value store APIs directly
- encode it using base64
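The base64 route can be sketched with the standard library alone: the binary payload becomes an ASCII string that can live in a document next to ordinary text fields. The helper names `encode_binary` / `decode_binary` and the field name `audio_b64` are hypothetical, and the collection call itself is left out.

```python
import base64
import json


def encode_binary(data: bytes) -> str:
    """Encode raw bytes as an ASCII base64 string that is safe to store as text."""
    return base64.b64encode(data).decode("ascii")


def decode_binary(text: str) -> bytes:
    """Reverse encode_binary, returning the original bytes."""
    return base64.b64decode(text)


audio = b"\xff\xfb\x90\x64\x00"  # arbitrary binary sample
doc = {"title": "clip", "audio_b64": encode_binary(audio)}

# The document now holds only text, so it also round-trips through JSON.
restored = json.loads(json.dumps(doc))
print(doc["audio_b64"])                               # //uQZAA=
print(decode_binary(restored["audio_b64"]) == audio)  # True
```

The trade-off is size: base64 output is about 4/3 the size of the input, so large blobs are probably better served by the key/value store.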
I will be tagging a new release, 0.8.0, which will also be dropping "official" support for Python 2.
That's fantastic! Thank you for being so responsive.