Comments (6)
Hi there,
We will certainly look into it. A couple of questions:
- What version of `libtiledb` does the Python wrapper use?
- Can you please share some code on how you create and populate the kv store?
On the C side there is a function, `tiledb_kv_set_max_buffered_items`, that essentially controls the number of fragments to be created. If you set it to >10,000 in your example, you will end up with a single fragment, so no consolidation will be needed (a quick workaround until we see what the problem with consolidation is). @jakebolewski is that exposed in the Python API?
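To make the relationship between the buffer size and the fragment count concrete, here is a rough sketch. It assumes that each flush of the buffer writes exactly one fragment and that the example holds about 10,000 items, as the comment above implies. `estimated_fragments` is a hypothetical helper for illustration only, not part of TileDB or TileDB-Py.

```python
def estimated_fragments(n_items: int, max_buffered_items: int) -> int:
    """Estimate how many fragments a write produces.

    Assumption (not confirmed by the library docs): every time the
    buffer fills up, or at the final flush, one fragment is written.
    """
    if max_buffered_items <= 0:
        raise ValueError("max_buffered_items must be positive")
    # Ceiling division using integers only.
    return -(-n_items // max_buffered_items)


# Flushing every 100 items over 10,000 items gives many fragments.
print(estimated_fragments(10_000, 100))     # 100
# A buffer larger than the whole data set gives a single fragment.
print(estimated_fragments(10_000, 10_001))  # 1
```

Under this model, consolidation only matters when the estimate is greater than one.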
from tiledb-py.
Unfortunately no. The KV integration is not well optimized currently. We can push an update that fixes these issues.
from tiledb-py.
Sorry, I was not able to respond sooner.
- tiledb version is 0.1.1
- Sample Code:
    self.tile_db_context = tiledb.Ctx()
    self.a1 = tiledb.Attr(self.tile_db_context, "attribute_value", compressor=("gzip", -1), dtype=bytes)
    self.key_value_instance = tiledb.KV(self.tile_db_context, "array_name", attrs=(self.a1,))
    self.key_value_instance.dump()
    index = 0
    data_dic = {}
    # self.data_set is a list of dictionaries {"key": ..., "value":...}
    for v in self.data_set:
        data_dic[v["key"]] = v["value"]
        index += 1
        if index % 100 == 0:
            self.key_value_instance.update(data_dic)
            data_dic = {}
            self.key_value_instance.consolidate()
I am performing kv update after every 100 instances. Update + Consolidation after each key-value pair was slowing it down.
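For reference, the batching pattern described here can be sketched independently of TileDB. The version below also flushes the final partial batch, which a plain `index % 100` check would leave unwritten when the item count is not a multiple of 100. `iter_batches`, `load_in_batches`, and `InMemoryKV` are hypothetical names used only for illustration. `InMemoryKV` is a stand-in for the real store, and the only thing assumed about that store is a dict-style `update` method, as used in the sample above.

```python
from itertools import islice


def iter_batches(records, batch_size):
    """Yield dicts of up to batch_size key/value pairs from records.

    records is an iterable of {"key": ..., "value": ...} dictionaries.
    The last batch may be smaller than batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(records)
    while True:
        batch = {r["key"]: r["value"] for r in islice(it, batch_size)}
        if not batch:
            return
        yield batch


class InMemoryKV(dict):
    """Stand-in for a KV store; counts how often update() is called."""

    def __init__(self):
        super().__init__()
        self.update_calls = 0

    def update(self, items):
        self.update_calls += 1
        super().update(items)


def load_in_batches(store, records, batch_size=100):
    """Write records to store one batch at a time; return the batch count."""
    n_batches = 0
    for batch in iter_batches(records, batch_size):
        store.update(batch)
        n_batches += 1
    return n_batches


records = [{"key": f"k{i}", "value": i} for i in range(250)]
store = InMemoryKV()
print(load_in_batches(store, records))  # 3 (batches of 100, 100 and 50)
print(len(store))                       # 250
```

A larger `batch_size` means fewer `update` calls, which is the same trade-off as raising the buffered-item limit on the C side.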
from tiledb-py.
d363292 should help with the write performance and the need to consolidate. Unfortunately, read performance will still be an issue until the next version of TileDB, for which the sparse read algorithm is being overhauled.
This will appear in an upcoming patch release of TileDB-Py.
from tiledb-py.
Thanks for responding. Looking forward to 0.1.2!
from tiledb-py.
All these fixes will be integrated into TileDB 1.3 / TileDB-Py 0.2, which will be released next week.
from tiledb-py.