I had created KV array of tiledb, and stored around 10000 key-value pairs (tiled


Read failed; Trying to read beyond buffer size about tiledb-py HOT 6 CLOSED

parasAtmedia commented on May 27, 2024

Read failed; Trying to read beyond buffer size

from tiledb-py.

Comments (6)

stavrospapadopoulos commented on May 27, 2024

Hi there,

We will certainly look into it. A couple of questions:

What version of libtiledb does the Python wrapper use?
Can you please share some code showing how you create and populate the KV store? On the C side there is a function, tiledb_kv_set_max_buffered_items, that essentially controls the number of fragments created. If you set it to more than 10,000 in your example, you will end up with a single fragment, so no consolidation will be needed (a quick workaround until we see what the problem with consolidation is). @jakebolewski is that exposed in the Python API?
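To make the fragment arithmetic concrete, here is a rough sketch in plain Python. It assumes that every flush of the write buffer produces exactly one fragment, which is a simplification of the description above rather than a documented guarantee. The function name estimated_fragments is hypothetical and is not part of TileDB or TileDB-Py.

```python
import math


def estimated_fragments(n_items: int, max_buffered_items: int) -> int:
    """Estimate the fragment count, assuming one fragment per buffer flush.

    Hypothetical model, not a TileDB API: the buffer is flushed each time it
    holds max_buffered_items entries, plus once more for any remainder.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if max_buffered_items <= 0:
        raise ValueError("max_buffered_items must be positive")
    return math.ceil(n_items / max_buffered_items)


# 10,000 items flushed every 100 items: many small fragments.
print(estimated_fragments(10_000, 100))     # 100
# Buffer larger than the item count: everything lands in one fragment.
print(estimated_fragments(10_000, 10_001))  # 1
```

Under this model, raising the buffer limit above the total number of items is what removes the need to consolidate.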


jakebolewski commented on May 27, 2024

Unfortunately no. The KV integration is not very optimized at the moment. We can push an update that fixes these issues.


parasAtmedia commented on May 27, 2024

Sorry, I was not able to respond sooner.

tiledb version is 0.1.1
Sample Code:

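import tiledb

# Create the context, a gzip-compressed bytes attribute, and the KV array.
self.tile_db_context = tiledb.Ctx()
self.a1 = tiledb.Attr(self.tile_db_context, "attribute_value", compressor=("gzip", -1), dtype=bytes)
self.key_value_instance = tiledb.KV(self.tile_db_context, "array_name", attrs=(self.a1,))
self.key_value_instance.dump()

data_dic = {}
# self.data_set is a list of dictionaries {"key": ..., "value": ...}
for index, v in enumerate(self.data_set, start=1):
    data_dic[v["key"]] = v["value"]
    # Write a batch every 100 items instead of updating per key.
    if index % 100 == 0:
        self.key_value_instance.update(data_dic)
        data_dic = {}

# Write any leftover items when the total is not a multiple of 100.
if data_dic:
    self.key_value_instance.update(data_dic)

self.key_value_instance.consolidate()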

I am performing kv update after every 100 instances. Update + Consolidation after each key-value pair was slowing it down.
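The batching described here can also be factored into a small generator, sketched below in plain Python. The helper name batched_dicts is hypothetical, and the record layout ({"key": ..., "value": ...}) is taken from the sample code above. Unlike a bare modulo check, it also yields the final partial batch.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, Mapping


def batched_dicts(
    records: Iterable[Mapping[str, Any]], batch_size: int = 100
) -> Iterator[Dict[Any, Any]]:
    """Yield {key: value} dicts built from at most batch_size records each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(records)
    while True:
        # islice consumes up to batch_size records from the shared iterator.
        batch = {r["key"]: r["value"] for r in islice(iterator, batch_size)}
        if not batch:
            return
        yield batch


# 250 records in batches of 100: the last batch holds the remaining 50.
records = [{"key": f"k{i}", "value": b"v"} for i in range(250)]
print([len(batch) for batch in batched_dicts(records, 100)])  # [100, 100, 50]
```

Each yielded dict could then be passed to self.key_value_instance.update(...) as in the sample code, with a single consolidate() call at the end.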


jakebolewski commented on May 27, 2024

d363292 should help with write performance and reduce the need to consolidate. Unfortunately, read performance will still be an issue until the next version of TileDB, in which the sparse read algorithm is being overhauled.

This will appear in an upcoming patch release of TileDB-Py.


parasAtmedia commented on May 27, 2024

Thanks for responding. Looking forward to 0.1.2!


jakebolewski commented on May 27, 2024

All these fixes will be integrated into TileDB 1.3 / TileDB-Py 0.2, which will be released next week.
