Comments (5)
Hello @rhs3i
The hdf5 file will always be in tmp as:
H5_STORAGE = pathlib.Path(tempfile.gettempdir()) / "tablite.hdf5"
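To check where that resolves on a particular machine, here is a minimal sketch using only the standard library. The helper name `tablite_storage_path` is hypothetical and not part of tablite; it simply repeats the expression above.

```python
import pathlib
import tempfile


def tablite_storage_path() -> pathlib.Path:
    """Hypothetical helper (not tablite API): rebuild the H5_STORAGE path."""
    return pathlib.Path(tempfile.gettempdir()) / "tablite.hdf5"


if __name__ == "__main__":
    path = tablite_storage_path()
    # The file may be absent if tablite has not written anything on this machine.
    print(path, "exists" if path.exists() else "not found")
```

The temp directory differs by platform and environment, so it is safer to resolve it this way than to hard-code a location.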
May I ask what you're trying to achieve?
from tablite.
Certainly. I'm the author of H5s, a scanner for HDF5. The first objective is to verify the scanner renders tablite HDF5 well. HDF5 that is intended to be primarily machine-read can stress a visual model in ways perhaps not considered, but the graphical constructs should still hold up when inspecting these files. The screenshots and links to the visual vocabulary of the scanner can give some illustration.
The second objective is to get a quick bit of insight as to whether H5s can augment usage of tablite in an interesting way, but that would be a future topic.
from tablite.
Hi Robert,
As you can see from the usage of tempdir, the HDF5 files are generally used as a volatile database where data is stored in a hierarchy best described as:
- Tables have columns.
- Columns have page handlers.
- Page handlers have pages.
- Pages are of type: (a) Simple (int, float), (b) String (str, utf-8), (c) Mixed (non-simple datatypes), (d) Sparse (lots of Nones).
In the tablite.hdf5 file you will therefore find that the Pages contain all the data, whilst the datasets (HDF groups) for Tables and Columns are empty and only have metadata in the attrs field.
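To make the hierarchy concrete, here is a simplified in-memory model of it. This is only an illustrative sketch: the class names, the `attrs` dictionaries and the `column_values` helper are assumptions made for this example, not tablite's actual classes or its on-disk HDF5 layout.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Page:
    kind: str        # "simple", "string", "mixed" or "sparse"
    data: List[Any]  # only pages carry actual values


@dataclass
class PageHandler:
    pages: List[Page] = field(default_factory=list)


@dataclass
class Column:
    attrs: Dict[str, Any] = field(default_factory=dict)  # metadata only
    handlers: List[PageHandler] = field(default_factory=list)


@dataclass
class Table:
    attrs: Dict[str, Any] = field(default_factory=dict)  # metadata only
    columns: Dict[str, Column] = field(default_factory=dict)


def column_values(column: Column) -> List[Any]:
    """Collect a column's values by walking its page handlers and pages in order."""
    return [v for h in column.handlers for p in h.pages for v in p.data]


# Hypothetical example data: one column spread over a simple and a sparse page.
ages = Column(
    attrs={"name": "age"},
    handlers=[PageHandler(pages=[Page("simple", [31, 45]),
                                 Page("sparse", [None, None, 52])])],
)
table = Table(attrs={"name": "people"}, columns={"age": ages})
print(column_values(table.columns["age"]))  # [31, 45, None, None, 52]
```

In this model, reading a column means descending to its pages, because the Table and Column levels hold nothing but metadata.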
The details are explained here in the HDF5 group webinar: https://youtu.be/OoHVIKAD854?t=1415
from tablite.
Ah. Thank you for the correction, and apologies for wasting your time. I did sit through your HDF5 webinar (thank you), but I misunderstood the design: I thought that once the HDF5 backing store had been created and had stored all the computational deltas, it would persist beyond program execution and be used by a subsequent downstream tablite processor. But your presentation was clear: the re-import/reload example you showed (39:19) was from within a single program session. Scanning a volatile HDF5 datastore might have some utility in a debugging capacity, but that's another matter and may not be very useful.
Appreciate your time in getting me straightened out on this.
from tablite.
No problem. Happy I could help.
from tablite.