> A solution may be to cache the partially decompressed cluster together with the current state of the decompression. Then, if/when a blob from the not-yet-decompressed portion of the cluster is later needed, decompression can be performed incrementally from the point at which it was previously suspended.
It is possible, but tricky:
- We don't want to invalidate the cache, so we must not reallocate the already decompressed data. This means we must first do a partial decompression to read the offsets and get the total size, allocate a buffer of that size, and then do another partial decompression (up to the article to read) into this buffer. This would completely interleave decompressing the content and reading it, which are two separate code paths today.
- Every time we access a blob in the cluster, we have to check that its data has actually been decompressed. Any access may therefore trigger further decompression and modify the buffer, so this must be thread-safe, as several threads may access the cluster at the same time.
- Storing the decompression state means storing the decompression stream (`lzma_stream`, `z_stream` or `ZSTD_[CD]Stream`) and perhaps the compressed data as well (otherwise we may need some complex synchronization to load the data later; to be investigated).
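The points above can be sketched in Python under some stated assumptions. The sketch uses the standard library's `lzma.LZMADecompressor`, whose `decompress(data, max_length)` can stop early and later resume from its saved state (playing the role a stored `lzma_stream` would play in libzim). It assumes a cluster layout in which the uncompressed data starts with a table of little-endian 32-bit offsets whose first entry is the table size and whose last entry is the total size. The names `PartialCluster` and `make_cluster` are hypothetical, not libzim API. Phase 1 decompresses only the offset table to learn the total size. Phase 2 allocates the buffer once and, under a lock, decompresses further into it only when a requested blob lies beyond the decompressed prefix.

```python
import lzma
import struct
import threading


def make_cluster(blobs):
    """Hypothetical helper: build and compress a cluster with the assumed layout."""
    table_size = 4 * (len(blobs) + 1)
    offsets = [table_size]
    for blob in blobs:
        offsets.append(offsets[-1] + len(blob))
    raw = struct.pack(f"<{len(offsets)}I", *offsets) + b"".join(blobs)
    return lzma.compress(raw)


class PartialCluster:
    """Sketch: a cluster decompressed lazily into a buffer that is never reallocated."""

    def __init__(self, compressed):
        self._dec = lzma.LZMADecompressor()  # saved decompression state
        self._pending = compressed           # input not yet handed to the decoder
        self._lock = threading.Lock()

        # Phase 1: decompress just enough to read the (assumed) offset table.
        head = self._pull(4)
        table_size = struct.unpack("<I", head)[0]
        table = head + self._pull(table_size - 4)
        self._offsets = struct.unpack(f"<{table_size // 4}I", table)

        # Phase 2 setup: allocate the full buffer once. While a memoryview is
        # exported, CPython refuses to resize the bytearray, so views handed
        # out earlier stay valid.
        self._buf = bytearray(self._offsets[-1])
        self._view = memoryview(self._buf)
        self._view[:table_size] = table
        self._decoded = table_size

    def _pull(self, n):
        """Resume decompression and return exactly n more bytes."""
        out = bytearray()
        while len(out) < n:
            if self._dec.eof:
                raise ValueError("cluster is shorter than its offsets claim")
            if self._dec.needs_input:
                chunk, self._pending = self._pending, b""
                if not chunk:
                    raise ValueError("truncated compressed data")
            else:
                chunk = b""  # the decoder still holds input from an earlier call
            out += self._dec.decompress(chunk, n - len(out))
        return bytes(out)

    @property
    def decoded_bytes(self):
        return self._decoded

    def blob(self, i):
        start, end = self._offsets[i], self._offsets[i + 1]
        # Extending the decompressed prefix writes to the buffer, so it is
        # done under the lock.
        with self._lock:
            if self._decoded < end:
                self._view[self._decoded:end] = self._pull(end - self._decoded)
                self._decoded = end
        # [start, end) is fully decoded and never written again, so the
        # returned view can be read without holding the lock.
        return self._view[start:end]
```

Here `_pull` goes through a temporary buffer for simplicity. In C++, pointing the stream's output (`next_out`/`avail_out` for liblzma, `ZSTD_outBuffer` for zstd) directly into the preallocated buffer would presumably avoid that extra copy. The stored stream and the compressed input then have to live as long as the cached cluster, which is the memory cost raised in the last point.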
This is not an easy task, and I'm not sure we need it. At the very least we should take some measurements to make sure it is worth it (especially with the new zstd algorithm, which should speed up decompression).
How is this going to play with caching of (decompressed) clusters? If the chances are high that other blobs will be used from the same cluster while it is in the cache, decompressing the entire cluster once is better than partially decompressing it multiple times. A solution may be to cache the partially decompressed cluster together with the current state of the decompression. Then, if/when a blob from the not-yet-decompressed portion of the cluster is later needed, decompression can be performed incrementally from the point at which it was previously suspended.
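The trade-off in this comment can be illustrated by counting decompressed bytes. In this Python sketch (standard-library `lzma`; the function names are hypothetical), restarting decompression from the start of the cluster for every access costs the sum of the blobs' end offsets. Resuming a single saved `LZMADecompressor` costs only the largest end offset reached, which never exceeds the cost of decompressing the whole cluster once.

```python
import lzma


def restart_cost(compressed, ends):
    """Bytes decompressed if every access starts a fresh decompressor."""
    total = 0
    for end in ends:
        total += len(lzma.LZMADecompressor().decompress(compressed, end))
    return total


def resume_cost(compressed, ends):
    """Bytes decompressed if one decompressor is suspended and resumed."""
    dec = lzma.LZMADecompressor()
    pending = compressed
    decoded = 0
    for end in ends:
        while decoded < end:  # already-decoded prefixes cost nothing
            if dec.eof:
                raise ValueError("end offset beyond the uncompressed size")
            if dec.needs_input:
                chunk, pending = pending, b""
                if not chunk:
                    raise ValueError("truncated compressed data")
            else:
                chunk = b""  # resume from input the decoder already holds
            decoded += len(dec.decompress(chunk, end - decoded))
    return decoded


if __name__ == "__main__":
    data = bytes(range(256)) * 40      # 10240 bytes of payload
    comp = lzma.compress(data)
    ends = [9000, 2000, 6000]          # end offsets of three accessed blobs
    print(restart_cost(comp, ends))    # 17000: more than one full pass
    print(resume_cost(comp, ends))     # 9000
```

With three accesses, restarting already decompresses more than the whole payload. Resuming stays below one full pass, but it keeps the decoder state alive for as long as the cluster is cached.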
Related Issues
- Impossible to drop entries from the kiwix::Downloader's cache
- Last version of libzim seems to no be present in debian repo
- Fix ambiguities around the usage of URLs in ZIM file format
- LibZim 9.1 under macOS throws exception that cannot be caught
- Rename UrlPtrList to PathPtrList
- (Xapian) Exception mgmt + documentation
- Corrupted ZIM leads to crash on macOS
- Release 9.2.1
- Better opening of split zim archive.
- Error message is slightly misleading if ZIM file can not be open
- We need to know if an article is front or not
- Content offset is ignored on macOS using LibZim 9.2.0
- Handling of `target="_blank"` links in kiwix-serve is not working properly inside the viewer
- Document in spec ability to split ZIM file
- Purpose and constraints of LongDescription should be clarified
- Windows CI using Github actions
- New crash scenario of the libzim with corrupted ZIM file
- Release 9.2.2
- iOS: Unable to open file (which works in Kiwix elsewhere)
- Weird suggestion behavior when indexdata is passed but item is not marked as "FRONT_ARTICLE"