Comments (6)
Index files are append-only in order to preserve the high-parallelism invariant.
In the JS version of cacache, I wrote a "garbage collector" that could be run "offline" (i.e., when you can reasonably guarantee single-process, single-thread access to the cache). It would iterate over all entries and reduce each one to its latest entry value.
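To make "reduce them to their latest entry value" concrete, here is a minimal sketch of that reduction over an in-memory append-only log. It is not the JS garbage collector and does not use cacache's API or on-disk format. The `Record` type, the `compact` function, and the use of `None` as a tombstone are all assumptions made up for illustration.

```rust
use std::collections::HashMap;

/// One record in a hypothetical append-only index log.
/// `value: None` stands for a tombstone (a logical delete).
#[derive(Debug, Clone, PartialEq)]
struct Record {
    key: String,
    value: Option<String>,
}

/// Reduce an append-only log to the latest live record per key.
/// Later records override earlier ones; keys whose last record is a
/// tombstone are dropped entirely.
fn compact(log: &[Record]) -> Vec<Record> {
    // Remember the position of the last record seen for each key.
    let mut latest: HashMap<&str, usize> = HashMap::new();
    for (i, rec) in log.iter().enumerate() {
        latest.insert(rec.key.as_str(), i);
    }
    // Keep the surviving records in their original log order.
    let mut keep: Vec<usize> = latest.into_values().collect();
    keep.sort_unstable();
    keep.into_iter()
        .map(|i| log[i].clone())
        .filter(|rec| rec.value.is_some())
        .collect()
}
```

Calling `compact` on a log with several writes to the same key returns one record per key, and a key whose last record is a tombstone disappears from the result. Like the collector described above, this only makes sense when nothing else is appending to the log at the same time.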
You can pretty trivially write this yourself by using the functions in the index module, e.g. use cacache::index::ls to iterate over all the entries, cacache::index::delete, then use cacache::index::insert with a constructed WriteOpts (just for the options, you don't need to open a content file). It shouldn't be more than a few lines of code, in the end. LMK if you run into any trouble doing this!
from cacache-rs.
> that could be run "offline"
That's a massive caveat for our use case.
> You can pretty trivially write this yourself by using the functions in the index module, e.g. use cacache::index::ls to iterate over all the entries, cacache::index::delete, then use cacache::index::insert with a constructed WriteOpts (just for the options, you don't need to open a content file). It shouldn't be more than a few lines of code, in the end. LMK if you run into any trouble doing this!
And there's a window between the delete and the insert where the index is not present. We were using cacache to protect us from power failures and similar failures.
from cacache-rs.
This doesn't seem to work:
if let Some(index) = cacache::index::find_async(state_path.as_std_path(), key).await? {
    debug!(?index);
    if let Err(err) = cacache::index::delete_async(state_path.as_std_path(), key).await {
        error!(key, ?err, "could not vacuum");
    }
    let hash = index.integrity.clone();
    let mut write_opts = cacache::WriteOpts::new()
        .integrity(index.integrity)
        .time(index.time)
        .size(index.size)
        .metadata(index.metadata);
    if let Some(raw_metadata) = index.raw_metadata {
        write_opts = write_opts.raw_metadata(raw_metadata);
    }
    cacache::index::insert_async(state_path.as_std_path(), key, write_opts).await?;
    match cacache::read_hash(state_path, &hash).await {
        Ok(r) => Ok(Some(r)),
        Err(cacache::Error::EntryNotFound(..)) => Ok(None),
        Err(err) => Err(err).context("cacache hash not found"),
    }
} else {
    Ok(None)
}
It works as a cacache::read, but it doesn't seem to reduce index size. Any hints?
Thanks.
from cacache-rs.
At first glance, index::delete is an insert(null) which is a no-op?
from cacache-rs.
This works, but is neither pretty nor robust:
for filename in $(find foo/v0/state.cacache/index-v5 -type f ! -name "*.bak") ; do
    tail -2 ${filename} > ${filename}.trimmed
    cp ${filename} ${filename}.bak
    mv ${filename}.trimmed ${filename}
done
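For comparison, the same trimming idea can be sketched in Rust with a write-to-temp-then-rename step, so the file is never missing partway through. This is a toy under stated assumptions, not a vetted tool: it assumes offline access, it assumes the last non-empty line is the only entry worth keeping, and it has not been checked against cacache's actual on-disk line framing. The `trim_to_last_line` name and the `.tmp` suffix are made up for illustration.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Rewrite `path` so it holds only its last non-empty line.
/// Returns the number of bytes saved.
fn trim_to_last_line(path: &Path) -> io::Result<u64> {
    let before = fs::metadata(path)?.len();
    let text = fs::read_to_string(path)?;
    let last = match text.lines().rev().find(|l| !l.trim().is_empty()) {
        Some(line) => line,
        None => return Ok(0), // nothing to keep; leave the file alone
    };

    // Write the replacement next to the original so the rename stays on
    // one filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);
    {
        let mut f = File::create(&tmp)?;
        writeln!(f, "{}", last)?;
        f.sync_all()?; // flush to disk before the swap
    }
    // Replace the original in one step instead of delete-then-recreate.
    fs::rename(&tmp, path)?;
    let after = fs::metadata(path)?.len();
    Ok(before.saturating_sub(after))
}
```

On POSIX filesystems the rename replaces the target atomically, so a crash leaves either the old file or the new one, which avoids the gap a separate delete and insert would open. It still does nothing to protect against another process appending to the file during the rewrite.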
from cacache-rs.
oh duh. I forgot that delete just inserts a null.
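A toy model may help show why a delete followed by an insert grows the index rather than shrinking it. This sketch does not use cacache's API or file format. The `append`, `find`, and `line_count` helpers and the `key<TAB>value` line format are assumptions invented for illustration, with the literal `null` standing in for a deleted entry.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Append one line to a toy index file. `None` is written as the literal
/// `null`, mimicking a delete that is really just another insert.
fn append(path: &Path, key: &str, value: Option<&str>) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}\t{}", key, value.unwrap_or("null"))
}

/// Look a key up by scanning the whole file; the last matching line wins.
fn find(path: &Path, key: &str) -> io::Result<Option<String>> {
    let text = fs::read_to_string(path)?;
    let mut found = None;
    for line in text.lines() {
        if let Some((k, v)) = line.split_once('\t') {
            if k == key {
                found = if v == "null" { None } else { Some(v.to_string()) };
            }
        }
    }
    Ok(found)
}

/// Number of lines currently stored in the toy index file.
fn line_count(path: &Path) -> io::Result<usize> {
    Ok(fs::read_to_string(path)?.lines().count())
}
```

In this model, appending a value, then a `None`, then the value again leaves three lines in the file even though `find` reports a single live entry. Nothing in an append-only scheme ever removes old lines, so shrinking the file takes a separate rewrite step.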
Yeah, I think it would be nice to have built-in "vacuum"/GC support. I just haven't gotten around to it.
from cacache-rs.