
cacache-rs's Introduction

cacache

A high-performance, concurrent, content-addressable disk cache, optimized for async APIs.

Example

use cacache;
use async_attributes;

#[async_attributes::main]
async fn main() -> Result<(), cacache::Error> {
    let dir = String::from("./my-cache");

    // Write some data!
    cacache::write(&dir, "key", b"my-async-data").await?;

    // Get the data back!
    let data = cacache::read(&dir, "key").await?;
    assert_eq!(data, b"my-async-data");

    // Clean up the data!
    cacache::rm::all(&dir).await?;

    Ok(())
}

Install

Using cargo-edit

$ cargo add cacache

The minimum supported Rust version is 1.67.0.

Documentation

Features

  • First-class async support, using either async-std or tokio as its runtime. Sync APIs are available but secondary. You can also use sync APIs only and remove the async runtime dependency.
  • std::fs-style API
  • Extraction by key or by content address (shasum, etc)
  • Subresource Integrity web standard support
  • Multi-hash support - safely host sha1, sha512, etc, in a single cache
  • Automatic content deduplication
  • Atomic content writes even for large data
  • Fault tolerance (immune to corruption, partial writes, process races, etc)
  • Consistency guarantees on read and write (full data verification)
  • Lockless, high-concurrency cache access
  • Really helpful, contextual error messages
  • Large file support
  • Pretty darn fast
  • Arbitrary metadata storage
  • Cross-platform: Windows and case-(in)sensitive filesystem support
  • miette integration for detailed, helpful error reporting.
  • Punches nazis

async-std is the default async runtime. To use tokio instead, turn off default features and enable the tokio-runtime feature, like this:

[dependencies]
cacache = { version = "*", default-features = false, features = ["tokio-runtime", "mmap"] }

You can also remove the async APIs altogether, including the async runtime dependency:

[dependencies]
cacache = { version = "*", default-features = false, features = ["mmap"] }
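
With the async APIs disabled as above, the `_sync` functions are the whole API surface. A minimal sketch, assuming the sync names mirror the async ones (`write_sync`, `read_sync`, and `rm::all_sync`):

```rust
// Sync-only usage sketch: no async runtime required.
fn main() -> Result<(), cacache::Error> {
    let dir = "./my-cache";

    // Write some data and read it back, synchronously.
    cacache::write_sync(&dir, "key", b"my-sync-data")?;
    let data = cacache::read_sync(&dir, "key")?;
    assert_eq!(data, b"my-sync-data");

    // Clean up.
    cacache::rm::all_sync(&dir)?;
    Ok(())
}
```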

Experimental support for symlinking to existing files is provided via the "link_to" feature.

Contributing

The cacache team enthusiastically welcomes contributions and project participation! There's a bunch of things you can do if you want to contribute! The Contributor Guide has all the information you need for everything from reporting bugs to contributing entire new features. Please don't hesitate to jump in if you'd like to, or even ask us questions if something isn't clear.

All participants and maintainers in this project are expected to follow the Code of Conduct, and just generally be excellent to each other.

Happy hacking!

MSRV

The Minimum Supported Rust Version for cacache is 1.67.0. Any changes to the MSRV will be considered breaking changes.

License

This project is licensed under the Apache-2.0 License.

cacache-rs's People

Contributors

06chaynes, andir, ceejbot, fiag, jkbecker, kinire98, komar007, passcod, pawurb, redmar, rickycodes, rustynova016, shaug, theawiteb, uhhhwaitwhat, zkat


cacache-rs's Issues

memmap2 write_all call panics on cacache 10

This appears to happen with any vec <= the memmap size.

thread 'main' panicked at 'source slice length (705) does not match destination slice length (0)', /Users/chris/.cargo/registry/src/github.com-1ecc6299db9ec823/cacache-10.0.0/src/content/write.rs:78:18

That backtrace points at:

impl Write for Writer {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        self.builder.input(buf);
        if let Some(mmap) = &mut self.mmap {
            mmap.copy_from_slice(buf); // <-------------------------- this line
            Ok(buf.len())
        } else {
            self.tmpfile.write(buf)
        }
    }

    fn flush(&mut self) -> std::io::Result<()> {
        self.tmpfile.flush()
    }
}

I tested manually lowering the max memmap limit to 0 and it started working again; also pinning cacache at v9 seems to pass as well. I noticed this with both sync and async calls of cacache::write_hash.
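
The panic comes from `copy_from_slice`, which requires the source and destination slices to be exactly the same length, while `write` may be called with buffers shorter than the mmap. A possible fix sketch, assuming a hypothetical `offset` field is added to `Writer` to track how many bytes have been written so far:

```rust
// Hypothetical fix: copy each buffer into the correct subslice of the mmap
// instead of requiring buf.len() == mmap.len().
fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
    self.builder.input(buf);
    if let Some(mmap) = &mut self.mmap {
        // `self.offset` is a hypothetical field tracking bytes written so far.
        mmap[self.offset..self.offset + buf.len()].copy_from_slice(buf);
        self.offset += buf.len();
        Ok(buf.len())
    } else {
        self.tmpfile.write(buf)
    }
}
```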

My machine is:

  • OS: macOS 12.2.1
  • CPU: (10) arm64 Apple M1 Max
  • Memory: 25.82 GB / 64.00 GB
  • Shell: 3.2.57 - /bin/bash

If there's any other info I can post here to help please let me know, & thanks for cacache!

non-string examples

I'm looking at using cacache to store Rust structs. But cacache uses AsRef<[u8]> for data and AsRef<str> for keys.

There's a lot of ways to turn a struct into a [u8] and back, and a lot of ways of turning a struct key into a str. It's easy to spend a lot of time evaluating the different ways of doing so when in many cases it'd probably be best just to choose one and move on.

An example in your docs might do wonders here. Presumably you have more insight into the better ways of doing this, so the mechanism used in the example could be presumed a "good" way, even if it isn't the best for every situation.

For example, it seems tempting to use the rust hash mechanism for the key, but that's double hashing and could cause collisions so I imagine that's not recommended.

It's also tempting to use Debug or Display formatting for the key since most structs already have it and it'd probably work well in some situations. Probably not something you should use in an example though because those are sometimes lossy.

Which means likely a serde format for both key & data. But which one, there are so many...

I found this overview. 2 years old so things may have changed, but likely still mostly correct: https://blog.logrocket.com/rust-serialization-whats-ready-for-production-today/

conclusion: json for key and bincode for data?

This was less of an issue report and more of a "thinking aloud" situation. But it may be helpful to others in the same situation, so I'm going to post it anyways. Feel free to close.
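
The "JSON for the key, bincode for the data" conclusion above could look roughly like this. This is a sketch, not an official cacache example; it assumes `serde`, `serde_json`, and `bincode` (1.x) as dependencies, and an illustrative `Item` struct:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical struct standing in for application data.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Item {
    id: u64,
    name: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let key_parts = ("items", 42u64);
    let value = Item { id: 42, name: "example".into() };

    // JSON for the key (stable, human-readable)...
    let key = serde_json::to_string(&key_parts)?;
    // ...and bincode for the data (compact binary).
    let data = bincode::serialize(&value)?;

    cacache::write_sync("./my-cache", &key, &data)?;

    let bytes = cacache::read_sync("./my-cache", &key)?;
    let restored: Item = bincode::deserialize(&bytes)?;
    assert_eq!(restored, value);
    Ok(())
}
```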

Remove `nix` dependency

nix is apparently pretty heavy, and it's currently used only for chownr. Update chownr, and then this crate, to use libc directly, instead.

Performance expectations

cacache version: v11.3.0
rustc: 1.69.0

I'd like to use cacache in a performance-sensitive area of my application, but I have found there to be more overhead than hoped. For instance, on my M2 MacBook Air (2,800 MB/s read SSD) it takes ~90ms to read a 30MB file from the cache (sync or async), while reading the file directly from disk takes ~15ms. I see similar performance within my application as well as within microbenchmarks, so I don't believe this is simply bad benchmarking.

I'm not familiar with the caching strategy, so maybe this is expected? Possibly a system(MacOS) specific issue? Bad benchmarking?

Replace `failure` with `anyhow`

The failure crate is pretty heavy and requires a bit more manual stuff than desired. There's a newer crate, anyhow, that seems like a very nice alternative and builds on std::error::Error, so it might improve compile times if used.

Non-exported error type

Hi Kat,

I am currently trying to use your library in a project of mine and kinda tripped over error handling in the crate's documentation. Because the Error struct exposed (and documented) by the crate is not actually used in any of the crate's return types, it became somewhat unclear to me what the preferred approach to error handling with your crate is.
Also, the anyhow error type is not exposed by your crate directly, so if someone wants to wrap the error themselves, they would have to add an additional dependency on anyhow.

I would love to send a pull request to fix this issue, though I am not sure what the right approach would be. If you consider this purely a documentation issue, I will gladly change the README and add a paragraph to the error enum's documentation.

Oh and thank you for your work, on this library and in general.
Greetings,

Florian

Multiple file hashes

I was wondering whether it would be possible to store a single entry under multiple hashes by computing multiple hashes at the same time and hard linking the content in the cache.

In some use cases it makes sense to use one hash over another, because you might want to use the hash outside of the cache, but if I understand correctly, content would be duplicated when using two hashes for the same content.

An Integrity can already contain multiple hashes, but I think the API doesn't offer support to store/calculate multiple hashes.

The way I'm using cacache is very slow

A cache read by key now takes about ~30 seconds for my application.

A clue:

โฏ sudo du -sh *
[sudo] password for blarsen: 
15M     content-v2
2.4G    index-v5
0       tmp

Usage pattern: write to a small number of keys (<10) every few seconds. On program start, read those keys.

The cache is used to dump state to disk so that it can be read on program start after unclean exit.

The index file for each key is about 280M, over 1M entries.

It appears that you're keeping the entire history? Is this just for reliability reasons? There doesn't appear to be an API to read older versions of a key. Is there a way to reliably trim history to get my speed back?
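
Absent a dedicated trim API, one workaround sketch for this usage pattern (a small, known set of keys) is to periodically rebuild the cache, which drops the accumulated index history. This assumes the sync API names `read_sync`, `write_sync`, and `rm::all_sync`:

```rust
// Workaround sketch: rebuild the cache to drop grown index history.
// Assumes a small, known set of live keys, per the usage pattern above.
fn compact(cache: &str, keys: &[&str]) -> Result<(), cacache::Error> {
    // Snapshot the current value of every live key.
    let mut live = Vec::new();
    for key in keys {
        live.push((key.to_string(), cacache::read_sync(cache, *key)?));
    }
    // Wipe everything, including the grown index files.
    cacache::rm::all_sync(cache)?;
    // Re-write only the latest value per key.
    for (key, data) in live {
        cacache::write_sync(cache, &key, &data)?;
    }
    Ok(())
}
```

Note this is racy if other processes touch the cache while it runs, so it fits best at the program's controlled startup described above.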

reading from a cacache::WriteOpts Writer, before its completed

I'm investigating allowing content to be streamed out of a CAS store, while it's still being ingested.

The rough idea I have is to be able to get a Reader for an open Writer, which reads from the tmpfile, that then somehow switches to the canonical version of the content, once the Writer is closed, and the content hash is known.

This may be a terrible idea 😅

If it did sound reasonable though, I'd be interested in helping to contribute the feature.

tests: more coverage

There's a terrifying lack of test coverage on cacache right now. There should be unit tests for at least all the API functions, and preferably also for the internal APIs.

pls add 'remove fully' option to remove() function

According to the npm/cacache rm.entry API:

By default, this appends a new entry to the index with an integrity of null. If opts.removeFully is set to true then the index file itself will be physically deleted rather than appending a null

thanks a lot!

[Question] Why the size of buf is not consistent?

In src/content/read.rs, there are two different buf sizes, 1024 and 1024*8.

And the choice of value looks kind of random; it doesn't correlate with async/sync or with hard-link/reflink/copy.

So, maybe some value is just not updated?

And it would be great to explain why it's this number.

Panic when writing 1MiB or less with `write_hash`

A panic occurs when writing 1MiB or less with write_hash. That is, this works:

let _ = cacache::write_hash("./cache", &[b'a'; 1024 * 1024 + 1]).await;

But this results in a panic:

let _ = cacache::write_hash("./cache", &[b'a'; 1024 * 1024]).await;
thread 'blocking-1' panicked at 'source slice length (1048576) does not match destination slice length (0)', /home/tgnottingham/.cargo/registry/src/github.com-1ecc6299db9ec823/cacache-10.0.1/src/content/write.rs:260:38
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'async-std/runtime' panicked at 'task has failed', /home/tgnottingham/.cargo/registry/src/github.com-1ecc6299db9ec823/async-task-4.3.0/src/task.rs:426:45
thread 'main' panicked at 'task has failed', /home/tgnottingham/.cargo/registry/src/github.com-1ecc6299db9ec823/async-task-4.3.0/src/task.rs:426:45

The issue doesn't occur when using cacache::write.

Appears to be similar to #32.

Running on Ubuntu 22.04.1, Linux kernel 5.15.0-46-generic, x86_64, cacache 10.0.1.

New bucket serialization format

The current bucket format is copied directly from what the JavaScript version of cacache does.

I no longer think it's worth trying to preserve compatibility, and the performance of index-related operations is kind of horrendous right now, so I think it's time to explore a new on-disk format for the index buckets.

My current thinking is to use serde more directly, and come up with a better strategy for the generic metadata field, as well.

And of course, if there's no actual perf difference, this issue should just be closed, but this is worth exploring anyway.

Consider revising the repository's about description

Heya! This just came across my GitHub following feed and it looks really interesting, but I must admit I almost ignored it at first because of the about description:

💩💵 but for your 🦀

It's fun, but it's not particularly informative (and honestly, it sounded like a shitpost-as-repo). I'm glad I checked it out, though, because

A high-performance, concurrent, content-addressable disk cache, optimized for async APIs.

is much more interesting as a driveby viewer!

Apologies if it sounds like I'm telling you how to run your repo - I just don't want other people to make the same mistake I almost did 😅

Deadlock when accessing across two threads.

Hi, we are using the crate through reqwest-cache. The crate seems to perform the standard read operation found here. I have an example of the cache reading here that works in the manner the code is written, because of the select and pin: https://github.com/spider-rs/spider/blob/main/examples/cache.rs. If we change the code to remove the select and spawn tasks instead, the subscription will hang forever.

I could post the deadlock here in a code example if needed.

Feature "tokio" does not compile with rustc 1.77.1

Hi. Trying to build a project with rustc 1.77.1 and the tokio feature enabled outputs multiple compile errors.

  --> /Users/pablo/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-io-0.3.30/src/lib.rs:60:12

error[E0308]: mismatched types
  --> /Users/pablo/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cacache-13.0.0/src/get.rs:46:9
   |
45 |     ) -> Poll<tokio::io::Result<()>> {
   |          --------------------------- expected `Poll<std::result::Result<(), std::io::Error>>` because of return type
46 |         Pin::new(&mut self.reader).poll_read(cx, buf)
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `Poll<Result<(), Error>>`, found `Poll<Result<usize, Error>>`
   |
   = note: expected enum `Poll<std::result::Result<(), _>>`
              found enum `Poll<std::result::Result<usize, _>>`

error[E0599]: no method named `poll_shutdown` found for struct `Pin<&mut AsyncWriter>` in the current scope
   --> /Users/pablo/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cacache-13.0.0/src/put.rs:170:36
    |
170 |         Pin::new(&mut self.writer).poll_shutdown(cx)
    |                                    ^^^^^^^^^^^^^ method not found in `Pin<&mut AsyncWriter>`
    |
    = help: items from traits can only be used if the trait is implemented and in scope
    = note: the following trait defines an item `poll_shutdown`, perhaps you need to implement it:
            candidate #1: `tokio::io::AsyncWrite`

As a result I cannot use version 13.0.0 with my project; it still works with 12.0.0. Could you look into resolving it?

Validation of documented system properties

Among the claims you make are: "Fault tolerance (immune to corruption, partial writes, process races, etc)" and "Consistency guarantees on read and write (full data verification)". These two claims stand out to me, but what I'm curious about also applies to the other claims. What kind of verification have you done, and keep doing, to ensure these claims? As Kyle Kingsbury can certainly attest, many distributed systems make claims in a similar vein, but many don't uphold what they promise. I'd appreciate it if you explained the steps you've taken to validate such claims, not only here but also in the README, so future readers can more easily evaluate the trust they place in this software.

benchmarks: add more benches

It would be really nice to have more thorough benchmark coverage of the various external APIs, much like cacache-js does.

Examples

This project looks promising. However, the lack of examples makes it difficult to get started. Please add some examples.

Cannot fully delete index entry if content file doesn't exist.

If we try to remove an index entry or content entry when it's already deleted, cacache throws an io::ErrorKind::NotFound. This is fine, but when using RemoveOpts and remove_fully, a missing content file will throw an early error and prevent the deletion of the bucket.

A solution would be to simply ignore NotFound errors, as it is already in the intended final state.

Full deletion does not delete the value

Description

When remove_fully is set to true, it only deletes the key, but not the value.

Steps to reproduce

I wrote some code to create a random key and value; the resulting cache dir is:

With remove_fully set to false

$ rg "137-" my-cache/

my-cache/content-v2/sha256/ce/fd/02dcb440266abebb725a81019707a40f0f82bcf6c6f2dff2ca21480eb0a8
1:137-some data

my-cache/index-v5/8b/1e/f744c40e0577aced59824ae6d9dcb05ff399
2:3d32862bb4beb9a748577804b2aa8c473e40b9a322ea6bc05948816421e39c07	{"key":"137-key","integrity":"sha256-DQxf03Cgxhi/gMg4NOVHBt9J/C0SLsja7/IhebII74k=","time":1706346133948,"size":13,"metadata":null,"raw_metadata":null}
3:cf1c942e7d568a97c7259a8d6e35d31958f814d0ce8ec7cb94deb997e6b3ffb9	{"key":"137-key","integrity":null,"time":1706346134097,"size":0,"metadata":null,"raw_metadata":null}
4:9eb741017218ae0a02038e916eeea4bc54667e840c68ed0649deac285ca5864d	{"key":"137-key","integrity":"sha256-zv0C3LRAJmq+u3JagQGXB6QPD4K89sby3/LKIUgOsKg=","time":1706346305649,"size":13,"metadata":null,"raw_metadata":null}
5:ea842e6e8787ebc0da77f8c29bd4f867d12adc3fb824d374adeaf6b1a6c7c73e	{"key":"137-key","integrity":null,"time":1706346305682,"size":0,"metadata":null,"raw_metadata":null}

With remove_fully set to true

$ rg "137-" my-cache/

my-cache/content-v2/sha256/ce/fd/02dcb440266abebb725a81019707a40f0f82bcf6c6f2dff2ca21480eb0a8
1:137-some data

Expected result

When remove_fully is true, I expect the value to be deleted too.

Eviction?

I'm looking at different caching libraries right now for my project and this one looks really cool! However, I cannot find any information on cache eviction. Do I need to implement it manually on top of cacache? Have others already done it? (Is this even possible?)

What I need is some simple (access) time eviction, but I think most use cases require bounded size and some LRU/LFU algorithm.
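
cacache itself doesn't evict, but a simple time-based sweep can be layered on top of its listing API. A sketch, assuming `list_sync` yields `Metadata` entries whose `time` field is insertion time in milliseconds since the Unix epoch:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Sketch of simple time-based eviction layered on top of cacache.
// Assumes `Metadata.time` is insertion time in ms since the epoch.
fn evict_older_than(cache: &str, max_age_ms: u128) -> Result<(), cacache::Error> {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before Unix epoch")
        .as_millis();
    for entry in cacache::list_sync(cache) {
        let entry = entry?;
        if now.saturating_sub(entry.time) > max_age_ms {
            // Removes the index entry; content is deduplicated and may
            // still be referenced by other keys, so it is left behind.
            cacache::remove_sync(cache, &entry.key)?;
        }
    }
    Ok(())
}
```

LRU would additionally require tracking access times externally, since reads don't update the index entry.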

"raw_metadata" is not raw after serialization

Under "cache/index-v5/xx/yy/zzz", I found something like "raw_metadata":[0,18,32,112,97,......]"

It is the serialization of a Vec, rather than a binary format.

Is this intentional? It looks not very efficient.

cacache-11.5.2 crashes with SIGBUS when disk full and data size <= 512KB

cacache-11.5.2 crashes a program with SIGBUS when working under disk-full conditions on linux.
Running ubuntu-19.10 with linux-5.3.0-64-generic.

$ rustc --version
rustc 1.69.0 (84c898d65 2023-04-16)

Reproduction:

  • create a filesystem with limited space, for example mkdir /tmp/ram && mount -ttmpfs -osize=5m tmpfs /tmp/ram
  • run the following program:
    fn main() {
        for i in 0..12 {
            println!("{}", i);
            let data: Vec<_> = (0..512*1024).map(|_| rand::random::<u8>()).collect();
            println!("{:?}", cacache::write_hash_sync("/tmp/ram/cache", &data));
        }
    }
    
    // in Cargo.toml
    // ...
    // [dependencies]
    // cacache = "11.5.2"
    // rand = "0.8"
  • result:
    0
    Writer::new size=Some(524288)
    Ok(Integrity { hashes: [Hash { algorithm: Sha256, digest: "QQ9CVmHX6CzNPkuGFAhp/k8wSkmEVexMp6ARULmLdMM=" }] })
    1
    Writer::new size=Some(524288)
    Ok(Integrity { hashes: [Hash { algorithm: Sha256, digest: "JAI/yZ2LjUfpKko8L4RFV7g7DzNxHvq7jfYhX/9mQ4o=" }] })
    [...]
    9
    Writer::new size=Some(524288)
    Bus error (core dumped)
    

The problem is caused by the optimization where if the binary blob to cache is no more than 512KB, then it is written via mmap. Since the file obtained from tempfile may be sparse, writing to it may result in allocation of more blocks on the fs and failure with SIGBUS (I'm not sure this is standard/defined behavior or not, it seems legit though - what else can the OS do?). The call to std::fs::File::set_len only results in calling the truncate syscall, which does not guarantee file allocation and does not return an error on not enough space on device.

Calling posix_fallocate on the fd of the file obtained from tempfile fixes the issue for me, but I am not sure posix_fallocate guarantees file allocation or it just happens to work ok.
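
The suggested fix could be sketched as follows (Linux-specific, via the `libc` crate). `posix_fallocate` is specified to actually reserve the blocks, so a full disk surfaces as an `ENOSPC` error here rather than a SIGBUS later when writing through the mmap:

```rust
use std::os::unix::io::AsRawFd;

// Sketch: force block allocation before mmapping the temp file,
// so a full disk fails cleanly instead of raising SIGBUS on write.
fn preallocate(file: &std::fs::File, len: i64) -> std::io::Result<()> {
    // posix_fallocate returns an errno value directly (it does not set errno).
    let ret = unsafe { libc::posix_fallocate(file.as_raw_fd(), 0, len) };
    if ret == 0 {
        Ok(())
    } else {
        Err(std::io::Error::from_raw_os_error(ret))
    }
}
```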

Compilation error

I'm trying to use only the sync api, because I don't need the async one. So I don't want to pull the async runtime, and I copied the following content in my Cargo.toml, as the Docs.rs page said:
cacache = { version = "12.0.0", default-features = false, features = ["mmap"] }
And I got the following error log:

error[E0433]: failed to resolve: could not find `async_lib` in the crate root
   --> /home/kinire98/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cacache-12.0.0/src/index.rs:426:20
    |
426 |             crate::async_lib::remove_file(&bucket)
    |                    ^^^^^^^^^ could not find `async_lib` in the crate root

error[E0425]: cannot find function `find_async` in module `index`
   --> /home/kinire98/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cacache-12.0.0/src/get.rs:329:37
    |
329 |         if let Some(entry) = index::find_async(cache, key).await? {
    |                                     ^^^^^^^^^^ not found in `index`
    |
note: found an item that was configured out
   --> /home/kinire98/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cacache-12.0.0/src/index.rs:179:14
    |
179 | pub async fn find_async(cache: &Path, key: &str) -> Result<Option<Metadata>> {
    |              ^^^^^^^^^^

error[E0425]: cannot find function `find_async` in module `index`
   --> /home/kinire98/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cacache-12.0.0/src/get.rs:365:37
    |
365 |         if let Some(entry) = index::find_async(cache, key).await? {
    |                                     ^^^^^^^^^^ not found in `index`
    |
note: found an item that was configured out
   --> /home/kinire98/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cacache-12.0.0/src/index.rs:179:14
    |
179 | pub async fn find_async(cache: &Path, key: &str) -> Result<Option<Metadata>> {
    |              ^^^^^^^^^^

error[E0425]: cannot find function `open_async` in this scope
   --> /home/kinire98/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cacache-12.0.0/src/content/read.rs:166:22
    |
166 |     let mut reader = open_async(cache, sri.clone()).await?;
    |                      ^^^^^^^^^^ not found in this scope

error[E0433]: failed to resolve: use of undeclared type `AsyncReadExt`
   --> /home/kinire98/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cacache-12.0.0/src/content/read.rs:169:20
    |
169 |         let read = AsyncReadExt::read(&mut reader, &mut buf)
    |                    ^^^^^^^^^^^^ use of undeclared type `AsyncReadExt`

error[E0425]: cannot find function `delete_async` in this scope
   --> /home/kinire98/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cacache-12.0.0/src/index.rs:423:13
    |
423 |             delete_async(cache.as_ref(), key.as_ref()).await
    |             ^^^^^^^^^^^^ not found in this scope

Some errors have detailed explanations: E0425, E0433.
For more information about an error, try `rustc --explain E0425`.
error: could not compile `cacache` (lib) due to 6 previous errors

This is the neofetch information of my PC if it is useful:

                     ./o.                  
                   ./sssso-                ------------------ 
                 `:osssssss+-              OS: EndeavourOS Linux x86_64 
               `:+sssssssssso/.            Host: GF63 Thin 10SCSR REV:1.0 
             `-/ossssssssssssso/.          Kernel: 6.6.10-arch1-1 
           `-/+sssssssssssssssso+:`        Uptime: 4 hours, 36 mins 
         `-:/+sssssssssssssssssso+/.       Packages: 1085 (pacman), 8 (snap) 
       `.://osssssssssssssssssssso++-      Shell: bash 5.2.21 
      .://+ssssssssssssssssssssssso++:     Resolution: 1920x1080, 1920x1080 
    .:///ossssssssssssssssssssssssso++:    DE: Xfce 4.18 
  `:////ssssssssssssssssssssssssssso+++.   WM: Xfwm4 
`-////+ssssssssssssssssssssssssssso++++-   WM Theme: Default 
 `..-+oosssssssssssssssssssssssso+++++/`   Theme: Adwaita-dark [GTK2], Arc-Dark [GTK3] 
   ./++++++++++++++++++++++++++++++/:.     Icons: Qogir-dark [GTK2/3] 
  `:::::::::::::::::::::::::------``       Terminal: xfce4-terminal 
                                           Terminal Font: Source Code Pro 10 
                                           CPU: Intel i7-10750H (12) @ 5.000GHz 
                                           GPU: Intel CometLake-H GT2 [UHD Graphics] 
                                           GPU: NVIDIA GeForce GTX 1650 Ti Mobile 
                                           Memory: 4150MiB / 31917MiB 

I don't know if I have to change something for it to work. If I just write:
cacache = "12.0.0"
it works normally, but imports the whole async runtime.

Remove Key and clean the value

The documentation of cacache::remove() says this: "Removes an individual index metadata entry. The associated content will be left in the cache".

Is there a safe way to remove both the key and the content? Or is it done automatically at some point?
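
One way to do this by hand is to look up the entry's integrity hash first, then remove the content and the index entry. A sketch, assuming the sync function names `metadata_sync`, `remove_hash_sync`, and `remove_sync`:

```rust
// Sketch: remove a key's index entry and the content it points to.
// Caution: content is deduplicated, so other keys may still reference
// the same content; removing it here would break those entries.
fn remove_entry_and_content(cache: &str, key: &str) -> Result<(), cacache::Error> {
    if let Some(meta) = cacache::metadata_sync(cache, key)? {
        // Delete the content addressed by this entry's integrity hash.
        cacache::remove_hash_sync(cache, &meta.integrity)?;
    }
    // Delete the index entry itself.
    cacache::remove_sync(cache, key)?;
    Ok(())
}
```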

Is cacache checking for hash collisions?

One thing missing from the readme is whether cacache checks for hash collisions, since hashes seem to be the way the contents are indexed internally. While they are rare, that's definitely something to be worried about.

So, is cacache checking for those, or do we have to check for them manually ourselves?

Some variants for copy/reflink/hard-link are missed

For copy/reflink/hard-link, there should be

  • xx
  • xx_unchecked
  • xx_hash
  • xx_hash_unchecked

And their sync variant versions. So there should be 8 methods for each category.

However, some methods are missing:

  • reflink_hash_unchecked
  • hard_link_hash
  • hard_link_unchecked
  • hard_link_hash_unchecked

It seems reflink_hash_unchecked, hard_link_hash_unchecked, and hard_link_unchecked are not very useful.

hard_link_hash is missing regardless.

New HashMap-like API

Thanks to some feedback from a couple of folks, including @yoshuawuyts, I think it's worth exploring a new API that's more "object-oriented". The following is a sketch, partly based on Yoshua's suggestions, but adapted to more specific needs of cacache features:

struct Cache {
  path: Path
};

// Assume any necessary AsRefs below. Omitting them for the sake of readability.
impl Cache {
    fn new(path: Path) -> Self;
    async fn open(&self, key: String) -> Result<AsyncGet>;
    async fn insert(&mut self, key: String, value: &[u8]) -> Result<Integrity>;
    async fn get(&self, key: String) -> Result<Option<&[u8]>>;
    async fn get_entry(&self, key: String) -> Result<Option<Entry>>;
    async fn get_by_hash(&self, sri: &Integrity) -> Result<Option<&[u8]>>;
    async fn contains_hash(&self) -> bool;

    // Same as the above, but sync, suffixed by `_sync`
    fn insert_sync(&mut self, key: String, value: &[u8]) -> Result<Integrity>;
    // etc

    // Iterate over all entries
    async fn entries(&self) -> Stream<Entry>;
}

// Stream over all entries.
impl Stream for Entries {
    type Item = Entry;
}

struct Entry;

impl Entry {
    // getters for all fields

    // Similar to ptr::copy_to
    async fn copy_to(&self, Path) -> io::Result;
}

impl std::hash::Hash for Entry;

Bump to stable async-std and futures!

Before releasing 6.0, we should make sure to switch over to the stable version of the async ecosystem, once both async-std and futures crates have been updated to work with stable Rust and async/await.

This should be in the next day or two!

docs: Add more detail to docs; write doctests

The documentation, while fairly "complete", is missing doc tests, and more detailed explanations of how to use cacache. Writing some more examples would be really helpful, and doctests for all the various API functions in their respective pages.

feature request: tokio::io::AsyncSeek support for Reader

Use case: I have zip files being added to a cacache instance, and zip is a non-streamable format - the index of a zip is at the end, so it is somewhat inefficient to read everything, then scan the index, then reopen and re-read to extract the actual content (using async_zip)
