
cargo-fetcher's Introduction

🎁 cargo-fetcher


An alternative to cargo fetch for use in CI or other "clean" environments that you want to quickly bootstrap with the crates necessary to compile and test your project(s).

Why?

  • You run many CI jobs in clean and/or containerized environments and you want to quickly fetch cargo registries and crates so that you can spend your compute resources on actually compiling and testing the code, rather than downloading dependencies.

Why not?

  • Other than the fs storage backend, the only supported backends are the three major cloud storage providers, as it is generally beneficial to store crate and registry data in the same cloud where your CI jobs run, to take advantage of locality and I/O throughput.
  • cargo-fetcher should not be used in a typical user environment as it completely disregards various safety mechanisms that are built into cargo, such as file-based locking.
  • cargo-fetcher assumes it is running in an environment with high network throughput and low latency.

Supported Storage Backends

gcs

The gcs feature enables the use of Google Cloud Storage as a backend.

  • Must provide a url to the -u | --url parameter with the gsutil syntax gs://<bucket_name>(/<prefix>)?
  • Must provide GCP service account credentials either with --credentials or via the GOOGLE_APPLICATION_CREDENTIALS environment variable
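
For example, a mirror run against GCS might look like the following; the my-crates bucket name and key.json path are placeholders:

GOOGLE_APPLICATION_CREDENTIALS=key.json cargo fetcher --url gs://my-crates mirror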

s3

The s3 feature enables the use of Amazon S3 as a backend.

  • Must provide a url to the -u | --url parameter; it must be of the form http(s)?://<bucket>.s3(-<region>).<host>(/<prefix>)?
  • Must provide an AWS IAM user via the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables as described here, or run from an ec2 instance with an assumed role as described here.
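
For example, a sync from an S3 bucket might look like the following; the bucket name, region, and credential values are placeholders:

AWS_ACCESS_KEY_ID=<id> AWS_SECRET_ACCESS_KEY=<key> cargo fetcher --url https://my-crates.s3-us-west-2.amazonaws.com sync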

fs

The fs feature enables the use of a directory on a local disk to store and fetch crates.

  • Must provide a url to the -u | --url parameter with the file: scheme
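
For example, using a local directory as the backend might look like the following; the path is a placeholder:

cargo fetcher --url file:///mnt/cargo-mirror mirror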

blob

The blob feature enables the use of Azure Blob storage as a backend.

  • Must provide a url to the -u | --url parameter; it must be of the form blob://<container_name>(/<prefix>)?
  • Must provide an Azure Storage Account via the STORAGE_ACCOUNT and STORAGE_MASTER_KEY environment variables, as described here.
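
For example, mirroring to an Azure Blob container might look like the following; the account, key, and container names are placeholders:

STORAGE_ACCOUNT=myaccount STORAGE_MASTER_KEY=<key> cargo fetcher --url blob://my-crates mirror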

Examples

This is an example from our CI for an internal project.

Dependencies

  • 424 crates.io crates: cached - 38MB, unpacked - 214MB
  • 13 crates sourced from 10 git repositories: db - 27MB, checked out - 38MB

Scenario

The following CI jobs are run in parallel, each in a Kubernetes Job running on GKE. The container base is roughly the same as the official rust:1.39.0-slim image.

  • Build modules for WASM
  • Build modules for native
  • Build host client for native

~ wait for all jobs to finish ~

  • Run the tests for both the WASM and native modules from the host client

Before

All 3 build jobs take around 1m2s each to do cargo fetch --target x86_64-unknown-linux-gnu

After

All 3 build jobs take 3-4s each to do cargo fetcher --include-index sync.

Usage

cargo-fetcher has only 2 subcommands. Both of them share a set of options; the important inputs for each backend are described in Supported Storage Backends.

In addition to the backend specifics, the only required option is the path to the Cargo.lock lockfile that you are operating on. cargo-fetcher requires a lockfile, as otherwise the normal cargo work of generating one would require having a full registry index locally, which partially defeats the point of this tool.

-l, --lock-file <lock-file>
    Path to the lockfile used for determining what crates to operate on [default: Cargo.lock]
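
For example, an invocation that operates on a lockfile outside the current directory might look like this; the bucket and path are placeholders:

cargo fetcher --url gs://my-crates --lock-file services/server/Cargo.lock sync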

mirror

The mirror subcommand does the work of downloading crates and registry indexes from their original locations and re-uploading them to your storage backend.

It does, however, have one additional option that determines how often it should take snapshots of the registry index(es).

-m, --max-stale <max-stale>
    The duration for which the index will not be replaced after its most recent update.

    Times may be specified with no suffix (default days), or one of:
    * (s)econds
    * (m)inutes
    * (h)ours
    * (d)ays
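
For example, a mirror invocation that replaces the index snapshot at most once per hour might look like this; the bucket is a placeholder:

cargo fetcher --url gs://my-crates mirror --max-stale 1h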

Custom registries

One wrinkle with mirroring is the presence of custom registries. To handle these, cargo-fetcher uses the same logic as cargo to locate .cargo/config<.toml> config files and detect custom registries. However, cargo's config files only contain the metadata needed to fetch from and publish to the registry; the url template for where to download crates from actually lives in a config.json file in the root of the registry index itself.

Rather than wait for a registry index to be downloaded each time before fetching any crates sourced from that registry, cargo-fetcher instead allows you to specify the download location yourself via an environment variable, so that it can fully parallelize the fetching of registry indices and crates.

Example

# .cargo/config.toml

[registries]
embark = { index = "<secret url>" }

The environment variable is of the form CARGO_FETCHER_<name>_DL, where <name> is the upper-cased name of the registry in the configuration file.

CARGO_FETCHER_EMBARK_DL="https://secret/rust/cargo/{crate}-{version}.crate" cargo fetcher mirror

The format of the URL should be the same as the one in your registry's config.json file, shown below. If this environment variable is not specified for your registry, the default of /{crate}/{version}/download is simply appended to the registry's url.
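
For reference, a minimal registry config.json might look like the following sketch; the urls are illustrative. When the dl value contains no {crate} or {version} markers, as here, cargo appends the default /{crate}/{version}/download to it, which matches cargo-fetcher's default:

# config.json in the root of the registry index
{
    "dl": "https://my-registry.example.com/api/v1/crates",
    "api": "https://my-registry.example.com"
}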

sync

The sync subcommand is the actual replacement for cargo fetch, except instead of downloading crates and registries from their normal location, it downloads them from your storage backend, and splats them to disk in the same way that cargo does, so that cargo won't have to do any actual work before it can start building code.
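
Putting it together, a typical CI job might sync before invoking any cargo commands; the bucket is a placeholder, and --include-index is the same flag used in the example above:

# fetch everything from the storage backend instead of the network
cargo fetcher --url gs://my-crates --include-index sync
# cargo now finds everything already on disk
cargo build --tests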

Contributing


We welcome community contributions to this project.

Please read our Contributor Guide for more information on how to get started.

License

Licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
  • MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

cargo-fetcher's People

Contributors

cosmicexplorer, dependabot-preview[bot], dependabot[bot], jake-shadle, jelmansouri, lpil, repi, sbcd90, soniasingla, xampprocky


cargo-fetcher's Issues

Figure out missing pieces

Running cargo fetch immediately after cargo fetcher sync still results in several seconds of work on Linux, when it should be basically nothing. There is probably something missing during a sync that cargo relies on for a "freshness" check, but I haven't found a likely culprit. This is far worse on Windows however, which is the bigger problem.

Cleanup README

Just realized the README is super out of date, should clean it up before a release, so adding this issue so I don't forget.

Support custom registries

Currently custom registries are not supported at all; only crates.io is mirrored/fetched. We already use a custom registry, which currently causes non-fatal errors during mirroring because of this, so it would be good to fix.

Update crate location for 1.70.0

1.70.0 made cargo's sparse protocol the default, which uses a different disk location than the github crates.io index. This means that running cargo fetch after cargo fetcher will redo the work for all of the crates from crates.io, though much faster now due to the sparse protocol. Luckily, crates sourced from git are located elsewhere, so those aren't affected.

Update to tokio 1.0 ecosystem

Now that tokio 1.0 is out, and hyper/reqwest have been updated, it would be good to move to them, especially since it is now easier after #161 has been merged and the only holdout is azure.

Windows problems with unpacking git repos

[2019-07-26T17:27:46Z INFO  cargo_fetcher::cmds::sync] syncing crates.io index
[2019-07-26T17:27:46Z INFO  cargo_fetcher::sync] synchronizing 437 crates...
[2019-07-26T17:27:46Z INFO  cargo_fetcher::sync] skippipng crates.io-index download, index repository already present
[2019-07-26T17:27:46Z INFO  cargo_fetcher::cmds::sync] successfully synced crates.io index
[2019-07-26T17:27:46Z INFO  cargo_fetcher::sync] checking local cache for missing crates...
[2019-07-26T17:27:46Z INFO  cargo_fetcher::sync] synchronizing 434 missing crates...
[2019-07-26T17:27:49Z ERROR cargo_fetcher::sync] failed to unpack dependency cpal-0.9.0(git): failed to unpack: failed to unpack `C:\Users\ContainerAdministrator\.cargo\git/db\cpal-a7ffd7cabefac714\objects\pack\pack-c8fae354ebdeace6800253b30ea8fa1608b132bf.idx`
[2019-07-26T17:27:54Z ERROR cargo_fetcher::sync] failed to unpack dependency crossbeam-utils-0.6.5(git): failed to unpack: failed to unpack `C:\Users\ContainerAdministrator\.cargo\git/db\crossbeam-5d5b005504a37dac\objects\pack\pack-60c51cb402b51a9bfcaa68827fb9802a1b6a4869.pack`
[2019-07-26T17:27:56Z ERROR cargo_fetcher::sync] failed to unpack dependency lmdb-sys-0.8.0(git): failed to unpack: failed to unpack `C:\Users\ContainerAdministrator\.cargo\git/db\lmdb-rs-958662b5696e3642\objects\pack\pack-ef9004cd65ff43596b2e7fd12207adebb3518225.pack`
[2019-07-26T17:28:22Z INFO  cargo_fetcher::cmds::sync] finished syncing crates

> cargo fetch
    Updating git repository `https://github.com/EmbarkStudios/cpal`
    Updating git repository `https://github.com/crossbeam-rs/crossbeam`
    Updating git repository `https://github.com/EmbarkStudios/lmdb-rs.git`

Include submodules

Currently, submodules are not included in git snapshots, so cargo will retrieve them itself during its own fetch. These should be included during mirroring, as they can be a significant contributor to download time.

reached the type-length limit while instantiating `tokio::runtime::Handle::enter::<...()}]>>)]>, ()}]>}]>, ()}]>], ()>`

Describe the bug

   Compiling cargo-fetcher v0.9.0
error: reached the type-length limit while instantiating `tokio::runtime::Handle::enter::<...()}]>>)]>, ()}]>}]>, ()}]>], ()>`
  --> /home/amnesia/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.22/src/runtime/handle.rs:72:5
   |
72 | /     pub fn enter<F, R>(&self, f: F) -> R
73 | |     where
74 | |         F: FnOnce() -> R,
75 | |     {
76 | |         context::enter(self.clone(), f)
77 | |     }
   | |_____^
   |
   = note: consider adding a `#![type_length_limit="2014460"]` attribute to your crate

error: aborting due to previous error

error: failed to compile `cargo-fetcher v0.9.0`, intermediate artifacts can be found at `/tmp/cargo-installoI3KJo`

Caused by:
  could not compile `cargo-fetcher`.

Not much to explain; the Rust compiler did a great job of it.
This is the case on both my Linux laptop and my Windows work machine.

To Reproduce

  1. cargo install cargo-fetcher

Expected behavior
cargo-fetcher installs.

Devices:

  1. ThinkPad T495s
     • Manjaro Linux
     • Linux Kernel: 5.8.6.1-MANJARO
     • Cargo: 1.46.0
     • Rustc: 1.46.0 (04488afe3 2020-08-24)
     • Rustup: 1.22.1 (b01adbbc3 2020-07-08)

  2. Weird old computer at work
     • Windows 10
     • Cargo: ?
     • Rustc: ?
     • Rustup: ?

Additional context
Sorry if this issue sucks, I'm just not sure what else to put here.

Add .cache entries for registry indices

Cargo has an optimization where it doesn't actually do a checkout of the registry; instead it reads the crate entries directly from the git blobs, which is a bit slower than just reading a file on disk, so it also generates .cache entries which it uses instead. I think this may be the missing piece for #16, explaining why doing a fetch immediately after using cargo-fetcher can still take multiple seconds, when it should actually be essentially a no-op.

Do git checkout

For git dependencies, cargo uses CARGO_HOME/git/db for bare clones of the git repos, and then uses CARGO_HOME/git/checkouts for the actual checkouts used as sources when compiling.

Currently cargo-fetcher only does the bare clone portion, leaving the checkout to cargo, which (I believe) does checkouts one at a time, so this is probably another opportunity to reduce the total fetch time; a sketch of the layout follows.
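
For illustration, the two locations look roughly like this; the repository directory name and revision are illustrative:

CARGO_HOME/git/db/cpal-a7ffd7cabefac714              # bare clone (what cargo-fetcher currently produces)
CARGO_HOME/git/checkouts/cpal-a7ffd7cabefac714/<rev> # per-revision working tree (currently left to cargo)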

Unclaimed S3 bucket in util.rs file

Describe the bug
While checking the GitHub repository I found that the unclaimed S3 bucket name "johnsmith.net" is used in the util.rs file.

To Reproduce
Steps to reproduce the behavior:

  1. Create an S3 bucket with that name and region.
  2. Upload files with the same names as written in the .rs file (e.g. johnsmith.net).
  3. Change the bucket settings to serve a static website.

Screenshots
http://johnsmith.net.s3.amazonaws.com/index.html

The code where it is found:

let url = Url::parse("http://johnsmith.net.s3.amazonaws.com/homepage.html").unwrap();

Additional context
Perform proper source code review and check for unclaimed s3 buckets used in the code.

Dependabot can't resolve your Rust dependency files

Dependabot can't resolve your Rust dependency files.

As a result, Dependabot couldn't update your dependencies.

The error Dependabot encountered was:

error: failed to parse manifest at `/home/dependabot/dependabot-updater/dependabot_tmp_dir/Cargo.toml`

Caused by:
  feature `profile-overrides` is required

this Cargo does not support nightly features, but if you
switch to nightly channel you can add
`cargo-features = ["profile-overrides"]` to enable this feature

If you think the above is an error on Dependabot's side please don't hesitate to get in touch - we'll do whatever we can to fix it.

View the update logs.

Add example using minio S3 locally

This would be really neat, as it is easy to set up minio on anything and it has an S3-compatible API; if we could mirror and sync from a minio S3 bucket, that would be great for local and offline development and testing, as well as potentially for CI.
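
A hedged sketch of what this could look like, assuming minio's default credentials and that the bucket subdomain resolves to localhost; all names here are placeholders:

# start a local minio server
docker run -p 9000:9000 minio/minio server /data

# point cargo-fetcher's s3 backend at a bucket on it, using the url form described above
AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin cargo fetcher --url http://crates.s3-us-east-1.localhost:9000 mirror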

Replace bulky auto-generated crates

Right now both s3 and azure pull in a ridiculous amount of code and dependencies (often outdated!) for what essentially amounts to some authentication and a few HTTP calls; we should try to replace them with something simpler, like what tame-gcs is made to be: a thin, sans-io interface.

Convert to async

It would be fun and useful to convert the blocking HTTP downloads to async/await and the latest tokio. It may be a bit faster than the threadpool, and hopefully the code ends up slightly simpler as well; it's also a good way to learn the new async/await ecosystem.

Get rid of async

While async is nice in some cases, here it really just gets in the way, particularly for making everything truly parallel instead of just concurrent. I think we could get better throughput by just using a blocking reqwest client and rayon for parallelization.

S3 support

S3 support would be nice to have in addition to GCS.
