crates-index-diff-rs's Introduction

A library to easily retrieve changes between different revisions of the crates.io index.

It needs only a bare clone, which saves resources.

Usage

Add this to your Cargo.toml:

[dependencies]
crates-index-diff = "9"

Notes…

…about collapsing of the crates.io history

Usually every 6 months the crates.io index repository's history is collapsed for improved performance. This library handles that case gracefully.

crates-index-diff-rs's People

Contributors

afonsojramos, alanhdu, byron, jyn514, l4l, nemo157, onur, pascalkuthe, quietmisdreavus, syphar

crates-index-diff-rs's Issues

Add option to disable serde

About half of the dependencies of crates-index-diff come from serde, serde_derive, and serde_json. It looks like serde_json is used for parsing the diff, so maybe it's not yet feasible to get rid of that, but it would be nice to make serde and serde_derive optional.

crates-index-diff 0.11.2 had breaking changes

crates-index-diff 0.11.2 bumped git-repository to major version 0.24, but since Index::repository returns git_repository::Repository, this change is backwards-incompatible.

In particular, I hit this in a crate with:

[dependencies]
crates-index-diff = "0.11"
git-repository = "0.23"

fn count_versions(rep: &git_repository::Repository) { ... }
fn main() {
  let index = crates_index_diff::...;
  count_versions(index.repository());
}

which failed to build after running cargo update with:

error[E0308]: arguments to this function are incorrect
   --> src/incremental.rs:339:18
    |
339 |         exist += count_versions(index.repository(), &e)?;
    |                  ^^^^^^^^^^^^^^ ------------------  -- expected struct `git_repository::object::tree::EntryRef`, found struct `git_repository::object::tree::iter::EntryRef`
    |                                 |
    |                                 expected struct `git_repository::Repository`, found struct `git_repository::types::Repository`
    |
    = note: expected reference `&git_repository::Repository`
               found reference `&git_repository::types::Repository`
    = note: perhaps two different versions of crate `git_repository` are being used?
    = note: expected reference `&git_repository::object::tree::EntryRef<'_, '_>`
               found reference `&git_repository::object::tree::iter::EntryRef<'_, '_>`
    = note: perhaps two different versions of crate `git_repository` are being used? 

Configurable index location

To support use cases like rust-lang/docs.rs#767 it would be useful to be able to specify the upstream URL to be used in a function like Index::from_path_or_cloned.

The biggest unknown I see is what to do if it doesn't match. Currently, if you set up a clone with the correct upstream remote name, it will just be used, even if that remote's URL doesn't match the one that would be cloned. I think with this function it would make more sense to verify that, and either change it or error if it is different.

(This could also be handled relatively easily on the docs.rs side by doing the git operations ourselves, if you think this is niche enough to not support).
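
The verify-then-change-or-error choice described above can be sketched independently of any git library. This is a minimal illustration, not the crate's API: `OnMismatch`, `RemoteAction`, `reconcile_remote`, and the URL normalization rule are all hypothetical names and assumptions introduced here.

```rust
/// Hypothetical policy for a remote whose URL differs from the requested one.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnMismatch {
    Fail,
    Replace,
}

/// Hypothetical result: what the caller should do with the remote config.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RemoteAction {
    /// The configured URL already matches; nothing to do.
    Keep,
    /// Point the remote at this URL (it was missing or is being replaced).
    Set(String),
}

/// Assumption: URLs differing only by a trailing `/` or `.git` are the same.
fn normalize(url: &str) -> &str {
    let url = url.trim_end_matches('/');
    url.strip_suffix(".git").unwrap_or(url)
}

/// Decide how to treat the existing remote URL of an already-present clone.
pub fn reconcile_remote(
    existing: Option<&str>,
    requested: &str,
    on_mismatch: OnMismatch,
) -> Result<RemoteAction, String> {
    match existing {
        // No remote configured yet: configure the requested one.
        None => Ok(RemoteAction::Set(requested.to_owned())),
        Some(url) if normalize(url) == normalize(requested) => Ok(RemoteAction::Keep),
        Some(_) if on_mismatch == OnMismatch::Replace => {
            Ok(RemoteAction::Set(requested.to_owned()))
        }
        Some(url) => Err(format!("remote points at {url}, expected {requested}")),
    }
}
```

A caller would read the existing remote URL with whatever git bindings it uses, pass it in as `existing`, and apply the returned action.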

crates-index-diff 0.11.1 produces `Added`, not `Yanked` for immediately-yanked crate version

I'm running crates-index-diff 0.11.1 over an index where the first commit is empty, and the second commit is

diff --git a/al/lo/allowed b/al/lo/allowed
new file mode 100644
index 0000000..b30662b
--- /dev/null
+++ b/al/lo/allowed
@@ -0,0 +1 @@
+{"name":"allowed","vers":"1.0.0","deps":[],"features":{},"cksum":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","yanked":true}

This used to produce a Change::Yanked, but with 0.11.1 it produces Change::Added instead. That isn't necessarily wrong, but it is a change in behavior that I didn't expect and that isn't in the changelog. I honestly don't know which it should produce. Maybe both, one immediately following the other? I figured I should at least document the change in behavior, and perhaps we should add a test for it.
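
To make the three candidate behaviors concrete, here is a small sketch of the decision for a single index line. It is not the crate's implementation: `Entry`, `Event`, `Policy`, and `classify` are hypothetical stand-ins, and the real change type carries more data.

```rust
/// Hypothetical, simplified stand-in for one line of an index file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub name: String,
    pub version: String,
    pub yanked: bool,
}

/// Hypothetical event type used only for this illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    Added(Entry),
    Yanked(Entry),
}

/// How to report a version that first appears already yanked.
#[derive(Debug, Clone, Copy)]
pub enum Policy {
    AddedOnly,
    YankedOnly,
    AddedThenYanked,
}

/// Classify one version given its previous state (if any) and current state.
pub fn classify(previous: Option<&Entry>, current: &Entry, policy: Policy) -> Vec<Event> {
    match previous {
        // Known version: only a not-yanked -> yanked flip is reported here.
        Some(prev) if !prev.yanked && current.yanked => vec![Event::Yanked(current.clone())],
        // Known version with no yank flip (un-yanks are ignored in this sketch).
        Some(_) => Vec::new(),
        // New, not yanked: the uncontroversial case.
        None if !current.yanked => vec![Event::Added(current.clone())],
        // New and already yanked: the case this issue is about.
        None => match policy {
            Policy::AddedOnly => vec![Event::Added(current.clone())],
            Policy::YankedOnly => vec![Event::Yanked(current.clone())],
            Policy::AddedThenYanked => {
                vec![Event::Added(current.clone()), Event::Yanked(current.clone())]
            }
        },
    }
}
```

For the commit above, `Policy::YankedOnly` would correspond to the older behavior and `Policy::AddedOnly` to the behavior observed with 0.11.1; a test pinning one of them would catch future changes.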

Cancelled error on invalid index entries

crates-index-diff now yields DiffForEach(Diff(Cancelled)) error on seemingly-okay git history (version 12 worked fine):

$ git init repo
$ mkdir -p repo/aw/s-
$ echo '{"name":"aws-foo","vers":"0.0.1","deps":[],"cksum":"","features":{},"yanked":false}' > repo/aw/s-/aws-foo
$ git -C repo add .
$ git -C repo commit -m 'commit'
# note down the commit hash here
$ echo '{"name":"aws-foo","vers":"0.0.2","deps":[],"cksum":"","features":{},"yanked":false}' >> repo/aw/s-/aws-foo
$ git -C repo add .
$ git -C repo commit -m 'commit'
# note down this commit hash too
$ git -C repo remote add origin origin
$ git -C repo diff <first commit hash>..<second commit hash>
diff --git a/aw/s-/aws-foo b/aw/s-/aws-foo
index f512d82..9bff708 100644
--- a/aw/s-/aws-foo
+++ b/aw/s-/aws-foo
@@ -1 +1,2 @@
 {"name":"aws-foo","vers":"0.0.1","deps":[],"cksum":"","features":{},"yanked":false}
+{"name":"aws-foo","vers":"0.0.2","deps":[],"cksum":"","features":{},"yanked":false}
$ cargo new --bin whatever && cd whatever
$ echo 'crates-index-diff = "15"' >> Cargo.toml
$ cat src/main.rs
use crates_index_diff::{git, index::CloneOptions, Index};
use std::sync::atomic::AtomicBool;

fn main() {
    let repo = git::open("../repo").unwrap();
    let origin_url = repo
        .find_remote("origin")
        .unwrap()
        .url(git::remote::Direction::Fetch)
        .unwrap()
        .to_bstring()
        .to_string();
    let index = Index::from_path_or_cloned_with_options(
        "../repo",
        git::progress::Discard,
        &AtomicBool::default(),
        CloneOptions {
            url: origin_url,
            ..Default::default()
        },
    )
    .unwrap();

    index.changes("<first commit hash>", "<second commit hash>").unwrap();
}
$ cargo r
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: DiffForEach(Diff(Cancelled))', src/main.rs:24:41
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The origin_url bit is to ensure we always use the existing path rather than ever try to clone. At least that was the intent, and that worked back in version 12. Don't know if it's relevant here.

Run `git gc` occasionally

In docs.rs, we just found we had very high CPU and IO usage because we had never run git gc. It would be great if crates-index-diff ran this automatically, the same way that git runs git gc automatically once in a while.

See also rust-lang/docs.rs#778.
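
One possible shape for this, sketched with the standard library only: count fetches and occasionally shell out to `git gc --auto`, which lets git itself decide whether housekeeping is needed. The function names and the idea of a fetch counter are assumptions made for illustration, not something crates-index-diff provides.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical heuristic: attempt a gc after every `every` fetches.
/// An interval of 0 disables it.
pub fn should_gc(fetches_since_gc: u32, every: u32) -> bool {
    every != 0 && fetches_since_gc >= every
}

/// Build (but do not run) `git -C <repo> gc --auto`.
pub fn gc_command(repo: &Path) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("-C").arg(repo).args(["gc", "--auto"]);
    cmd
}
```

A caller would run `gc_command(path).status()` when `should_gc` returns true and reset its counter afterwards. This assumes a `git` binary is available on the host.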

Unexpected `Deleted` event for crate "dl"

With this program:

use crates_index_diff::{Change, Index};

fn main() {
    let index = Index::from_path_or_cloned("crates.io-index").unwrap();
    let changes = index
        .changes(
            "feeacd1f399c353d020e7b9e0ddaf4b0628f9476",
            "c3d8f6d10335e8eb076dd594ec6d82e3f58ba24a",
        )
        .unwrap();
    for change in changes {
        match &change {
            Change::Added(cv) if cv.name == "dl" => {}
            Change::Yanked(cv) if cv.name == "dl" => {}
            Change::Deleted { name, .. } if name == "dl" => {}
            _ => continue,
        }

        eprintln!("{:?}", change);
    }
}

I get the result:

$ git clone "https://github.com/rust-lang/crates.io-index.git"
$ cargo run --release
...
     Running `target/release/deleted-unexpectedly`
Deleted { name: "dl" }

However, dl is present in both commits (in from / in to) with no changes in between.

New publish concurrent with yanks is missed

From investigation in rust-lang/docs.rs#1912 it looks like this is a crates-index-diff issue. The crate had a publish and two yanks in between two checks and we only see the yank event.

The relevant commit range contains just this publish and the two yanks:

> git log --oneline --reverse b49672ff6a2d40123a593cfbca9d05219346e398~1..92c18bdf30a4872d355e4a5b3a7f7c6c75323cf7
b49672ff6a2 Updating crate `aegis#0.2.4`
da97cd0243b Updating crate `ansi-color-codec#0.3.11`
1533f8e863a Yanking crate `ansi-color-codec#0.3.4`
92c18bdf30a Yanking crate `ansi-color-codec#0.3.5`

I wrote a little test program to verify this:

use crates_index_diff::{Change, CrateVersion};

fn main() -> anyhow::Result<()> {
    let index = crates_index_diff::Index::from_path_or_cloned("index")?;
    let mut args = std::env::args().skip(1);
    let start = git_hash::ObjectId::from_hex(args.next().unwrap().as_bytes())?;
    let end = git_hash::ObjectId::from_hex(args.next().unwrap().as_bytes())?;
    for change in index.changes_between_commits(start, end)? {
        match change {
            Change::Added(CrateVersion { name, version, .. }) => println!("added {name} {version}"),
            Change::Yanked(CrateVersion { name, version, .. }) => println!("yanked {name} {version}"),
            Change::Deleted { name } => println!("deleted {name}"),
        }
    }
    Ok(())
}

For the full range it shows the same behaviour:

> cargo run --quiet --release -- b49672ff6a2d40123a593cfbca9d05219346e398 92c18bdf30a4872d355e4a5b3a7f7c6c75323cf7
yanked ansi-color-codec 0.3.4
yanked ansi-color-codec 0.3.5

If I only include the first one or two commits it behaves correctly:

> cargo run --quiet --release -- b49672ff6a2d40123a593cfbca9d05219346e398 da97cd0243bd297e7b0e4b11040e531af2e256d1
added ansi-color-codec 0.3.11

> cargo run --quiet --release -- b49672ff6a2d40123a593cfbca9d05219346e398 1533f8e863aee3fd5340513acfeef9d42816cd08
yanked ansi-color-codec 0.3.4
added ansi-color-codec 0.3.11
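
For reference, the result expected for the full range can be expressed as a comparison of one crate file before and after. The sketch below is not how crates-index-diff computes changes; `Entry`, `Event`, and `diff_file` are hypothetical, and it assumes that 0.3.4 and 0.3.5 were present and not yanked at the start of the range.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for one line of a crate's index file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub version: String,
    pub yanked: bool,
}

/// Hypothetical event type used only for this illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    Added(String),
    Yanked(String),
}

/// Compare the old and new contents of one crate file, version by version,
/// so that additions and yanks in the same range are all reported.
pub fn diff_file(old: &[Entry], new: &[Entry]) -> Vec<Event> {
    // Yanked state of every version that existed before.
    let before: HashMap<&str, bool> = old
        .iter()
        .map(|e| (e.version.as_str(), e.yanked))
        .collect();

    let mut events = Vec::new();
    for entry in new {
        match before.get(entry.version.as_str()) {
            None => events.push(Event::Added(entry.version.clone())),
            Some(&was_yanked) if !was_yanked && entry.yanked => {
                events.push(Event::Yanked(entry.version.clone()))
            }
            Some(_) => {}
        }
    }
    events
}
```

Applied to the range above, a comparison like this would report both yanks and the addition of 0.3.11, which is the outcome the full-range run is missing.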

crates-index-diff 0.11 and zlib-ng-compat

I noticed that 0.11 (probably accidentally) enables the zlib-ng feature of libz-sys without a way to turn that off. Specifically:

crates-index-diff takes a dependency on git-repository with default features here:

git-repository = "0.23.0"

git-repository has the max-performance feature by default, which includes max-performance-safe, which enables git-features/zlib-ng-compat:

https://github.com/Byron/gitoxide/blob/85a3bedd68d2e5f36592a2f691c977dc55298279/git-repository/Cargo.toml#L70

git-features in turn has that feature forward to flate2/zlib-ng-compat here:

https://github.com/Byron/gitoxide/blob/85a3bedd68d2e5f36592a2f691c977dc55298279/git-features/Cargo.toml#L45

which in turn enables the zlib-ng feature of libz-sys

Was this enable-without-opt-out intentional? If not, would you be okay with making this feature optional in crates-index-diff as well? I can submit a PR if that'd be helpful!

compile errors when adding 15.0.0 as dependency

Trying to upgrade crates-index-diff on docs.rs I got some compile errors:

error[E0425]: cannot find function `serialize` in crate `hex`
   --> /Users/syphar/.cargo/registry/src/github.com-1ecc6299db9ec823/crates-index-diff-15.0.0/src/types.rs:125:26
    |
125 | #[derive(Default, Clone, serde::Serialize, serde::Deserialize, Eq, PartialEq, Debug)]
    |                          ^^^^^^^^^^^^^^^^ not found in `hex`
    |
    = note: this error originates in the derive macro `serde::Serialize` (in Nightly builds, run with -Z macro-backtrace for more info)
error[E0277]: the trait bound `SmartString<LazyCompact>: Serialize` is not satisfied
    --> /Users/syphar/.cargo/registry/src/github.com-1ecc6299db9ec823/crates-index-diff-15.0.0/src/types.rs:125:26
     |
125  | #[derive(Default, Clone, serde::Serialize, serde::Deserialize, Eq, PartialEq, Debug)]
     |                          ^^^^^^^^^^^^^^^^ the trait `Serialize` is not implemented for `SmartString<LazyCompact>`
126  | pub struct CrateVersion {
127  |     /// The crate name, i.e. `clap`.
     |     -------------------------------- required by a bound introduced by this call
     |
     = help: the following other types implement trait `Serialize`:
               &'a T
               &'a mut T
               ()
               (T0, T1)
               (T0, T1, T2)
               (T0, T1, T2, T3)
               (T0, T1, T2, T3, T4)
               (T0, T1, T2, T3, T4, T5)
             and 131 others
note: required by a bound in `types::_::_serde::ser::SerializeStruct::serialize_field`
    --> /Users/syphar/.cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.147/src/ser/mod.rs:1899:12
     |
1899 |         T: Serialize;
     |            ^^^^^^^^^ required by this bound in `types::_::_serde::ser::SerializeStruct::serialize_field`

and some similar errors.

how to reproduce:

in an empty directory:

  • cargo init
  • cargo add crates-index-diff
  • cargo build

Is it possible to find the date of a change?

Unless you go to poke the crates.io HTTP API, the only way to find the publication date of a particular version of a crate is to grovel through the index git repository. I'm going to need to do this eventually, but since this project already does a bit of groveling through the index git repository, would you be interested if I just contributed a PR to add this functionality?

Gracefully handle semantically meaningless changes

In rust-lang/crates.io#5066 there are plans for the official index to be normalized to match the crates.io database. This unfortunately creates a lot of churn in the index. By churn I mean sorting the entries by version number, sorting the features, sorting the deps, etc. None of this should be semantically meaningful, but may lead to weird bugs in this crate.

This issue is to make sure that this crate handles these changes gracefully, hopefully before the index normalization is actually performed.
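
One way to think about "semantically meaningless" is to compare entries in a canonical form in which ordering does not matter. The sketch below illustrates that idea with hypothetical types (`Entry`, `Canonical`, `semantically_equal`); real index entries have more fields, and this is not the crate's implementation.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified stand-in for one line of a crate's index file.
#[derive(Debug, Clone)]
pub struct Entry {
    pub version: String,
    pub deps: Vec<String>,
    pub features: Vec<(String, Vec<String>)>,
    pub yanked: bool,
}

/// Order-independent view of an entry: sorted deps, sorted features, yanked flag.
type Canonical = (BTreeSet<String>, BTreeMap<String, BTreeSet<String>>, bool);

fn canonical(entry: &Entry) -> Canonical {
    let deps: BTreeSet<String> = entry.deps.iter().cloned().collect();
    let features: BTreeMap<String, BTreeSet<String>> = entry
        .features
        .iter()
        .map(|(name, enables)| {
            (name.clone(), enables.iter().cloned().collect::<BTreeSet<String>>())
        })
        .collect();
    (deps, features, entry.yanked)
}

/// Key entries by version so that line order in the file is irrelevant.
/// If a version appears twice, the last occurrence wins in this sketch.
fn by_version(entries: &[Entry]) -> BTreeMap<&str, Canonical> {
    entries
        .iter()
        .map(|e| (e.version.as_str(), canonical(e)))
        .collect()
}

/// True if the two file contents differ only in ordering.
pub fn semantically_equal(old: &[Entry], new: &[Entry]) -> bool {
    by_version(old) == by_version(new)
}
```

A diff that skipped files for which such a comparison returns true would emit no changes for a pure re-sort of entries, features, or deps.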

Per-Commit stepping and baseline improvement

Various tasks offloaded from #26 with the main purpose of restoring the correct order of crate changes and updating the baseline tests to be more realistic.

Tasks

  1. add support for per-commit stepping to allow for orderly discovery of changes
  2. improve baseline to test big steps and one-commit-per-step (using the new API from above) like ---x---x---xxx
  3. adjust baseline test to inform about the yanked status as well, so we expect that all changes leave us with exactly the right information about the versions still in existence and their yanked status.
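
The per-commit stepping idea from task 1 can be sketched generically: walk adjacent commit pairs from oldest to newest and concatenate each pair's changes, so the overall order follows history. `step_per_commit` and its `diff` callback are hypothetical and stand in for whatever the real API ends up being.

```rust
/// Diff each adjacent pair of commits (ordered oldest to newest) and
/// concatenate the results, stopping at the first error.
pub fn step_per_commit<C, T, E, F>(commits: &[C], mut diff: F) -> Result<Vec<T>, E>
where
    F: FnMut(&C, &C) -> Result<Vec<T>, E>,
{
    let mut changes = Vec::new();
    // `windows(2)` yields nothing for fewer than two commits.
    for pair in commits.windows(2) {
        changes.extend(diff(&pair[0], &pair[1])?);
    }
    Ok(changes)
}
```

The trade-off is one diff per commit instead of one diff for the whole range, which costs more work but keeps a publish followed by a yank in the order in which they happened.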
