levenshtein-diff's People

Contributors

ajmalsiddiqui, nickcondron

levenshtein-diff's Issues

[Feature Request] Weighted distances

Hi there!

Coming from the RapidFuzz library in Python, one feature I liked to use was weighted distances (see the RapidFuzz docs).

Do you think this would be possible to implement here without too much trouble? I know the parameters might pose a design problem: Rust has no optional arguments, and making every function always take weights might annoy users. Maybe a separate function would work? Would you be open to contributions?

Thanks in advance for responding!
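The separate-function approach could look roughly like the sketch below. It is a hypothetical, standalone illustration, not levenshtein-diff's API: `Weights` and `weighted_distance` are invented names, and the three costs are assumed to mirror RapidFuzz's (insertion, deletion, substitution) weights. A `Default` impl keeps the unit-cost case a one-liner, so users who don't care about weights are not burdened.

```rust
/// Hypothetical cost weights (not part of levenshtein-diff).
#[derive(Debug, Clone, Copy)]
pub struct Weights {
    pub insertion: usize,
    pub deletion: usize,
    pub substitution: usize,
}

impl Default for Weights {
    /// Unit costs reproduce the ordinary Levenshtein distance.
    fn default() -> Self {
        Weights { insertion: 1, deletion: 1, substitution: 1 }
    }
}

/// Hypothetical weighted Levenshtein distance (Wagner-Fischer, two rows)
/// for turning `source` into `target`.
pub fn weighted_distance<T: PartialEq>(source: &[T], target: &[T], w: Weights) -> usize {
    // prev[j] = cost of turning the current source prefix into target[..j];
    // for the empty source prefix that is j insertions.
    let mut prev: Vec<usize> = (0..=target.len()).map(|j| j * w.insertion).collect();
    let mut curr = vec![0; target.len() + 1];
    for (i, s) in source.iter().enumerate() {
        // Turning source[..=i] into the empty target needs i + 1 deletions.
        curr[0] = (i + 1) * w.deletion;
        for (j, t) in target.iter().enumerate() {
            let sub = if s == t { 0 } else { w.substitution };
            curr[j + 1] = (prev[j] + sub) // match or substitute
                .min(prev[j + 1] + w.deletion) // delete s
                .min(curr[j] + w.insertion); // insert t
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[target.len()]
}
```

With default weights, `kitten` → `sitting` costs 3; raising the substitution cost to 2 gives 5, because a substitution then costs the same as a deletion plus an insertion.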

`generate_edits` API can be reworked to avoid `Result`

The `generate_edits` function currently returns an `Err` only if the matrix passed in is invalid in some way. In my opinion, it makes much more sense to remove that parameter, call `distance` internally, and return just the `Vec<Edit<T>>`.

The downside of doing it this way is that users can no longer choose which version of the Levenshtein functions they use. However, I don't see that as a major downside compared to simplifying the API.

I'm willing to write a PR for this if you agree with the idea.
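Under this proposal the matrix never crosses the API boundary, so the caller cannot pass an inconsistent one and there is no `Result` to unwrap. The sketch below is hypothetical and standalone: it defines its own `Edit<T>` (with 0-based indices into the source) instead of using the crate's type, builds the unit-cost matrix internally, and walks it back from the bottom-right corner into an edit script.

```rust
/// Hypothetical edit type for this sketch; levenshtein-diff defines its own `Edit<T>`.
#[derive(Debug, Clone, PartialEq)]
pub enum Edit<T> {
    /// Remove the item at this index of the source.
    Delete(usize),
    /// Insert a value before this index of the source.
    Insert(usize, T),
    /// Replace the item at this index of the source.
    Substitute(usize, T),
}

/// Sketch of the proposed two-argument signature: the matrix is built here,
/// so it is always valid and no `Result` is needed.
pub fn generate_edits<T: PartialEq + Clone>(source: &[T], target: &[T]) -> Vec<Edit<T>> {
    let (m, n) = (source.len(), target.len());
    // d[i][j] = distance between source[..i] and target[..j].
    let mut d = vec![vec![0usize; n + 1]; m + 1];
    for i in 0..=m {
        d[i][0] = i;
    }
    for j in 0..=n {
        d[0][j] = j;
    }
    for i in 1..=m {
        for j in 1..=n {
            let sub = if source[i - 1] == target[j - 1] { 0 } else { 1 };
            d[i][j] = (d[i - 1][j - 1] + sub)
                .min(d[i - 1][j] + 1)
                .min(d[i][j - 1] + 1);
        }
    }
    // Backtrack; edits are produced back-to-front.
    let mut edits = Vec::new();
    let (mut i, mut j) = (m, n);
    while i > 0 || j > 0 {
        if i > 0 && j > 0 && source[i - 1] == target[j - 1] && d[i][j] == d[i - 1][j - 1] {
            // Items match: no edit needed.
            i -= 1;
            j -= 1;
        } else if i > 0 && j > 0 && d[i][j] == d[i - 1][j - 1] + 1 {
            edits.push(Edit::Substitute(i - 1, target[j - 1].clone()));
            i -= 1;
            j -= 1;
        } else if i > 0 && d[i][j] == d[i - 1][j] + 1 {
            edits.push(Edit::Delete(i - 1));
            i -= 1;
        } else {
            edits.push(Edit::Insert(i, target[j - 1].clone()));
            j -= 1;
        }
    }
    edits
}
```

Because the edits come out back-to-front, applying them in order never shifts an index that is still pending. For `["Hello", "World"]` → `["World"]` this sketch returns `[Delete(0)]`.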

Removing or swapping the first item results in an invalid distance matrix

Calculating the distance between two collections, where the second is an exact copy of the first with its first item removed, appears to produce an invalid distance matrix: passing it to `generate_edits` fails with `InvalidDistanceMatrixError`.

fn main() {
    let collection_1: Vec<String> = vec!["Hello".into(), "World".into()];
    let collection_2: Vec<String> = vec!["World".into()];

    let (distance, matrix) = levenshtein_diff::distance(&collection_1, &collection_2);
    let _edits = levenshtein_diff::generate_edits(&collection_1, &collection_2, &matrix).unwrap();

    println!("levenshtein distance of {distance}");
}

Error log:

   Compiling test-crate v0.1.0 (C:\Users\<private>\test)
    Finished dev [unoptimized + debuginfo] target(s) in 0.50s
     Running `target\debug\test-crate.exe`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidDistanceMatrixError', src\main.rs:6:90
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: process didn't exit successfully: `target\debug\test-crate.exe` (exit code: 101)

This also appears to occur when the first and second items are swapped:

fn main() {
    let collection_1: Vec<String> = vec!["Hello".into(), "World".into()];
    let collection_2: Vec<String> = vec!["World".into(), "Hello".into()];

    let (distance, matrix) = levenshtein_diff::distance(&collection_1, &collection_2);
    let _edits = levenshtein_diff::generate_edits(&collection_1, &collection_2, &matrix).unwrap(); // thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidDistanceMatrixError', src\main.rs:6:90

    println!("levenshtein distance of {distance}");
}
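To narrow down whether `distance` or `generate_edits` is at fault, one could check the returned matrix against the standard unit-cost Levenshtein recurrence. The helper below is a hypothetical sketch, not part of levenshtein-diff; it assumes the matrix can be viewed as rows of `usize`, with one row per prefix of the first collection and one column per prefix of the second (the crate's actual layout may differ). For the removal case a consistent matrix is `[[0, 1], [1, 1], [2, 1]]`. If the crate's matrix passes this check, the bug is more likely in `generate_edits`'s validation than in `distance`.

```rust
/// Hypothetical helper: does `m` satisfy the unit-cost Levenshtein recurrence
/// for `source` (rows) and `target` (columns)?
fn is_consistent_matrix<T: PartialEq>(m: &[Vec<usize>], source: &[T], target: &[T]) -> bool {
    // Expect one row per prefix of `source`, one column per prefix of `target`.
    if m.len() != source.len() + 1 || m.iter().any(|row| row.len() != target.len() + 1) {
        return false;
    }
    for i in 0..=source.len() {
        for j in 0..=target.len() {
            let expected = if i == 0 {
                j // j insertions from the empty prefix
            } else if j == 0 {
                i // i deletions down to the empty prefix
            } else {
                let sub = if source[i - 1] == target[j - 1] { 0 } else { 1 };
                (m[i - 1][j - 1] + sub)
                    .min(m[i - 1][j] + 1)
                    .min(m[i][j - 1] + 1)
            };
            if m[i][j] != expected {
                return false;
            }
        }
    }
    true
}
```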

Create benchmarks

The Levenshtein functions are often used in performance-critical code. I think it would be good to have benchmarks, both to advertise the library's performance and to guide future optimization work. I have used Criterion and Iai to write benchmarks before, and I'd be willing to make a PR if you think adding them is a good idea.
