nucleo's People

Contributors

a-kenji, blinxen, gabydd, hywan, noib3, pascalkuthe, poliorcetics, the-mikedavis, tudyx, zub

nucleo's Issues

consider using release tags and a changelog

I am considering using nucleo in gitui because it offers nice speed improvements, but I would feel more comfortable if the crate made it easier to figure out what changed from one release to the next. Two things would mainly help with that:

  • tag releases in git
  • maintain a CHANGELOG.md

[Feature request] Way to get scores of many/all items

There currently only seem to be methods available for getting the best match.

Many use cases require ranking many or all items. Getting back a sorted list would be nice, or at least the ability to get a score for a single needle and haystack so we can do the collecting and sorting ourselves.
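The collect-and-sort approach can be sketched without nucleo: score each haystack independently, drop non-matches, then sort by score. `toy_score` and `rank_all` are hypothetical names, and `toy_score` is a stand-in scorer (greedy in-order match with a consecutive-character bonus), not nucleo's scoring; with a real matcher you would swap in a per-haystack call such as nucleo_matcher's `Matcher::fuzzy_match`.

```rust
/// Hypothetical stand-in scorer (NOT nucleo's algorithm): the needle must appear
/// in order in the haystack (ASCII case-insensitive). Each matched char earns
/// 10 points, or 15 if it directly follows the previously matched char.
fn toy_score(needle: &str, haystack: &str) -> Option<u32> {
    let needle: Vec<char> = needle.chars().map(|c| c.to_ascii_lowercase()).collect();
    let mut next = 0; // index of the next needle char we are looking for
    let mut score = 0;
    let mut prev_matched = false;
    for c in haystack.chars().map(|c| c.to_ascii_lowercase()) {
        if next < needle.len() && c == needle[next] {
            score += if prev_matched { 15 } else { 10 };
            next += 1;
            prev_matched = true;
        } else {
            prev_matched = false;
        }
    }
    // Only a complete match of every needle char counts.
    (next == needle.len()).then_some(score)
}

/// Scores every item, drops non-matches, and sorts best-first.
fn rank_all<'a>(needle: &str, items: &[&'a str]) -> Vec<(&'a str, u32)> {
    let mut ranked: Vec<(&'a str, u32)> = items
        .iter()
        .filter_map(|&item| toy_score(needle, item).map(|score| (item, score)))
        .collect();
    // sort_by is stable, so equal scores keep their input order.
    ranked.sort_by(|a, b| b.1.cmp(&a.1));
    ranked
}

fn main() {
    let items = ["foobar", "fxxoo", "oo", "a"];
    for (item, score) in rank_all("fo", &items) {
        println!("{score:>3}  {item}");
    }
}
```

Because scoring is per item, the same loop works for "all items" or for a pre-filtered subset; only the scorer needs replacing.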

How should Nucleo work?

Thanks for creating this fuzzy-matching library.

I've run into a strange problem with the Nucleo struct.

Consider the following code, which you can run on rust-explorer:

use std::sync::Arc;
use nucleo::Nucleo;
use nucleo::pattern::{CaseMatching, Normalization};

fn main() {
    let mut matcher = init_fuzzy_matcher();
    let inject = matcher.injector();
    let list = ["foobar", "fxxoo", "oo", "a"];
    list.iter().for_each(|s| {
        inject.push(s, |_| {});
    });
    matcher
        .pattern
        .reparse(0, "f", CaseMatching::Ignore, Normalization::Smart, false);
    let _status = matcher.tick(1000);
    dbg!(matcher.pattern.column_pattern(0));

    let mut counter = 0;
    loop {
        let _status = matcher.tick(100);
        // if status.changed {
        let snapshot = matcher.snapshot();
        let total = snapshot.item_count();
        let got = snapshot.matched_item_count();
        let res: Vec<_> = snapshot
            .matched_items(..)
            .map(|item| item.data)
            .collect();
        dbg!(total, got, res);
        // }
        // if !status.running {
        //     break;
        // }
        println!("running");
        if counter > 4 {
            break;
        }
        counter += 1;
    }
}

type Matcher = Nucleo<&'static str>;

fn init_fuzzy_matcher() -> Matcher {
    Nucleo::new(
        nucleo::Config::DEFAULT,
        Arc::new(|| println!("notified")),
        None,
        1,
    )
}

The res vector is always empty:

[src/main.rs:34:9] total = 4
[src/main.rs:34:9] got = 0
[src/main.rs:34:9] res = []

Using nucleo::Matcher with the same config, input, and needle string produces the desired output:

use nucleo::pattern::{Atom, AtomKind, CaseMatching, Normalization};
use nucleo::Matcher;

fn main() {
    let mut matcher = init_fuzzy_matcher();
    let list = ["foobar", "fxxoo", "oo", "a"];
    let res = Atom::new(
        "f",
        CaseMatching::Ignore,
        Normalization::Smart,
        AtomKind::Fuzzy,
        false,
    )
    .match_list(&list, &mut matcher);
    dbg!(res);
}

fn init_fuzzy_matcher() -> Matcher {
    Matcher::new(nucleo::Config::DEFAULT)
}
[src/main.rs:20:5] res = [
    (
        "foobar",
        36,
    ),
    (
        "fxxoo",
        36,
    ),
]

So the question is: what is the right way to use Nucleo? I found an issue asking for examples, but it has no replies.
I also scanned helix's source; although nucleo is a dependency there, what helix actually uses is Matcher, not Nucleo.
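One likely explanation for the empty result is that the closure passed to `push` is what fills the text columns the matcher searches, and `|_| {}` leaves column 0 empty, so the pattern "f" has nothing to match against. The std-only model below illustrates that idea; `MiniInjector` and `is_subsequence` are hypothetical and only mimic the shape of a push-with-closure API. Assuming the single-argument closure signature used in the report, the corresponding change in the real code would be something like `inject.push(s, |cols| cols[0] = (*s).into());`.

```rust
/// Hypothetical, simplified model of an injector: each item stores its data
/// plus matcher columns, which hold the text that is actually searched.
struct MiniInjector<T> {
    items: Vec<(T, Vec<String>)>,
    columns: usize,
}

impl<T> MiniInjector<T> {
    fn new(columns: usize) -> Self {
        MiniInjector { items: Vec::new(), columns }
    }

    /// Columns start out empty; only the closure decides what text gets matched.
    fn push(&mut self, value: T, fill_columns: impl FnOnce(&mut [String])) {
        let mut cols = vec![String::new(); self.columns];
        fill_columns(&mut cols);
        self.items.push((value, cols));
    }

    /// Returns the data of every item whose given column contains `needle`
    /// as an in-order subsequence (a stand-in for fuzzy matching).
    fn matched(&self, column: usize, needle: &str) -> Vec<&T> {
        self.items
            .iter()
            .filter(|(_, cols)| is_subsequence(needle, &cols[column]))
            .map(|(data, _)| data)
            .collect()
    }
}

/// True if every char of `needle` appears in `haystack` in order.
fn is_subsequence(needle: &str, haystack: &str) -> bool {
    let mut hay = haystack.chars();
    needle.chars().all(|n| hay.any(|h| h == n))
}

fn main() {
    let list = ["foobar", "fxxoo", "oo", "a"];

    // Like `inject.push(s, |_| {})`: the column stays empty, so nothing matches "f".
    let mut empty = MiniInjector::new(1);
    for s in list {
        empty.push(s, |_| {});
    }
    println!("{:?}", empty.matched(0, "f"));

    // Filling column 0 with the item's text makes it searchable.
    let mut filled = MiniInjector::new(1);
    for s in list {
        filled.push(s, |cols| cols[0] = s.to_string());
    }
    println!("{:?}", filled.matched(0, "f"));
}
```

Separating the stored data from the searched text is what lets one item expose several columns (or a display string different from its payload), which is why the text has to be supplied explicitly.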

Generate Coverage Report in CI

I am striving for high test coverage in nucleo. The matcher crate and the pattern parsing should already be at 80% test coverage. I would like to track coverage automatically in CI (for example with Coveralls). This helps with triaging (identifying uncovered branches) and makes it easier to see where tests are still needed.

I imagine we would generate coverage data with cargo tarpaulin in CI and upload the report to Coveralls or a similar service (I would need to set up the account once there is a PR). LLVM-based instrumentation should be used.

Spurious matches with substring matching and non-ASCII

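use nucleo_matcher::{Config, Matcher, Utf32String};

#[test]
fn substring_ascii_needle_unicode_haystack() {
    let needle = Utf32String::from("lying");
    let haystack = Utf32String::from("Flibbertigibbet / イタズラっ子たち");
    let mut matcher = Matcher::new(Config::DEFAULT);
    // Expected: no match, since "lying" does not occur in the haystack.
    assert_eq!(
        matcher.substring_match(haystack.slice(..), needle.slice(..)),
        None
    );
}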

This should pass, but it fails with a score of 30; running with indices indicates that only the first codepoint in the haystack matches. If I get rid of the Japanese text the match goes away as expected. Fuzzy, postfix, and prefix match all indicate that there is no match; it's only substring match that breaks.

edit: If I use Utf32String::Unicode("lying".chars().collect()) there's no match, so I think the 'ascii needle, unicode haystack' codepath is the one with the problem.
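When triaging a bug like this, it can help to compare against a deliberately simple oracle. The sketch below uses a hypothetical `naive_substring` helper (std-only, not nucleo's algorithm) that compares whole code points with ASCII-only case folding; it confirms that "lying" occurs nowhere in the haystack, so the expected result is no match regardless of which needle/haystack encoding path is taken.

```rust
/// Hypothetical reference check: does `needle` occur as a contiguous run of
/// code points in `haystack`? Case folding is ASCII-only; non-ASCII chars
/// must match exactly. Intended for cross-checking, not for speed.
fn naive_substring(haystack: &str, needle: &str) -> bool {
    let hay: Vec<char> = haystack.chars().map(|c| c.to_ascii_lowercase()).collect();
    let pat: Vec<char> = needle.chars().map(|c| c.to_ascii_lowercase()).collect();
    if pat.is_empty() {
        return true; // `windows(0)` would panic, and "" trivially occurs
    }
    // Slide a window of the needle's length across the haystack.
    hay.windows(pat.len()).any(|w| w == pat.as_slice())
}

fn main() {
    let haystack = "Flibbertigibbet / イタズラっ子たち";
    println!("{}", naive_substring(haystack, "lying"));
    println!("{}", naive_substring(haystack, "gibbet"));
}
```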

Panic with simple pattern.

I know you're still working on this, but I might as well report it.
The example below panics with 'should have been caught by prefilter' at .../git/checkouts/nucleo-fe29e1ee969779b0/9c4b710/matcher/src/fuzzy_optimal.rs:41:13

It seems to be case-related, because it doesn't happen when ignore_case is false.

Cargo.toml:

[dependencies]
nucleo = { version = "*", git = "https://github.com/helix-editor/nucleo" }

src/main.rs:

use nucleo::*;

fn main() {
    let conf = MatcherConfig::DEFAULT;
    let mut matcher = Matcher::new(conf);

    let needle = "aB";
    let mut buf1 = Vec::new();
    let needle = Utf32Str::new(needle, &mut buf1);

    let haystack = "aaB";
    let mut buf2 = Vec::new();
    let haystack = Utf32Str::new(haystack, &mut buf2);

    let mut indices = Vec::new();
    let result = matcher.fuzzy_indices(haystack, needle, &mut indices);

    println!("{:?} {:?}", result, indices);
}
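The panic message suggests the full matcher failed to find an alignment for a pair that its prefilter had accepted. As a sanity check of what the answer should be, here is a hypothetical `passes_prefilter` written as an in-order, ASCII case-insensitive subsequence test. This is an assumption about what such a prefilter checks, not nucleo's actual code. It accepts "aB" against "aaB" both with and without case folding, so a match does exist and the expected output is a score with indices rather than a panic.

```rust
/// Hypothetical prefilter: accept a haystack only if every needle char occurs
/// in order, optionally ignoring ASCII case. A cheap check like this rejects
/// impossible candidates before running a full (more expensive) scoring pass.
fn passes_prefilter(needle: &str, haystack: &str, ignore_case: bool) -> bool {
    // Normalize a char according to the case setting.
    let norm = |c: char| if ignore_case { c.to_ascii_lowercase() } else { c };
    let mut hay = haystack.chars().map(norm);
    // `any` advances the shared iterator, so order is enforced.
    needle.chars().map(norm).all(|n| hay.any(|h| h == n))
}

fn main() {
    println!("{}", passes_prefilter("aB", "aaB", true));
    println!("{}", passes_prefilter("aB", "aaB", false));
    println!("{}", passes_prefilter("ba", "aaB", true));
}
```

If a prefilter and the full matcher disagree on a pair like this, the invariant behind the panic message no longer holds.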

Run typos-rs in CI

I am using typos-rs locally to automatically fix (some) typos. It would be nice to have it run in CI so that typos are caught during review. I already have an ignore file set up, so it's just a matter of adding the GitHub Actions step.

Standalone CLI - toy project

Hi! I saw your Reddit post about nucleo and got curious about writing a standalone CLI version as a little side project (as coding breaks while studying for my exams). I've already started, but I didn't open this issue at the beginning because I didn't know how far I'd get or whether it would turn into a mature CLI program at all.

However, you said in this answer:

So somebody else could also contribute that (although if somebody does this, please reach out first).

So I'm opening this issue in case you're interested and want to use some of my code when you start writing the standalone nucleo CLI. Here's the link to my repo: https://github.com/TornaxO7/nucle

If you have some suggestions/hints/questions, feel free to ask.

How should/does nucleo handle umlauts?

For example I notice that a needle ë fails to fuzzy match . On the other hand a needle e will match , and a needle ë will match a haystack ë.

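use nucleo_matcher::pattern::{CaseMatching, Pattern};
use nucleo_matcher::{Config, Matcher};
fn main() {
    let paths = ["be", "bë"];
    let mut matcher = Matcher::new(Config::DEFAULT);
    let matches = Pattern::parse("ë", CaseMatching::Ignore).match_list(paths, &mut matcher);
    assert_eq!(matches.len(), 1); // fails
}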

Is that expected, or a bug? If it's expected, could you say a bit more about why, and suggest workarounds? Mostly I want to be able to document for users of my app why it behaves the way it does.

Thank you.
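One way an asymmetry like this can arise, offered as an assumption rather than a description of nucleo's internals, is if diacritics are folded on the haystack side but not on the needle side. The std-only sketch below (hypothetical `fold` and `fuzzy_contains`, with a deliberately tiny folding table) checks the paths from the report under each combination: folding only the haystack yields no match for "ë", folding both sides matches both paths, and folding neither matches only "bë". Whichever behavior an app wants, the key is to treat needle and haystack consistently.

```rust
/// Hypothetical, deliberately tiny folding table: maps a few Latin letters with
/// diacritics to their base letter. A real implementation would need a much
/// larger table or Unicode decomposition.
fn fold(c: char) -> char {
    match c {
        'à' | 'á' | 'â' | 'ä' => 'a',
        'è' | 'é' | 'ê' | 'ë' => 'e',
        'ì' | 'í' | 'î' | 'ï' => 'i',
        'ò' | 'ó' | 'ô' | 'ö' => 'o',
        'ù' | 'ú' | 'û' | 'ü' => 'u',
        _ => c,
    }
}

/// In-order subsequence match, folding each side only if requested.
fn fuzzy_contains(needle: &str, haystack: &str, fold_needle: bool, fold_haystack: bool) -> bool {
    let mut hay = haystack.chars().map(|c| if fold_haystack { fold(c) } else { c });
    needle
        .chars()
        .map(|c| if fold_needle { fold(c) } else { c })
        .all(|n| hay.any(|h| h == n))
}

fn main() {
    let paths = ["be", "bë"];
    for (fold_needle, fold_haystack) in [(false, true), (true, true), (false, false)] {
        let hits: Vec<&str> = paths
            .iter()
            .copied()
            .filter(|p| fuzzy_contains("ë", p, fold_needle, fold_haystack))
            .collect();
        println!("fold needle={fold_needle}, fold haystack={fold_haystack}: {hits:?}");
    }
}
```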

Higher score for shorter matches?

First, thanks for posting this project!

I know you are trying to match fzf, but I'm finding some of the fzf scoring hard to make sense of. For example, consider these two cases:

  1. Moby Dick
  2. Though I cannot tell why it was exactly that those stage managers, the Fates, put me down for this shabby part of a whaling voyage

If I search for "md", the second example scores highest, matching "me down". This is also fzf's behavior, but it doesn't seem right to me. Would it make sense to incorporate the percentage of matched characters into the score calculation somehow?
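The suggestion can be sketched as a re-ranking step on top of whatever raw score the matcher produces. `coverage_adjusted` is a hypothetical helper, not fzf's or nucleo's scoring: it multiplies the raw score by (1 + coverage), where coverage is the fraction of haystack characters that the needle accounts for. The raw score of 50 used below is an arbitrary placeholder, chosen only to isolate the effect of coverage.

```rust
/// Hypothetical re-ranking: scale a raw match score by how much of the
/// haystack the match covers, so short, mostly-matched haystacks get a boost.
fn coverage_adjusted(raw_score: u32, needle_len: usize, haystack_len: usize) -> f64 {
    if haystack_len == 0 {
        return 0.0;
    }
    // Fraction of haystack chars accounted for by the needle.
    let coverage = needle_len as f64 / haystack_len as f64;
    raw_score as f64 * (1.0 + coverage)
}

fn main() {
    let short = "Moby Dick";
    let long = "Though I cannot tell why it was exactly that those stage managers, the Fates, put me down for this shabby part of a whaling voyage";
    // Same placeholder raw score for both haystacks.
    let raw = 50;
    for h in [short, long] {
        let adjusted = coverage_adjusted(raw, 2, h.chars().count());
        println!("{adjusted:.1}  {h}");
    }
}
```

How strongly coverage should weigh against the raw score is a tuning choice: with a multiplier this mild, a large raw-score gap (such as a word-boundary bonus on "me down") can still win, so a real implementation would need to calibrate the weight against its own bonuses.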

[Feature request] Get match indices and matched letters indices

I am creating emenu, a dmenu/rofi-like app for Windows, using nucleo, and I have run into two main issues.

First, if several match candidates are the same string, it would be nice to have a method on the snapshot that returns the items along with their global indices so I can tell them apart in the GUI, or one that returns just the indices so I can fetch each item with get_item().

Second, like fzf-matcher, it would help to have a method that returns the indices of the matched characters so I can highlight them in the GUI.
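Both requests can be sketched on the application side once the needed data is available. The `fuzzy_indices` call in the panic report fills a `Vec<u32>` of indices; assuming those are character (not byte) positions in the haystack, `highlight` below wraps them for display. `label_matches` pairs hypothetical `(index, score)` matcher output with the original list so duplicate strings stay distinguishable. Both helpers are hypothetical, not part of nucleo's API.

```rust
/// Wraps the chars at the given char indices in brackets, e.g. for
/// highlighting matched letters. Assumes `indices` are char positions.
fn highlight(haystack: &str, indices: &[u32]) -> String {
    let mut out = String::new();
    for (i, c) in haystack.chars().enumerate() {
        if indices.contains(&(i as u32)) {
            out.push('[');
            out.push(c);
            out.push(']');
        } else {
            out.push(c);
        }
    }
    out
}

/// Pairs each match with its position in the original list, so identical
/// strings can still be told apart. `matches` is hypothetical (index, score) data.
fn label_matches<'a>(items: &[&'a str], matches: &[(usize, u32)]) -> Vec<(usize, &'a str, u32)> {
    matches
        .iter()
        .map(|&(idx, score)| (idx, items[idx], score))
        .collect()
}

fn main() {
    let items = ["main.rs", "lib.rs", "main.rs"];
    // Hypothetical matcher output: (index into `items`, score).
    let matches: [(usize, u32); 2] = [(0, 40), (2, 40)];
    for (idx, text, score) in label_matches(&items, &matches) {
        println!("#{idx} {text} ({score})");
    }
    println!("{}", highlight("foobar", &[0, 3]));
}
```

Keeping the global index alongside each match is what makes `get_item()`-style lookups possible after sorting, since the matched strings alone may not be unique.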
