helix-editor / nucleo
A fast and convenient fuzzy matcher library for Rust
License: Mozilla Public License 2.0
I am considering using nucleo in gitui, as it brings nice speed improvements, but I would feel better if the crate made it easier to figure out what changed from one release to the next. Two things would mainly help with that:
CHANGELOG.md
As discussed recently in the Matrix room, it would be nice to add a feature flag to disable Unicode normalization. If the project using nucleo already has a Unicode normalization dependency in its tree, there is no need to pull in another one through nucleo.
Thanks!
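A minimal sketch of how such a flag could gate the normalization step. It assumes a hypothetical Cargo feature named `unicode-normalization` and a hypothetical helper `normalize_char`; neither is taken from nucleo's source, and the folding table is deliberately tiny.

```rust
/// Hypothetical helper: folds a few accented Latin letters to ASCII when the
/// (assumed) `unicode-normalization` feature is enabled.
#[cfg(feature = "unicode-normalization")]
fn normalize_char(c: char) -> char {
    match c {
        'ä' | 'á' | 'à' => 'a',
        'ë' | 'é' | 'è' => 'e',
        'ö' | 'ó' | 'ò' => 'o',
        other => other,
    }
}

/// With the feature disabled the helper is the identity, so no normalization
/// table is compiled into the binary.
#[cfg(not(feature = "unicode-normalization"))]
fn normalize_char(c: char) -> char {
    c
}

/// Normalizes every character of `text` with whichever variant was compiled in.
fn normalize(text: &str) -> String {
    text.chars().map(normalize_char).collect()
}
```

A downstream crate that already normalizes its input could then depend on the library with default features turned off and keep calling `normalize` unchanged.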
I tried looking in the README and on crates.io but didn't find it.
I want to use this crate in my project. :)
Thank you.
See jake-stewart/jfind#19 for context.
At least in the linked asciinema it is significantly slower, but I'm not sure of the exact reasons.
Startup time should be negligible for non-tiny datasets, so a fuzzy-finder CLI frontend would be the easiest way to get a fair comparison.
There currently only seem to be methods available for getting the best match.
Many use cases require ranking many or all items. Getting back a sorted list would be nice, or at least the ability to get a score for a single needle and haystack so we can do the collecting and sorting ourselves.
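The "collect and sort ourselves" approach could look roughly like the sketch below. It is written against the standard library only: `toy_score` is a hypothetical stand-in for whatever per-item scoring call the crate exposes, and its scoring rule is invented for illustration, not nucleo's algorithm.

```rust
/// Toy scorer (not nucleo's algorithm): returns `Some(score)` when every
/// character of `needle` appears in `haystack` in order, ignoring ASCII case.
/// A match directly after another match earns 3 points instead of 1.
fn toy_score(needle: &str, haystack: &str) -> Option<u32> {
    let mut score = 0;
    let mut prev_matched = false;
    let mut needle_chars = needle.chars().peekable();
    for h in haystack.chars() {
        // Stop scanning once the whole needle has been consumed.
        let Some(&n) = needle_chars.peek() else { break };
        if n.eq_ignore_ascii_case(&h) {
            score += if prev_matched { 3 } else { 1 };
            prev_matched = true;
            needle_chars.next();
        } else {
            prev_matched = false;
        }
    }
    // Only a fully consumed needle counts as a match.
    needle_chars.peek().is_none().then_some(score)
}

/// Scores every haystack and returns the matches sorted by descending score.
/// The sort is stable, so equal scores keep their input order.
fn rank<'a>(needle: &str, haystacks: &[&'a str]) -> Vec<(&'a str, u32)> {
    let mut ranked: Vec<_> = haystacks
        .iter()
        .filter_map(|&h| toy_score(needle, h).map(|s| (h, s)))
        .collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1));
    ranked
}
```

Calling `rank("fb", &["foobar", "fb", "xyz", "fab"])` drops the non-matching `"xyz"` and puts `"fb"` first, because its two matched characters are adjacent.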
Thanks for creating the fuzzy library.
I have run into a weird problem with the Nucleo struct.
Consider the following code, which you can run on rust-explorer:
use std::sync::Arc;

use nucleo::Nucleo;
use nucleo::pattern::{CaseMatching, Normalization};

fn main() {
    let mut matcher = init_fuzzy_matcher();
    let inject = matcher.injector();
    let list = ["foobar", "fxxoo", "oo", "a"];
    list.iter().for_each(|s| {
        inject.push(s, |_| {});
    });
    matcher
        .pattern
        .reparse(0, "f", CaseMatching::Ignore, Normalization::Smart, false);
    let _status = matcher.tick(1000);
    dbg!(matcher.pattern.column_pattern(0));
    let mut counter = 0;
    loop {
        let _status = matcher.tick(100);
        // if status.changed {
        let snapshot = matcher.snapshot();
        let total = snapshot.item_count();
        let got = snapshot.matched_item_count();
        let res: Vec<_> = snapshot
            .matched_items(..)
            .map(|item| item.data)
            .collect();
        dbg!(total, got, res);
        // }
        // if !status.running {
        //     break;
        // }
        println!("running");
        if counter > 4 {
            break;
        }
        counter += 1;
    }
}

type Matcher = Nucleo<&'static str>;

fn init_fuzzy_matcher() -> Matcher {
    Nucleo::new(
        nucleo::Config::DEFAULT,
        Arc::new(|| println!("notified")),
        None,
        1,
    )
}
The res is always empty:
[src/main.rs:34:9] total = 4
[src/main.rs:34:9] got = 0
[src/main.rs:34:9] res = []
Using nucleo::Matcher with the same config, input, and needle string produces the desired output:
use nucleo::pattern::{Atom, AtomKind, CaseMatching, Normalization};
use nucleo::Matcher;

fn main() {
    let mut matcher = init_fuzzy_matcher();
    let list = ["foobar", "fxxoo", "oo", "a"];
    let res = Atom::new(
        "f",
        CaseMatching::Ignore,
        Normalization::Smart,
        AtomKind::Fuzzy,
        false,
    )
    .match_list(&list, &mut matcher);
    dbg!(res);
}

fn init_fuzzy_matcher() -> Matcher {
    Matcher::new(nucleo::Config::DEFAULT)
}
[src/main.rs:20:5] res = [
    (
        "foobar",
        36,
    ),
    (
        "fxxoo",
        36,
    ),
]
So the question is: how do we use Nucleo the right way? I see an issue asking for examples, but it has no replies.
I also scanned helix's source files. Although nucleo is one of its dependencies, what helix actually uses is Matcher, not Nucleo.
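One likely explanation, offered as an assumption and not a confirmed diagnosis: the closure passed as the second argument of `inject.push` is meant to fill the item's match columns, and the pattern is matched against those columns, not against the item data. With `|_| {}` the columns stay empty, so all four items are counted but none can match "f". The standard-library sketch below models only that idea; `ToyItem`, `push`, and `matched` are hypothetical names, not nucleo's API.

```rust
/// Toy model of an injected item: `data` is what a snapshot would hand back,
/// `columns` is the text the pattern is actually matched against.
struct ToyItem {
    data: &'static str,
    columns: Vec<String>,
}

/// Hypothetical stand-in for an injector's push: the closure receives the
/// (initially empty) columns and is responsible for filling them.
fn push(items: &mut Vec<ToyItem>, data: &'static str, fill: impl FnOnce(&mut [String])) {
    let mut columns = vec![String::new()];
    fill(&mut columns);
    items.push(ToyItem { data, columns });
}

/// Returns the data of every item whose first column contains `needle`.
fn matched(items: &[ToyItem], needle: &str) -> Vec<&'static str> {
    items
        .iter()
        .filter(|item| item.columns[0].contains(needle))
        .map(|item| item.data)
        .collect()
}
```

In this model, pushing with an empty closure reproduces the report (four items, zero matches), while a closure such as `|cols| cols[0] = s.to_string()` makes "foobar" and "fxxoo" match. If the same holds for the real injector, filling column 0 inside the push closure would be the first thing to try.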
I am striving for high test coverage in nucleo. The matcher crate and the pattern parsing should already hit 80% test coverage. I would like to track coverage automatically in CI (for example with coveralls). This helps with triaging (identifying uncovered branches) and makes it easier to track where tests are still needed.
I would imagine that we generate coverage data with cargo tarpaulin in CI and upload the report to coveralls or a similar service (I would need to set up the account once there is a PR). LLVM-based instrumentation should be used.
let needle = Utf32String::from("lying");
let haystack = Utf32String::from("Flibbertigibbet / イタズラっ子たち");
let mut matcher = Matcher::new(Config::DEFAULT);
assert_eq!(
    matcher.substring_match(haystack.slice(..), needle.slice(..)),
    None
);
This should pass, but it fails with a score of 30; running with indices indicates that only the first codepoint in the haystack matches. If I get rid of the Japanese text, the match goes away as expected. Fuzzy, postfix, and prefix match all indicate that there is no match; only substring match breaks.
Edit: if I use Utf32String::Unicode("lying".chars().collect()) there's no match, so I think the 'ASCII needle, Unicode haystack' code path is the one with the problem.
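For cross-checking reports like this one, a naive reference implementation can serve as a test oracle. The sketch below uses only the standard library; `naive_substring` is a hypothetical helper and makes no claim about how nucleo implements substring matching.

```rust
/// Naive reference check (not nucleo's implementation): returns the character
/// index where `needle` first occurs in `haystack`, comparing lowercased text.
/// Indices refer to the lowercased haystack, which is enough for a test oracle.
fn naive_substring(haystack: &str, needle: &str) -> Option<usize> {
    let hay: Vec<char> = haystack.to_lowercase().chars().collect();
    let pat: Vec<char> = needle.to_lowercase().chars().collect();
    // `windows(0)` would panic, so treat the empty needle as matching at 0.
    if pat.is_empty() {
        return Some(0);
    }
    // `windows` yields nothing when the needle is longer than the haystack.
    hay.windows(pat.len()).position(|w| w == pat.as_slice())
}
```

For the haystack from the report, this oracle finds no occurrence of "lying", which agrees with the `None` the assertion expects.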
I know you're still working on this, but I might as well report it.
The example below panics with 'should have been caught by prefilter', .../git/checkouts/nucleo-fe29e1ee969779b0/9c4b710/matcher/src/fuzzy_optimal.rs:41:13
It seems to have something to do with case, because it doesn't happen when ignore_case is false.
[dependencies]
nucleo = {version ="*", git="https://github.com/helix-editor/nucleo"}
use nucleo::*;

fn main() {
    let conf = MatcherConfig::DEFAULT;
    let mut matcher = Matcher::new(conf);
    let needle = "aB";
    let mut buf1 = Vec::new();
    let needle = Utf32Str::new(needle, &mut buf1);
    let haystack = "aaB";
    let mut buf2 = Vec::new();
    let haystack = Utf32Str::new(haystack, &mut buf2);
    let mut indices = Vec::new();
    let result = matcher.fuzzy_indices(haystack, needle, &mut indices);
    println!("{:?} {:?}", result, indices);
}
A simple starter example would be a great addition.
I am using typos-rs locally to automatically fix (some) typos. It would be nice to have this run in CI so typos are caught during review. I already have an ignore file set up, so it's just a matter of adding the GH action step.
Hi! I saw your reddit post about nucleo and I got curious about writing a standalone cli version as a little "side project" (as "coding-breaks" beside learning for my exams). I've already started but I didn't create this issue at the beginning because I don't know how far I'll get or if it will turn into a mature cli program at all.
However, you said in this answer:
So somebody else could also contribute that (although if somebody does this, please reach out first).
So I'm writing this issue in case you are interested and want to use some of my code if you start writing the standalone CLI for nucleo. Here's the link to my repo: https://github.com/TornaxO7/nucle
If you have some suggestions/hints/questions, feel free to ask.
For example, I noticed that a needle ë fails to fuzzy match bë. On the other hand, a needle e will match bë, and a needle ë will match a haystack ë.
let paths = ["be", "bë"];
let mut matcher = Matcher::new(Config::DEFAULT);
let matches = Pattern::parse("ë", CaseMatching::Ignore).match_list(paths, &mut matcher);
assert_eq!(matches.len(), 1); // fails
Is that expected or a bug? If it is expected, can you say a bit more about why, and suggest workarounds? Mostly so I can document for people using my app why it works the way it does.
Thank you.
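The behavior the report expects can be written down as a small model. The sketch below assumes a "smart" rule where the haystack is folded only when the needle itself contains nothing foldable; `fold` and `smart_contains` are hypothetical helpers with a deliberately tiny folding table, and plain substring search stands in for fuzzy matching.

```rust
/// Toy fold of a few accented letters; a real implementation would rely on
/// full Unicode tables. Purely illustrative.
fn fold(c: char) -> char {
    match c {
        'ä' | 'á' | 'à' => 'a',
        'ë' | 'é' | 'è' => 'e',
        other => other,
    }
}

/// "Smart" rule as assumed here: fold the haystack only when the needle
/// contains no foldable character, so an explicit `ë` must match literally.
fn smart_contains(needle: &str, haystack: &str) -> bool {
    let needle_is_plain = needle.chars().all(|c| fold(c) == c);
    let hay: String = if needle_is_plain {
        haystack.chars().map(fold).collect()
    } else {
        haystack.to_string()
    };
    hay.contains(needle)
}
```

Under this model a needle ë matches bë but not be, and a needle e matches both, which is the outcome the assertion in the report is checking for.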
First, thanks for posting this project!
I know you are trying to match fzf, but I'm finding some of the fzf scoring hard to make sense of. For example, consider these two cases:
If I search for "md" the second example scores the highest matching "me down". This is also fzf behavior, but it doesn't seem right to me. Would it make sense to incorporate the percent of matching indexes into the score calculation somehow?
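One way to picture the suggestion is a post-processing step that scales a base score by the share of the haystack the match covers. This is a hypothetical sketch of the idea only; `density_adjusted` is an invented name, and neither fzf nor nucleo is claimed to score this way.

```rust
/// Hypothetical rescoring: scales a base score by the fraction of haystack
/// characters that were matched, so short, dense matches rank higher.
fn density_adjusted(base_score: u32, matched: usize, haystack_len: usize) -> f64 {
    // An empty haystack cannot contain a match, so its score is zero.
    if haystack_len == 0 {
        return 0.0;
    }
    f64::from(base_score) * (matched as f64 / haystack_len as f64)
}
```

With the same base score, a two-character match in a four-character haystack would keep half its score, while the same match in an eight-character haystack would keep a quarter.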
I am creating an app like dmenu/rofi for windows using nucleo, emenu, and I have run into two main issues.
First, if there are multiple match candidates with the same string, it would be nice to have a method on the snapshot that returns the items along with their global index so I can differentiate between them in the GUI, or just the indices so I can get the item with get_item().
Second, as with fzf-matcher, it would be nice to have some method to get the indices of the matched characters so I can highlight them in the GUI.
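Both requests can be illustrated with a standard-library sketch: each result carries the item's index in the original list plus the matched character positions. `match_positions` and `matches_with_index` are hypothetical names, and the greedy subsequence match is a toy, not nucleo's algorithm.

```rust
/// Toy greedy subsequence matcher (not nucleo's algorithm): returns the
/// character positions in `haystack` that match `needle`, ignoring ASCII
/// case, or `None` if the needle is not fully matched.
fn match_positions(needle: &str, haystack: &str) -> Option<Vec<usize>> {
    let mut positions = Vec::new();
    let mut needle_chars = needle.chars().peekable();
    for (i, h) in haystack.chars().enumerate() {
        // Stop scanning once the whole needle has been consumed.
        let Some(&n) = needle_chars.peek() else { break };
        if n.eq_ignore_ascii_case(&h) {
            positions.push(i);
            needle_chars.next();
        }
    }
    needle_chars.peek().is_none().then_some(positions)
}

/// Pairs each match with the item's index in the original list, so identical
/// strings stay distinguishable in a GUI.
fn matches_with_index(needle: &str, items: &[&str]) -> Vec<(usize, Vec<usize>)> {
    items
        .iter()
        .enumerate()
        .filter_map(|(idx, item)| match_positions(needle, item).map(|p| (idx, p)))
        .collect()
}
```

For the list `["foo", "bar", "foo"]` and the needle "fo", the two identical entries come back as index 0 and index 2, each with positions 0 and 1 available for highlighting.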