fm's Introduction

fm

fm is a simple non-backtracking fuzzy text matcher useful for matching multi-line patterns and text. At its most basic, the wildcard operator (... by default) can be used in the following ways:

  • If a line consists solely of ... it means "match zero or more lines of text".
  • If a line starts with ..., the search is not anchored to the start of the line.
  • If a line ends with ..., the search is not anchored to the end of the line.

Note that ... can appear both at the start and end of a line, and if a line consists of ...... (i.e. starts and ends with the wildcard with nothing in between), it will match exactly one line. If the wildcard operator appears in any other location, it is matched literally. Wildcard matching does not backtrack, so if a line consists solely of ... then the next matching line anchors the remainder of the search.
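
As a rough illustration of the rules above, here is a minimal, std-only sketch of a non-backtracking line matcher. It is not fm's implementation: the names WILDCARD, line_matches and sketch_matches are hypothetical, and the sketch assumes that lines are trimmed before comparison and that consecutive wildcard-only lines collapse into one.

```rust
// Hypothetical sketch of the wildcard rules; not fm's actual code.
const WILDCARD: &str = "...";

/// Match one pattern line (not a wildcard-only line) against one text line.
fn line_matches(ptn: &str, text: &str) -> bool {
    // Assumption: surrounding whitespace is ignored on both sides.
    let (ptn, text) = (ptn.trim(), text.trim());
    // A leading wildcard un-anchors the start of the line.
    let (open_start, rest) = match ptn.strip_prefix(WILDCARD) {
        Some(rest) => (true, rest),
        None => (false, ptn),
    };
    // A trailing wildcard un-anchors the end of the line.
    let (open_end, rest) = match rest.strip_suffix(WILDCARD) {
        Some(rest) => (true, rest),
        None => (false, rest),
    };
    // A wildcard anywhere else stays in `rest` and is compared literally.
    match (open_start, open_end) {
        (true, true) => text.contains(rest),
        (true, false) => text.ends_with(rest),
        (false, true) => text.starts_with(rest),
        (false, false) => text == rest,
    }
}

/// Match a multi-line pattern against a multi-line text without backtracking.
fn sketch_matches(ptn: &str, text: &str) -> bool {
    let ptn_lines: Vec<&str> = ptn.lines().collect();
    let text_lines: Vec<&str> = text.lines().collect();
    let (mut p, mut t) = (0, 0);
    while p < ptn_lines.len() {
        if ptn_lines[p].trim() == WILDCARD {
            // Assumption: consecutive wildcard-only lines act as one.
            while p < ptn_lines.len() && ptn_lines[p].trim() == WILDCARD {
                p += 1;
            }
            // A trailing wildcard swallows whatever text is left.
            if p == ptn_lines.len() {
                return true;
            }
            // No backtracking: the first text line that matches the next
            // pattern line becomes the anchor, and we never revisit it.
            while t < text_lines.len() && !line_matches(ptn_lines[p], text_lines[t]) {
                t += 1;
            }
        } else if t < text_lines.len() && line_matches(ptn_lines[p], text_lines[t]) {
            p += 1;
            t += 1;
        } else {
            return false;
        }
    }
    // Every text line must have been consumed.
    t == text_lines.len()
}
```

With this sketch, sketch_matches("a\n...\nb", "a\na\nb\nb") is false: the first b after the wildcard becomes the anchor, and the trailing b is then left unmatched.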

The following examples show fm in action using its defaults:

use fm::FMatcher;

assert!(FMatcher::new("a").unwrap().matches("a").is_ok());
assert!(FMatcher::new(" a ").unwrap().matches("a").is_ok());
assert!(FMatcher::new("a").unwrap().matches("b").is_err());
assert!(FMatcher::new("a\n...\nb").unwrap().matches("a\na\nb").is_ok());
assert!(FMatcher::new("a\n...\nb").unwrap().matches("a\na\nb\nb").is_err());

When a match fails, the matcher returns an error indicating the line number at which the match failed. The error can be formatted for human comprehension using the provided Display implementation.

If you want to use non-default options, you will first need to use FMBuilder before having access to an FMatcher. For example, to use "name matching" (where you specify that the same chunk of text must appear at multiple points in the text, but without specifying exactly what the chunk must contain) you can set options as follows:

use {fm::FMBuilder, regex::Regex};

let ptn_re = Regex::new(r"\$.+?\b").unwrap();
let text_re = Regex::new(r".+?\b").unwrap();
let matcher = FMBuilder::new("$1 $1")
                        .unwrap()
                        .name_matcher(ptn_re, text_re)
                        .build()
                        .unwrap();
assert!(matcher.matches("a a").is_ok());
assert!(matcher.matches("a b").is_err());

fm's People

Contributors

bors[bot], ltratt, vext01

Forkers

ltratt vext01

fm's Issues

Should different names be allowed to match the same content?

Here's an interesting one.

assert!(helper("$1 $1 $1", "a a a"));
assert!(helper("$1 $2 $3", "a a a"));

Using the same name matchers as in the repo.

I expected the latter to fail because I wasn't expecting different names to be allowed to assume the same value.

Now a real world example. Here's what a TIR test looks like:

assert_tir(
    "live($a)\n\
    $a = call(add6, [1u64, 1u64, 1u64, 1u64, 1u64, 1u64])\n\
    dead($a)",
    &tir_trace,
);

assert_tir() just stringifies the TIR trace and tries to match the given pattern using the following name matchers:

let ptn_re = Regex::new(r"\$.+?\b").unwrap();
let text_re = Regex::new(r"\$?.+?\b").unwrap(); // Optional `$` prefix.

Then to test a failing match, I naively flipped $a on the first line of the pattern to $c and was surprised to see it matched. Of course, it's ok for both $a and $c to match the same content.

But should it be? Perhaps I want to know that the variable used in live is distinct from the one in the call and the dead.
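
To make the stricter alternative concrete, here is a minimal std-only sketch of what "distinct names must bind distinct text" could look like. The helper bind_distinct and its signature are hypothetical and are not part of fm; it only models the bookkeeping, not the regex matching.

```rust
use std::collections::HashMap;

/// Hypothetical helper: record that `name` matched `value`.
/// Returns false if the binding would be inconsistent, i.e. if `name` is
/// already bound to different text, or if another name already owns `value`.
fn bind_distinct<'a>(
    bindings: &mut HashMap<&'a str, &'a str>,
    name: &'a str,
    value: &'a str,
) -> bool {
    // An already-bound name must match the same text again.
    if let Some(bound) = bindings.get(name) {
        return *bound == value;
    }
    // A new name may not reuse text already claimed by a different name.
    if bindings.values().any(|v| *v == value) {
        return false;
    }
    bindings.insert(name, value);
    true
}
```

Under this rule "$1 $1 $1" would still match "a a a", but "$1 $2 $3" would not, because $2 would try to claim the text already bound to $1.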

Incorrect error line?

I added the test:

assert_eq!(helper("...\nline2\nline3\nline4\n", "line1\nline2\nline3\nbadgers\nline5"), 4);

This fails, because the failure line reported is 3, not 4 as I expected. I expected 4 because the first line in the text which doesn't match is badgers on line 4.

Did I misunderstand something, or is that a bug?

Potential bug in name matcher?

I've added the following case to the name matcher test:

 assert!(helper("$1", "$2")); 

The regexes in the helper remain the same:

let nameptn_re = Regex::new(r"\$.+?\b").unwrap();
let name_re = Regex::new(r".+?\b").unwrap();

The FM matcher fails to match this.

I was expecting this to match, because the following assertions are true:

    assert!(Regex::new(r"\$.+?\b").unwrap().is_match("$1"));
    assert!(Regex::new(r".+?\b").unwrap().is_match("$2"));

I'm not sure if my expectations are wrong, or if it's a bug.

Two consecutive wildcards crash FM.

@vext01 and I triggered a bug in FM. If your pattern has two consecutive wildcards with a newline in between, FM crashes:

thread 'tests::defaults' panicked at 'begin <= end (3 <= 0) when slicing `...`', src/lib.rs:357:28

See test:

diff --git a/src/lib.rs b/src/lib.rs
index c4fbca4..1c83515 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -523,6 +523,7 @@ mod tests {
             FMatcher::new(ptn).unwrap().matches(text).is_ok()
         }
         assert!(helper("", ""));
+        assert!(helper("...\n...", ""));
         assert!(helper("\n", ""));
         assert!(helper("", "\n"));
         assert!(helper("a", "a"));

I guess it doesn't make sense to do this in a pattern, but maybe a user-friendly message would help here.
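
For what it's worth, one possible shape for such a check is sketched below. This is a standalone, std-only illustration: check_consecutive_wildcards is a hypothetical function, not something that exists in fm, and the error wording is only a suggestion.

```rust
/// Hypothetical pre-check: reject patterns in which one wildcard-only line
/// directly follows another, reporting the 1-based pattern line number.
fn check_consecutive_wildcards(ptn: &str, wildcard: &str) -> Result<(), String> {
    let mut prev_was_wildcard = false;
    for (i, line) in ptn.lines().enumerate() {
        // Assumption: surrounding whitespace on a line is ignored.
        let is_wildcard = line.trim() == wildcard;
        if is_wildcard && prev_was_wildcard {
            return Err(format!(
                "pattern line {}: a '{}' line cannot directly follow another '{}' line",
                i + 1,
                wildcard,
                wildcard
            ));
        }
        prev_was_wildcard = is_wildcard;
    }
    Ok(())
}
```

Running something like this when the pattern is constructed would turn the slicing panic into an ordinary error that names the offending pattern line.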
