Code Monkey home page Code Monkey logo

sliceslice-rs's Introduction

sliceslice

Actions Crate Docs License

A fast implementation of single-pattern substring search using SIMD acceleration, based on the work presented by Wojciech Muła. For a fast multi-pattern substring search algorithm, see instead the aho-corasick crate.

Example

use sliceslice::x86::DynamicAvx2Searcher;

fn main() {
    let searcher = unsafe { DynamicAvx2Searcher::new(b"ipsum".to_owned().into()) };

    assert!(unsafe {
        searcher.search_in(b"Lorem ipsum dolor sit amet, consectetur adipiscing elit")
    });

    assert!(!unsafe {
        searcher.search_in(b"foo bar baz qux quux quuz corge grault garply waldo fred")
    });
}

Benchmarks

We ran the i386 benchmarks on an HP EliteDesk 800 G2 Tower PC with an Intel Core i7-6700 Processor @ 3.40GHz, 16GB of RAM and 512GB of disk space, running Ubuntu 20.04.1 LTS, gcc 9.3.0 and Rust 1.46.0.

Library Version Function Short haystack Long haystack
std 1.46.0 String::find [335.32 ms 335.56 ms 335.83 ms] [344.62 ms 345.01 ms 345.52 ms]
memmem 0.1.1 TwoWaySearcher::search_in [87.927 ms 88.029 ms 88.151 ms] [401.40 ms 401.59 ms 401.81 ms]
twoway 0.2.1 find_bytes [274.60 ms 274.82 ms 275.07 ms] [146.32 ms 146.44 ms 146.58 ms]
sse4-strstr¹ 0.0.0-9308a59 avx2_strstr_v2 [75.389 ms 75.515 ms 75.682 ms] [38.521 ms 38.579 ms 38.649 ms]
sliceslice 0.2.0 DynamicAvx2Searcher::search_in [79.283 ms 79.416 ms 79.596 ms] [35.135 ms 35.181 ms 35.247 ms]

¹ sse4-strstr is not memory safe as it can read beyond the end of the haystack, please see the documentation for sliceslice where this is discussed in more detail.

Benchmarks results column chart

Licensing

Licensed under the MIT license. See the LICENSE file for details.

sliceslice-rs's People

Contributors

gpgabriel avatar marmeladema avatar zakcutner avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

sliceslice-rs's Issues

Return match position

sliceslice currently only returns a boolean to denote if the substring has been found or not.
Most crate are actually providing the match position.
It would be nice to offer a similar API, but without sacrificing performances.
This should help to ensure correctness (do we find the right needle at the right position?) but also improve the fairness of our benchmarks since we currently actually compare with crates that are returning strictly more information.

Performance regression with rust 1.52.0

Rust 1.51.0:

short_haystack/DynamicAvx2Searcher::search_in #2
                        time:   [1010073463 instructions 1010073466 instructions 1010073469 instructions]
long_haystack/DynamicAvx2Searcher::search_in #2
                        time:   [237648142 instructions 237648143 instructions 237648144 instructions]
random_haystack/DynamicAvx2Searcher::search_in #2
                        time:   [1563190 instructions 1563190 instructions 1563190 instructions]

Rust 1.52.0:

short_haystack/DynamicAvx2Searcher::search_in #2
                        time:   [1233359721 instructions 1233359723 instructions 1233359726 instructions]
long_haystack/DynamicAvx2Searcher::search_in #2
                        time:   [237719727 instructions 237719730 instructions 237719734 instructions]
random_haystack/DynamicAvx2Searcher::search_in #2
                        time:   [1567493 instructions 1567493 instructions 1567493 instructions]

Summary:

  • short: +22%
  • long: +0%
  • random: +0%

storing DynamicAvx2Searcher

Until version 0.2 I could store DynamicAvx2Searcher pretty easily:

enum Matcher {
    Regex(Regex),
    String((String, DynamicAvx2Searcher)),
}

How do I do this in 0.3? Apologies if this is a rust n00b question. I dynamically create many Matchers and store them in a Vec.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.