Code Monkey home page Code Monkey logo

quickenshtein's Introduction

Icon

Quickenshtein

A quick and memory efficient Levenshtein Distance calculator for .NET

AppVeyor Codecov NuGet

Performance

Quickenshtein gets its speed and memory efficiency from a number of different optimizations. To get the most performance out of the library, you will need .NET Core 3 or higher as this has support for hardware intrinsics.

Quickenshtein takes advantage of the following hardware intrinsics. On any recent x86 system, you will likely have these available.

If your computer doesn't have one of the hardware intrinsics available, Quickenshtein will still work - just slower than optimal.

Multi-Threading

By default, Quickenshtein performs in single-threaded mode as this mode performs best for small to medium size strings while having no memory allocations. When dealing with huge strings of 8000 characters or more, it may be useful to switch to multi-threaded mode. In this mode, the calculation is broken up and shared between multiple cores in a system.

Multi-threading is especially useful for systems without hardware intrinsics or for .NET Framework as shown in the table below where it provided a 3x performance improvement.

Method Runtime NumberOfCharacters Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Quickenshtein .NET 4.7.2 8000 110.686 ms 10.118 ms 0.554 ms - - - -
Quickenshtein_Threaded .NET 4.7.2 8000 36.601 ms 16.121 ms 0.883 ms - - - 1260 B

To enable threading, you can pass in CalculationOptions.DefaultWithThreading to Levenshtein.GetDistance() or configure your own CalculationOptions with settings that work best for you.

Note: Multi-threading is not allocation free (unlike single-threading mode) and will allocate a small amount depending on the number of threads used.

Benchmarking

There are a number of benchmarks in the repository that you can run on your system to see how well Quickenshtein performs.

Most of these benchmarks...

  • Run .NET Framework and .NET Core so you can see how the performance changes between them
  • Compare against a simple baseline Levenshtein Distance implementation with no specific optimizations
  • Compare against Fastenshtein, one of the other fast .NET Levenshtein Distance implementations

You can view results to these benchmarks at the links below:

Example Usage

using Quickenshtein;

// Common usage (uses default CalculationOptions with threading disabled)
var distance1 = Levenshtein.GetDistance("Saturday", "Sunday");

// Enable threading
var distance2 = Levenshtein.GetDistance("Saturday", "Sunday", CalculationOptions.DefaultWithThreading);

// Custom calculation options (helps with tuning for your specific workload and environment - you should benchmark your configurations on your system)
var distance3 = Levenshtein.GetDistance("Saturday", "Sunday", new CalculationOptions {
    EnableThreadingAfterXCharacters = 10000,
    MinimumCharactersPerThread = 25000
});

Learning Resources

I've written quite a bit about Levenshtein Distance and various ways you can extract performance from it:

If you prefer video:

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.