Code Monkey home page Code Monkey logo

quickenshtein's Introduction

Hello. I'm Turnerj

aka. James Turner - a programmer and entrepreneur with a love of cars, music and technology

Projects

Cache Tower MongoFramework Infinity Crawler Quickenshtein DDNS4Me Sitemap Tools Robots Exclusion Tools DinoDNS Build Versioning Vibrancy Aqueduct Borderless1942 Repo2Image LevenshteinBenchmarks NetworkedVegas2

As a Core Contributor

Schema.NET

Support My OSS Work

I haven't scratched the surface of the projects I want to build or work on. Supporting me allows not only the work for these projects to continue, it also helps me to develop other new and interesting projects.

quickenshtein's People

Contributors

dependabot-preview[bot] avatar dependabot[bot] avatar mgoodfellow avatar turnerj avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

quickenshtein's Issues

Benchmarking individual pieces of Quickenshtein

As a thought coming from my own tweet about how SSE2 being enabled didn't seem faster than no intrinsics leads me to believe that I need to better measure individual optimizations as well as everything combined.

That is to say, if I improve FillRow, I should have a FillRow benchmark. As these are private methods, I'd probably have them marked as internal and use that configuration thing to allow certain libraries access to it.

Initially, it would be for benchmarking but might even be a good idea at some really targeted testing of these components.

Re-look at Threading

As long as I can have it allocation-free (or if it must have allocations, make threading optional), there are a few things I could do to make it work.

  1. Use the ThreadPool rather than Parallel.For - assuming that it is faster & allocates less.
  2. With the thread boundaries, have arrays spanning the rows allocated by ArrayPool
  3. Use the presence of a non-zero value in a "previous" boundary to know whether it can start

Having a full array spanning the rows at each boundary prevents any weirdness with having to stall the next thread across manually. All threads (except the first thread) just need to check if the left boundary has completed the row it is up to.

I'd still probably have it on a while (true) loop checking values simply because all locking attempts I've seen are either "too slow" (I don't want to wait milliseconds), allocate and/or require disposal.

I'd probably pass pointer references to both to the strings and their boundaries (so up to 4 pointers). Besides that, each thread would only need to know how many rows (total), how many columns (individual) and their individual start column index.

If I understand everything right, I should be able to use SIMD instructions on top of threading which effectively will allow all my optimizations at once.

Usage in .NetStandard2.0 library can cause MethodNotFound exception

Hi,

If you include this package into a .NetStandard2.0 library, and invoke the method signatures of

Levenshtein.GetDistance(string, string) or Levenshtein.GetDistance(string, string, CalculationOptions)

Then call this from a test project (for example), where the project is .net5 / .netcore etc, then when invoking the method, you will get a MissingMethodException

This happens because the methods are only present in netstandard

This can be fixed by using string extensions .AsSpan() to cause the other form included in all outputs to be used instead.

Unless I'm missing something, why are these methods only included if this is a .netstandard library?

Branch Miss Analysis

Similar to how damageboy does in their blog post for sorting, it would be useful to track branch misses etc with the same tool - perf, a program on Linux.

This is how damageboy called it:

$ perf record -F max -e instructions,branches,branch-misses \
    ./Example --type-list DoublePumpOverlined \
              --size-list 100000 --max-loops 1000 --no-check

Does perf work well in WSL? What exactly do I need the example program to do to be compatible with it?

This could be a useful utility to make sure we are optimised with our branching logic and whether there are any gains by optimising this further.

Full-branchless Calculation with SIMD

I've tried 3 times so far to have a fully branchless calculation by leveraging comparisons through SIMD. Unfortunately every attempt I've done has been slower than having the branch. This is likely in part that a branchless option actually was doing more operations (the "min" calls) and I'm still not actually taking advantage of the size of the vector (I'm really just operating on the first item).

First Version: https://gist.github.com/Turnerj/92ef8b18437042c9f15a6b7e932058b7
This version had the clever idea of calculating the substitution costs and deletion costs a chunk at a time. It actually made use of the whole vector by doing this and thus, should have been less operations overall. While I don't have the benchmark handy anymore, I believe it performed worse than Fastenshtein.

Second Version: https://gist.github.com/Turnerj/76b69b10acefa5d9dde99f362b7f13f8
Faster than the first version, this version does a few things differently. Taking advantage of MinHorizontal can allow just a single call to find the lowest value. I don't have the docs in front of me now but I believe I was limited to ushort and while 65K characters is a big string, I'd really like to support 100K+ string sizes. Anyway, this version was about the same speed as Fastenshtein as far as I can recall.

Third Version: https://gist.github.com/Turnerj/c3f1962825f28f2e7798f85d25c3a137
It is much faster than the first or second version but it still is slower than the current master. It is smarter with loading the source characters and isn't limited to ushort like the second attempt.

If I'm going to try a fourth attempt, to make it easier for myself, I really should break the benchmarking up like I've mentioned in #5 as writing all of that in an unrolled codebase was a PITA.

Improving the Diagonal Calculation

A few points:

  • Diagonal calculation isn't effective with its use of caching vector-sized strings, specifically the target vector which needs to be flipped. It flips it multiple times because of how the data iterates. If the diagonal calculation could be both diagonal within a column the width of the vector, it would eliminate this.
  • Threading doesn't make use of the diagonal calculation because the boundaries cannot get the correct data easily (both the back boundary for the next thread and the forward boundary for the current thread). Using the opposite of the technique above, instead of diagonal within a column the width of the vector, do diagonal in the row the width of the vector.

Preliminary calculations on performance for single threaded improvement (take these with a grain of salt, they all don't calculate correctly):

No target lookups and no shuffles/permutations aka. Process Column (at best - we would still need one of these calls per vector but we would avoid a lot of duplicates)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Quickenshtein 23.67 ms 0.125 ms 0.117 ms - - - -

No source lookups aka. Process Row (at best - we would still need one of these calls per vector but we would avoid a lot of duplicates)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Quickenshtein 24.50 ms 0.143 ms 0.134 ms - - - 40 B

No source or target lookups (INVALID)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Quickenshtein 21.20 ms 0.138 ms 0.130 ms - - - 40 B

For comparison, it takes about 26.8 ms to calculate it currently with the regular diagonal calculation. This means at best, single threaded could have a 12% improvement however it is likely to be closer to 8% for the "correct" solution.

No idea about how the threading code would perform though.

Improving Threading Calculation Performance

For example, use the Concurrency Visualizer extension to identify if there are any areas that can be improved.

This is more a curiousity than anything - my own testing so far has seen great gains in multi-threaded workloads however there could still be more room for improvement.

If possible, there likely are big gains by supporting diagonal calculation in a multi-threaded scenario however that would likely require a major re-write.

Add k-threshold support

This puts an upper-bounds on how high the calculation should actually calculate - once this value is hit, the code would stop processing the edit distance. This is useful for determining if a string was within a range of similarity (from equal to k).

Identify "slowdown" with one of the two strings being empty

This "slowdown" is very small - like 10 to 13 nanoseconds however it is more the big difference between it and Fastenshtein for the same fundamental logic.

Fastenshtein

  • Blank source and target (2.2 ns)
  • 10 character source and blank target (2.2 ns)
  • Blank source and 10 character target (15.8 ns) (This path doesn't apply shortcuts)

Quickenshtein

  • Blank source and target (13.6 ns)
  • 10 character source and blank target (13.6 ns)
  • Blank source and 10 character target (13.6 ns)

In each one for Quickenshtein, I'm only doing one or two if-statements and returning a result. My theory is it might be a code-size/CPU cache issue.

The reason to even identify this isn't so much about saving 11 nanoseconds on the same operation, it is understand why there would be a difference in the first place.

feature: edit script

hello,

is there a chance you consider making a way to obtain an edit script (actual insertions/deletions/substitutions) for the shortest distance? (actual insertions/deletions/substitutions)

thanks,

On .NET framework 4.8 this library is slower than super simple 1 file equivalent. Is that normal?

Hi, I've been running a benchmark and found that it's slower than single file <100 lines Fastenshtein algo

  • Is that because I'm NOT on the .NET CORE framework?
  • You algo, doesn't allocate memory, that's awesome, but still slower !¿?
  • My test compare rather small strings < 20 characters.

Other implementation
https://github.com/DanHarltey/Fastenshtein/blob/master/src/Fastenshtein/Levenshtein.cs

BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19043.2130/21H1/May2021Update)
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
[Host] : .NET Framework 4.8 (4.8.4515.0), X86 LegacyJIT
DefaultJob : .NET Framework 4.8 (4.8.4515.0), X86 LegacyJIT

Method N Mean Error StdDev Gen0 Allocated
Fastenshtein 100 598.5 ns 1.67 ns 1.40 ns 0.3443 573 B
Quickenshtein 100 901.5 ns 1.26 ns 1.05 ns - -

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.