Comments (9)

matthieugomez commented on June 19, 2024

Also, Partial no longer normalizes by default.

from stringdistances.jl.

matthieugomez commented on June 19, 2024

Thanks. How did you end up on this? I am now restricting the last argument to be an Integer (26221a1). Does that solve your issue?

ericphanson commented on June 19, 2024

I ended up at this issue while trying to answer: "does the short word w appear in the long string str within a DL-distance of 2?". Since Partial invokes normalize, I first used e.g. Partial(DL)(w, str, 2 / length(w)), but that failed to identify correct matches, and I assumed it was due to floating-point issues:

julia> w = "abcdef"
"abcdef"

julia> str = "1234abcxyf1234"
"1234abcxyf1234"

julia> Partial(DL)(w, str, 2 / length(w))
1.0

Here, the answer I want is 1/3, so that when I multiply by length(w) I indeed get 2, the unnormalized DL-distance between w and substrings of str. So I fixed it via

julia> Partial(DL)(w, str, (2 / length(w)) + eps())
0.3333333333333333

I thought things were working fine until I ran into a test query that did not match, which I reduced to the issue here.

I think restricting to integers will at least raise an error instead of giving the wrong answer, but I'm not sure it solves the problem completely, since I don't quite know how to get the right functionality. Maybe I should just vendor a copy of Partial and modify it to not normalize, to avoid floating-point issues altogether.
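For illustration, here is a minimal sketch of that "unnormalized partial" idea. It assumes a partial match means the minimum distance between w and every window of str with the same length as w, and it uses a hand-rolled optimal string alignment (restricted Damerau-Levenshtein) distance rather than anything from StringDistances.jl. The names osa_distance, partial_osa and appears_within are hypothetical. Because every quantity stays an integer, the threshold comparison involves no floating-point rounding.

```julia
# Optimal string alignment (restricted Damerau-Levenshtein) distance.
# Hand-rolled for illustration; this is NOT StringDistances.jl code.
function osa_distance(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    m, n = length(s), length(t)
    # d[i+1, j+1] holds the distance between the first i chars of a
    # and the first j chars of b.
    d = zeros(Int, m + 1, n + 1)
    d[:, 1] .= 0:m
    d[1, :] .= 0:n
    for i in 1:m, j in 1:n
        cost = s[i] == t[j] ? 0 : 1
        d[i+1, j+1] = min(d[i, j+1] + 1,    # deletion
                          d[i+1, j] + 1,    # insertion
                          d[i, j] + cost)   # substitution or match
        # Transposition of two adjacent characters.
        if i > 1 && j > 1 && s[i] == t[j-1] && s[i-1] == t[j]
            d[i+1, j+1] = min(d[i+1, j+1], d[i-1, j-1] + 1)
        end
    end
    return d[m+1, n+1]
end

# Minimum integer distance between `w` and each window of `str` that has
# the same number of characters as `w` (assumed meaning of "partial").
function partial_osa(w::AbstractString, str::AbstractString)
    chars = collect(str)
    k = length(w)
    length(chars) <= k && return osa_distance(w, str)
    return minimum(osa_distance(w, String(chars[i:i+k-1]))
                   for i in 1:(length(chars) - k + 1))
end

# Integer threshold check: no normalization, so no rounding concerns.
appears_within(w, str, max_dist::Integer) = partial_osa(w, str) <= max_dist
```

With w = "abcdef" and str = "1234abcxyf1234" as above, the best window is "abcxyf", so appears_within(w, str, 2) holds while appears_within(w, str, 1) does not.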

ericphanson commented on June 19, 2024

Maybe I am using max_dist incorrectly. Is this right?

julia> DL("abcdef", "abcxyf", 2)
3

julia> DL("abcdef", "abcxyf",3)
2

I was thinking they both should be 2.

matthieugomez commented on June 19, 2024

Right, it's a bug. Thanks for spotting it. I think I have solved it with 4df4bad.
Let me know if you encounter other issues.

ericphanson commented on June 19, 2024

Thanks for the quick fixes! I'll try again tomorrow and see if I can spot any issues.

ericphanson commented on June 19, 2024

I haven't been able to find any more issues, by the way. I'll post a new issue if I find any.

Mind registering the latest release?

matthieugomez commented on June 19, 2024

ok, done. Let me know if you encounter other bugs — this is very useful.

ericphanson commented on June 19, 2024

Will do, thanks!
