
Comments (4)

robertfeldt commented on July 18, 2024

Note that it goes even deeper, since all but QGram return NaN if both inputs are shorter than q:

julia> filter(d -> isnan(d(2)("", "")), [QGram, Cosine, Jaccard, Overlap, SorensenDice, MorisitaOverlap, NMD])
6-element Vector{DataType}:
 Cosine
 Jaccard
 Overlap
 SorensenDice
 MorisitaOverlap
 NMD

julia> QGram(1)("", "")
0

julia> QGram(2)("a", "b")
0

from stringdistances.jl.

matthieugomez commented on July 18, 2024

I am not sure there is an issue with the current implementation. The way I think about it is that there is a formula for each q-gram distance (given in the docs), which is valid even when the set of q-grams is empty. In some distances, the number of q-grams appears in the denominator, which is why the distance returns NaN when the set of q-grams is empty.
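
To make the denominator argument concrete, here is a minimal sketch of a set-based Jaccard formula, 1 - |A ∩ B| / |A ∪ B|, where A and B are the q-gram sets of the two strings. The names `qgram_set` and `jaccard_sketch` are hypothetical helpers written for illustration; this is not the library's implementation, only an assumption about how the formula behaves on empty sets.

```julia
# Hypothetical helper: the set of q-grams (length-q substrings) of s.
# Returns an empty set when s has fewer than q characters.
function qgram_set(s::AbstractString, q::Integer)
    chars = collect(s)
    return Set{String}(String(chars[i:i+q-1]) for i in 1:(length(chars) - q + 1))
end

# Illustrative Jaccard distance: 1 - |A ∩ B| / |A ∪ B|.
# When both sets are empty, the ratio is 0/0, which is NaN in floating point.
function jaccard_sketch(s1, s2, q)
    a, b = qgram_set(s1, q), qgram_set(s2, q)
    return 1 - length(intersect(a, b)) / length(union(a, b))
end

jaccard_sketch("", "", 2)      # NaN: both q-gram sets are empty
jaccard_sketch("", "ab", 2)    # 1.0: the union is non-empty
jaccard_sketch("ab", "ab", 2)  # 0.0
```

In this sketch, NaN appears only when both inputs are shorter than q, because the union is then empty. A formula whose denominator depends on each input separately would already produce 0/0 when just one input is too short.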


robertfeldt commented on July 18, 2024

Mathematically I agree.

I see dangers in actual use, but OK, people will have to handle it themselves. A simple solution might be to highlight somewhere in the documentation that one can add a safe evaluate method like so:

julia> function safeevaluate(D::Union{Cosine, Overlap, MorisitaOverlap}, s1, s2)
           length(s1) >= D.q && length(s2) >= D.q && return(evaluate(D, s1, s2))
           throw(ArgumentError("An argument is shorter than q ($(D.q)): \"$s1\", \"$s2\""))
       end
safeevaluate (generic function with 1 method)

julia> D = Cosine(2)
Cosine(2)

julia> @assert safeevaluate(D, "aa", "bb") == evaluate(D, "aa", "bb")

julia> safeevaluate(D, "", "bb")
ERROR: ArgumentError: An argument is shorter than q (2): "", "bb"
Stacktrace:
 [1] safeevaluate(D::Cosine, s1::String, s2::String)
   @ Main ./REPL[2]:3
 [2] top-level scope
   @ REPL[5]:1

julia> function safeevaluate(D::Union{Jaccard, SorensenDice, NMD}, s1, s2)
           # These distances are NaN only when both inputs are shorter than q.
           (length(s1) >= D.q || length(s2) >= D.q) && return(evaluate(D, s1, s2))
           throw(ArgumentError("Both arguments are shorter than q ($(D.q)): \"$s1\", \"$s2\""))
       end
safeevaluate (generic function with 2 methods)

julia> safeevaluate(Jaccard(2), "", "")
ERROR: ArgumentError: Both arguments are shorter than q (2): "", ""
Stacktrace:
 [1] safeevaluate(D::Jaccard, s1::String, s2::String)
   @ Main ./REPL[11]:3
 [2] top-level scope
   @ REPL[12]:1
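
As an alternative way of handling it on the caller's side, one could substitute a chosen value instead of throwing. The following is only a sketch: `nan_fallback` and the stand-in distance `toy` are hypothetical names that are not part of StringDistances.jl, and which default value makes sense (if any) is an assumption the caller has to make for their own use case.

```julia
# Hypothetical wrapper (not part of StringDistances.jl): call any distance
# function and return `default` when the result is NaN.
function nan_fallback(dist, s1, s2; default = 1.0)
    d = dist(s1, s2)
    return (d isa AbstractFloat && isnan(d)) ? default : d
end

# Stand-in distance for illustration only: NaN when both strings are empty,
# otherwise 0.0 for equal strings and 1.0 for different ones.
toy(s1, s2) = (isempty(s1) && isempty(s2)) ? NaN : Float64(s1 != s2)

nan_fallback(toy, "", "")                  # 1.0, the default
nan_fallback(toy, "", ""; default = 0.0)   # 0.0
nan_fallback(toy, "ab", "ab")              # 0.0, the result is passed through
```

Any callable taking two strings can be passed as `dist`, so the same wrapper could be tried with a distance object from the package.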


matthieugomez commented on July 18, 2024

Sure. I'm also open to changing things, following what other libraries typically do in this case. I will leave this issue open.

