Comments (8)

matthieugomez commented on July 1, 2024

Inheritance from StringDistance is useful for me. I'm not going to remove it just because you need Levenshtein to be a true metric. Maybe ask the NearestNeighbors package to allow premetrics; they can always print a warning in that case.

from stringdistances.jl.

matthieugomez commented on July 1, 2024

I've removed the inheritance from StringDistance (it is now just a union of distances). Levenshtein is now a Metric. Let's see how it goes. I'll tag a new version in a few weeks if no one complains.
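A minimal sketch of what such a change could look like, assuming local stand-in types rather than the package's actual source. The abstract types mimic the PreMetric/SemiMetric/Metric hierarchy of Distances.jl, placing Jaro under SemiMetric is an assumption for illustration, and `describe` is a hypothetical helper:

```julia
# Stand-ins for the Distances.jl hierarchy, defined locally so the sketch runs on its own.
abstract type PreMetric end
abstract type SemiMetric <: PreMetric end
abstract type Metric <: SemiMetric end

# Each distance picks its own supertype directly.
struct Levenshtein <: Metric end
struct Jaro <: SemiMetric end   # assumption for illustration

# StringDistance is no longer an abstract type, only a union of concrete distances.
const StringDistance = Union{Levenshtein, Jaro}

# Hypothetical helper: methods can still be written against StringDistance.
describe(d::StringDistance) = d isa Metric ? "metric" : "semimetric"
```

With a union, code that dispatches on `StringDistance` keeps working, while each member is free to sit anywhere in the metric hierarchy.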

oxinabox commented on July 1, 2024

The more minimal version of this is to just change Levenshtein.

matthieugomez commented on July 1, 2024

That's interesting. I'm not sure what to do. The issue is that Levenshtein cannot inherit from both StringDistance and Metric.
To be clear, Levenshtein is still a semimetric, so the current behavior is not wrong per se; it's just not precise enough.
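The constraint can be sketched as follows, assuming (as the thread implies but does not show) that StringDistance is an abstract type sitting under SemiMetric. All types here are local stand-ins, not the packages' definitions:

```julia
# Stand-ins for the Distances.jl hierarchy.
abstract type PreMetric end
abstract type SemiMetric <: PreMetric end
abstract type Metric <: SemiMetric end

# Assumed placement: StringDistance is a SemiMetric, not a Metric.
abstract type StringDistance <: SemiMetric end

# A Julia struct declares exactly one supertype, so it is StringDistance or Metric, not both.
struct Levenshtein <: StringDistance end

is_semimetric = Levenshtein <: SemiMetric   # holds through StringDistance
is_metric = Levenshtein <: Metric           # does not hold: Metric is on another branch
```

Any code that requires `d::Metric` would therefore reject this Levenshtein, even though the distance itself satisfies the metric axioms.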

matthieugomez commented on July 1, 2024

Removing the abstract type StringDistance would not solve everything.

This package introduces distance modifiers (Partial, Winkler, etc.), which are StringDistances parametrized by other StringDistances (see an example here). It is possible (I have not checked) that Partial{Levenshtein} is a true metric whereas Partial{Jaro} is only a semimetric. But there is no way to express these properties through inheritance in Julia: the type Partial must inherit either from SemiMetric or from Metric.

This suggests that the properties of a metric should be a trait rather than an abstract type. I'm curious what other people think (e.g. @KristofferC).
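One way the trait idea could be sketched is with the "Holy traits" pattern. Every name below (`MetricProperty`, `IsMetric`, `IsSemiMetric`, `metricproperty`, `can_use_metric_tree`) is hypothetical and belongs to neither Distances.jl nor StringDistances.jl. Marking Partial{Levenshtein} as a metric only mirrors the unverified possibility raised above:

```julia
# Hypothetical trait types describing which axioms a distance satisfies.
abstract type MetricProperty end
struct IsMetric <: MetricProperty end
struct IsSemiMetric <: MetricProperty end

# Stand-in distances; note that they need no common supertype.
struct Levenshtein end
struct Jaro end
struct Partial{D}
    dist::D
end

# Trait function: default to the weaker guarantee, opt in to the stronger one.
metricproperty(::Type) = IsSemiMetric()
metricproperty(::Type{Levenshtein}) = IsMetric()
# Unverified assumption, used only to show that the trait can depend on the parameter.
metricproperty(::Type{Partial{Levenshtein}}) = IsMetric()
# Convenience method for instances.
metricproperty(d) = metricproperty(typeof(d))

# Hypothetical consumer: dispatch on the trait instead of on the supertype.
can_use_metric_tree(d) = can_use_metric_tree(metricproperty(d), d)
can_use_metric_tree(::IsMetric, d) = true
can_use_metric_tree(::IsSemiMetric, d) = false
```

Here `Partial(Levenshtein())` and `Partial(Jaro())` share one parametric type yet report different properties, which a single `<:` declaration on `Partial` cannot do.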

KristofferC commented on July 1, 2024

Distances.jl has some very strange stuff like https://github.com/JuliaStats/Distances.jl/blob/7f3a28c0d1372e3b3edbcbc28f00ba5645e1bbdb/src/metrics.jl#L107-L108 which makes it hard to understand how the types are structured...

Regarding using a trait, it might make sense. The comment that "Partial{Levenshtein} is a true metric whereas Partial{Jaro} is only a semimetric" suggests it would be a good idea, since subtyping cannot model that.

oxinabox commented on July 1, 2024

Removing StringDistance would at least partially solve the issue.
Even if we are left with Partial{Levenshtein} <: SemiMetric (which, as you say, is certainly not wrong),
it still means we can make Levenshtein <: Metric precise.

My gut tells me that most of those parameterized distance modifiers are not true metrics.
So this would be a start until
A) a trait-based system can be worked out
B) someone sits down and writes the actual proofs of which modifiers on which base metrics result in which traits.

oxinabox commented on July 1, 2024

Thanks, I think that is a good solution, and if it becomes unworkable then traits are next in line.
