Comments (8)
Inheritance from StringDistances is useful for me. I'm not going to remove it because you need Levenshtein to be a true metric. Maybe ask the NearestNeighbour package to allow premetrics; they can always print a warning in this case.
from stringdistances.jl.
I've removed the inheritance from `StringDistance` (it is now just a union of distances). `Levenshtein` is now a `Metric`. Let's see how it goes; I'll tag a new version in a few weeks if no one complains.
from stringdistances.jl.
The more minimal version of this is to just change Levenshtein
from stringdistances.jl.
That's interesting. I'm not sure what to do. The issue is that `Levenshtein` cannot inherit from both `StringDistance` and `Metric`.
To be clear, Levenshtein is still a semi-metric, so the current behavior is not wrong per se, it's just not precise enough.
from stringdistances.jl.
Removing the abstract type `StringDistance` would not solve everything.
This package introduces distance modifiers (`Partial`, `Winkler`, etc.), which are `StringDistance`s parametrized by other `StringDistance`s (see an example here). It is possible (I have not checked) that `Partial{Levenshtein}` is a true metric whereas `Partial{Jaro}` is only a semimetric. But there is no way to express these properties through inheritance in Julia: the type `Partial` must inherit either from `SemiMetric` or from `Metric`.
This suggests that properties of the metric should be a trait rather than an abstract type. I'm curious what other people think (e.g. @KristofferC).
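To make the trait idea concrete, here is a minimal sketch of how it might look. Every name in it (`MetricKind`, `IsMetric`, `IsSemiMetric`, `metrickind`, `can_prune_with_triangle`) is hypothetical and not part of StringDistances.jl or Distances.jl, the structs are stand-ins for the real types, and the trait assigned to `Partial{Levenshtein}` is an unverified placeholder that only mirrors the hypothetical above.

```julia
# Hypothetical trait values; not an existing API.
abstract type MetricKind end
struct IsMetric <: MetricKind end      # triangle inequality holds
struct IsSemiMetric <: MetricKind end  # no triangle-inequality guarantee

# Stand-ins for the package's distance types.
struct Levenshtein end
struct Jaro end
struct Partial{D}
    dist::D
end

# Conservative default: assume only a semimetric unless declared otherwise.
metrickind(::Type) = IsSemiMetric()
metrickind(::Type{Levenshtein}) = IsMetric()
# PLACEHOLDER, not a proven fact: the thread has not checked this case.
metrickind(::Type{Partial{Levenshtein}}) = IsMetric()
# Convenience method for instances.
metrickind(d) = metrickind(typeof(d))

# Downstream code dispatches on the trait value, not on a supertype.
can_prune_with_triangle(d) = can_prune_with_triangle(metrickind(d), d)
can_prune_with_triangle(::IsMetric, d) = true
can_prune_with_triangle(::IsSemiMetric, d) = false
```

With this layout, `Partial{Levenshtein}` and `Partial{Jaro}` can carry different properties even though they share the same parametric type, which a single supertype declaration on `Partial` cannot express.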
from stringdistances.jl.
Distances.jl has some very strange stuff, like https://github.com/JuliaStats/Distances.jl/blob/7f3a28c0d1372e3b3edbcbc28f00ba5645e1bbdb/src/metrics.jl#L107-L108, which makes it hard to understand how the types are structured...
Regarding using a trait, it might make sense. The comment about "Partial{Levenshtein} is a true metric whereas Partial{Jaro} is only a semimetric" suggests it would be a good idea since subtyping cannot model that.
from stringdistances.jl.
Removing `StringDistance` would at least partially solve the issue.
Even if we are left with `Partial{Levenshtein} <: SemiMetric` (which, as you say, is certainly not wrong), it still means we can get `Levenshtein <: Metric` precise.
My gut is telling me that most of those parameterized distance modifiers are not true `Metric`s.
So this would be a start until
A) a trait-based system can be worked out
B) someone sits down and writes the actual proofs of which modifiers on which base metrics result in which traits.
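Short of actual proofs, a brute-force search for counterexamples can at least rule candidates out. The sketch below is self-contained and uses no package API: `levenshtein`, `find_triangle_violation`, and the word list are all written here for illustration, and the squared distance is just a toy semimetric. Finding no violation on a small sample proves nothing; finding one disproves the triangle inequality.

```julia
# Plain dynamic-programming Levenshtein distance (illustrative, not the package's code).
function levenshtein(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    prev = collect(0:length(t))          # distances from "" to prefixes of b
    for i in eachindex(s)
        cur = similar(prev)
        cur[1] = i                       # distance from a[1:i] to ""
        for j in eachindex(t)
            cost = s[i] == t[j] ? 0 : 1
            cur[j + 1] = min(prev[j + 1] + 1, cur[j] + 1, prev[j] + cost)
        end
        prev = cur
    end
    return prev[end]
end

# Return the first triple (x, y, z) with dist(x, z) > dist(x, y) + dist(y, z),
# or `nothing` if the sample contains no such triple.
function find_triangle_violation(dist, items)
    for x in items, y in items, z in items
        if dist(x, z) > dist(x, y) + dist(y, z)
            return (x, y, z)
        end
    end
    return nothing
end

words = ["", "a", "b", "aa", "ab", "ba", "bb", "aab", "abb"]

# Toy semimetric: squaring a metric generally breaks the triangle inequality.
squared_levenshtein(x, y) = levenshtein(x, y)^2

find_triangle_violation(levenshtein, words)          # no violation in this sample
find_triangle_violation(squared_levenshtein, words)  # yields a violating triple
```

The same loop could be pointed at a modifier applied to a base distance to look for violations before anyone assigns it a metric trait.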
from stringdistances.jl.
Thanks, I think that is a good solution, and if it becomes unworkable then traits are next in line.
from stringdistances.jl.