Comments (9)
Also, Partial no longer normalizes by default.
from stringdistances.jl.
Thanks. How did you end up on this? I am now restricting the last argument to be an Integer (26221a1). Does that solve your issue?
from stringdistances.jl.
So I ended up at this issue by trying to answer: "does short word `w` appear in long string `str` up to 2 in DL-distance?" Since `Partial` invokes `normalize`, I first used e.g. `Partial(DL)(w, str, 2 / length(w))`, but I ran into issues where that would not identify correct matches, and I assumed it was due to floating-point issues:
```julia
julia> w = "abcdef"
"abcdef"

julia> str = "1234abcxyf1234"
"1234abcxyf1234"

julia> Partial(DL)(w, str, 2 / length(w))
1.0
```
Here, the answer I want is 1/3, so that when I multiply by `length(w)` I indeed get 2, the unnormalized DL distance between `w` and its closest substring of `str` ("abcxyf"). So I fixed it via
```julia
julia> Partial(DL)(w, str, (2 / length(w)) + eps())
0.3333333333333333
```
I thought things were working fine until I ran into a test query that did not match, which I reduced to the issue here.
So I think restricting to integers will at least raise an error instead of giving the wrong answer, but I'm not sure it solves the problem completely, since I don't quite know how to get the right functionality. Maybe I should just vendor a copy of `Partial` and modify it not to normalize, avoiding floating-point issues altogether.
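The vendored, non-normalizing variant of `Partial` suggested above could look roughly like the following. This is a minimal sketch under stated assumptions, not StringDistances.jl's code: `osa_distance` and `partial_osa` are hypothetical names, "DL" is taken to be the restricted (optimal string alignment) variant, and the shorter string is compared against every same-length substring of the longer one, which is assumed (not confirmed in this thread) to mirror what `Partial` does. Everything stays in integers, so there is no threshold to round.

```julia
# Hypothetical helpers for illustration; these are not StringDistances.jl functions.

# Restricted Damerau-Levenshtein (optimal string alignment) distance.
function osa_distance(s::AbstractString, t::AbstractString)
    a, b = collect(s), collect(t)
    m, n = length(a), length(b)
    # d[i+1, j+1] is the distance between the first i chars of s and first j of t
    d = zeros(Int, m + 1, n + 1)
    d[:, 1] .= 0:m
    d[1, :] .= 0:n
    for i in 1:m, j in 1:n
        cost = a[i] == b[j] ? 0 : 1
        d[i+1, j+1] = min(d[i, j+1] + 1,   # deletion
                          d[i+1, j] + 1,   # insertion
                          d[i, j] + cost)  # match or substitution
        if i > 1 && j > 1 && a[i] == b[j-1] && a[i-1] == b[j]
            d[i+1, j+1] = min(d[i+1, j+1], d[i-1, j-1] + 1)  # adjacent transposition
        end
    end
    return d[m+1, n+1]
end

# Smallest OSA distance between the shorter string and any same-length
# substring of the longer one. Returns max_dist + 1 when no window is within
# max_dist, so callers compare integers and never normalize.
function partial_osa(s::AbstractString, t::AbstractString, max_dist::Integer)
    short, long = length(s) <= length(t) ? (s, t) : (t, s)
    chars = collect(long)
    k = length(short)
    best = max_dist + 1
    for start in 1:(length(chars) - k + 1)
        window = String(chars[start:start+k-1])
        best = min(best, osa_distance(short, window))
    end
    return best
end

w, str = "abcdef", "1234abcxyf1234"
println(partial_osa(w, str, 2))  # 2: the window "abcxyf" needs two substitutions
println(partial_osa(w, str, 1))  # 2 == max_dist + 1: no window within 1
```

Because `max_dist` is annotated `::Integer`, a call like `partial_osa(w, str, 2 / length(w))` throws a `MethodError` instead of silently rounding, in the spirit of the Integer restriction mentioned earlier in the thread. A normalized score, if still wanted for display, is just `partial_osa(w, str, 2) / length(w)`.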
from stringdistances.jl.
Maybe I am using `max_dist` incorrectly. Is this right?
```julia
julia> DL("abcdef", "abcxyf", 2)
3

julia> DL("abcdef", "abcxyf", 3)
2
```
I was thinking they both should be 2.
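The `max_dist` contract expected here (return the true distance whenever it is within the cutoff, and some value above the cutoff otherwise) can be illustrated with a hypothetical cutoff-aware DL function. This is a sketch, not the library's implementation: `bounded_osa` is a made-up name, it assumes the restricted (optimal string alignment) DL variant, and returning `max_dist + 1` for "too far" is an assumed convention.

```julia
# Hypothetical sketch for illustration; not StringDistances.jl's implementation.
# Contract: return the OSA (restricted Damerau-Levenshtein) distance when it is
# <= max_dist, and max_dist + 1 otherwise.
function bounded_osa(s::AbstractString, t::AbstractString, max_dist::Integer)
    a, b = collect(s), collect(t)
    m, n = length(a), length(b)
    # Cheap lower bound: covering the length difference takes that many edits.
    abs(m - n) > max_dist && return max_dist + 1
    d = zeros(Int, m + 1, n + 1)
    d[:, 1] .= 0:m
    d[1, :] .= 0:n
    for i in 1:m
        for j in 1:n
            cost = a[i] == b[j] ? 0 : 1
            d[i+1, j+1] = min(d[i, j+1] + 1, d[i+1, j] + 1, d[i, j] + cost)
            if i > 1 && j > 1 && a[i] == b[j-1] && a[i-1] == b[j]
                d[i+1, j+1] = min(d[i+1, j+1], d[i-1, j-1] + 1)
            end
        end
        # Row minima never decrease from one row to the next, so once a whole
        # row is above max_dist the final distance is too. The test must be
        # strict: a row minimum equal to max_dist can still end at max_dist.
        minimum(@view d[i+1, :]) > max_dist && return max_dist + 1
    end
    return min(d[m+1, n+1], max_dist + 1)
end

for k in (1, 2, 3)
    println(k, " => ", bounded_osa("abcdef", "abcxyf", k))  # 2 for every k
end
```

With a cutoff of 2 or 3 the true distance 2 comes back, and with 1 the sentinel 2 (`max_dist + 1`) signals "too far". In this sketch, replacing the strict `>` in the row check with `>=` would make the `max_dist = 2` call return 3, because the row for the prefix "abcde" already has minimum 2; that matches the symptom above, though the thread does not say what 4df4bad actually changed.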
from stringdistances.jl.
Right, it's a bug. Thanks for spotting it. I think I have solved it with 4df4bad.
Let me know if you encounter other issues.
from stringdistances.jl.
Thanks for the quick fixes! I'll try again tomorrow and see if I can spot any issues.
from stringdistances.jl.
I haven't been able to find any more issues, by the way. I'll post a new issue if I find any.
Mind registering the latest release?
from stringdistances.jl.
ok, done. Let me know if you encounter other bugs — this is very useful.
from stringdistances.jl.
Will do, thanks!
from stringdistances.jl.