Comments (4)
Thanks for your reply, and thanks for the package.
from symspellpy.
I believe this is because I missed updating replaced_words when a combination of 2 terms is the best match. I have pushed a fix to this branch. Could you please test and see if that fixes the problem for you?
I have tried it on my side with the following code:
import pkg_resources
from symspellpy import SymSpell

sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt"
)
bigram_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_bigramdictionary_en_243_342.txt"
)
sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)
sym_spell.load_bigram_dictionary(bigram_path, term_index=0, count_index=2)

input_term = (
    "whereis th elove GPS hehad dated forImuch of thepast who "
    "couqdn'tread in sixtgrade and 16 microstru cture him"
)
suggestions = sym_spell.lookup_compound(
    input_term, max_edit_distance=1, ignore_non_words=True
)
for suggestion in suggestions:
    print(suggestion)
for k, v in sym_spell.replaced_words.items():
    print(f"origin: {k}, modify: {v.term}, edit_distance: {v.distance}")
and managed to get the following output:
where is the love GPS he had dated for much of the past who couldn't read in six grade and 16 microstructure him, 9, 0
<omitted>
origin: microstru, modify: microstructure, edit_distance: 1
and it seems to address the issue.
Thanks. It works. Could I ask one more question? Is there a way to get the start and end index of the original word?
Unfortunately, there's no way to do that in symspellpy right now. You'll have to implement some custom post-processing functions in your project for that.
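For what it's worth, here is a minimal sketch of what such post-processing could look like. It is not part of symspellpy: find_word_spans is a hypothetical helper, and it assumes the original words are whitespace-delimited tokens of the input and that matching them case-insensitively is acceptable. The originals list below is a stand-in for the keys of sym_spell.replaced_words.

```python
import re


def find_word_spans(text, originals):
    """Return (word, start, end) for each whitespace-delimited token of
    `text` whose lowercased form is in `originals`. `end` is exclusive."""
    wanted = {w.lower() for w in originals}
    spans = []
    for match in re.finditer(r"\S+", text):
        # Compare case-insensitively (an assumption about how the
        # original words were recorded).
        if match.group().lower() in wanted:
            spans.append((match.group(), match.start(), match.end()))
    return spans


input_term = "whereis th elove GPS hehad dated"
# Hypothetical stand-in for list(sym_spell.replaced_words) after lookup_compound().
originals = ["whereis", "elove", "hehad"]
for word, start, end in find_word_spans(input_term, originals):
    print(f"origin: {word}, start: {start}, end: {end}")
```

Tokens that carry punctuation, or that the library splits differently from a plain whitespace split, would need a tokenizer that matches the library's own word parsing.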