Comments (6)
These are very good points!
I'd like to address a couple of questions here.
* Should the `quirks` be applied recursively, so that for `"'2x'"` the `checker` would simplify it to `2`?
Technically that sounds like a good idea, but it requires additional work (using a mutable deque, feeding in generated suggestions, and processing the deque element by element). So for now, I'd say get the recursion-free version done, and then in step 2 use the existing patterns to refactor the suggestion processing logic.
* Should sub-parts of words be simplified as well? For example, `2x-something` would basically check `2` and `something`.
I think the 2x quirk should be expanded a bit to something like `^[0-9]+(?:[,.e][0-9]+)?(?:-.+)?$` for the particular token, so here we would just expand the notion of the 2x pattern.
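To make the deque idea from the first point concrete, here is a minimal sketch of what step 2 could look like. The helpers `strip_quotes` and `strip_multiplier` and the function `simplify` are hypothetical names made up for illustration, not code from cargo-spellcheck, and the quirks are hand-written instead of regex-based to keep the sketch dependency free.

```rust
use std::collections::VecDeque;

/// Hypothetical quirk: remove one layer of enclosing `"` or `'`.
fn strip_quotes(word: &str) -> Option<&str> {
    for &q in &['"', '\''] {
        if let Some(inner) = word.strip_prefix(q).and_then(|w| w.strip_suffix(q)) {
            if !inner.is_empty() {
                return Some(inner);
            }
        }
    }
    None
}

/// Hypothetical quirk: turn `2x` into `2` (digits followed by a single `x`).
fn strip_multiplier(word: &str) -> Option<&str> {
    let digits = word.strip_suffix('x')?;
    if !digits.is_empty() && digits.chars().all(|c| c.is_ascii_digit()) {
        Some(digits)
    } else {
        None
    }
}

/// Apply the quirks repeatedly by feeding every reduced form back into a
/// work queue. Forms that no quirk can reduce any further are returned.
/// This terminates because every quirk strictly shortens its input.
fn simplify(word: &str) -> Vec<&str> {
    let quirks: [fn(&str) -> Option<&str>; 2] = [strip_quotes, strip_multiplier];
    let mut queue = VecDeque::new();
    queue.push_back(word);
    let mut done = Vec::new();
    while let Some(current) = queue.pop_front() {
        let mut reduced = false;
        for quirk in quirks.iter() {
            if let Some(next) = quirk(current) {
                queue.push_back(next);
                reduced = true;
            }
        }
        if !reduced {
            done.push(current);
        }
    }
    done
}
```

With these two assumed quirks, the input `"'2x'"` is reduced layer by layer (outer quotes, inner quotes, then the `x`), and only the fully reduced forms would be handed to the dictionary check.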
As I see it, there are basically two types of rules here: ones that produce `&str -> &str` and ones that produce `&str -> Vec<&str>`.
I understand it like this: `fn tokenize` splits the chunks into words, which are then checked against the dictionary. If that detects a mistake or yields a suggestion, we call something like `fn quirks(..) -> Vec<Suggestion<_>>`, which can internally handle all quirks described earlier (non-recursive for now) and will return n suggestions. Returning suggestions here has the advantage that not much context needs to be fed into the `fn`, and it can do more complex things than just reduction.
What do you think?
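A rough, non-recursive sketch of that flow, under heavy assumptions: `Suggestion` here is a stand-in struct and not the crate's real `Suggestion<_>`, the dictionary is passed in as a closure, and all helper names are hypothetical. It only shows how the two rule shapes (`&str -> &str` and `&str -> Vec<&str>`) could be combined in a single pass.

```rust
/// Stand-in for the real suggestion type; it only records the fragment
/// that is still unknown after the quirks were applied.
#[derive(Debug, PartialEq)]
struct Suggestion {
    fragment: String,
}

/// Rule of shape `&str -> &str`: drop one layer of enclosing quotes.
fn strip_quotes(word: &str) -> &str {
    for &q in &['"', '\''] {
        if let Some(inner) = word.strip_prefix(q).and_then(|w| w.strip_suffix(q)) {
            return inner;
        }
    }
    word
}

/// Rule of shape `&str -> &str`: turn `2x` into `2`.
fn strip_multiplier(word: &str) -> &str {
    match word.strip_suffix('x') {
        Some(d) if !d.is_empty() && d.chars().all(|c| c.is_ascii_digit()) => d,
        _ => word,
    }
}

/// Rule of shape `&str -> Vec<&str>`: split a dashed word into its parts.
fn split_dashes(word: &str) -> Vec<&str> {
    word.split('-').filter(|part| !part.is_empty()).collect()
}

/// Called for a token the dictionary rejected. Each rule is applied once
/// (no recursion), and a suggestion is emitted per part that is still unknown.
fn quirks(token: &str, is_known: impl Fn(&str) -> bool) -> Vec<Suggestion> {
    split_dashes(strip_quotes(token))
        .into_iter()
        .map(strip_multiplier)
        .filter(|part| !is_known(*part))
        .map(|part| Suggestion { fragment: part.to_owned() })
        .collect()
}
```

The only context this function needs is the token and a way to query the dictionary, which matches the point about returning suggestions directly. An empty result would mean the quirks explained the token away.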
from cargo-spellcheck.
This should be quite self-contained within `checker/hunspell.rs`, `main.rs` and `config.rs`.
CC @laysauchoa
Hey @drahnr I've tried it out and as always didn't succeed 😞
I'd like to address a couple of questions here.
- Should the `quirks` be applied recursively, so that for `"'2x'"` the `checker` would simplify it to `2`?
- Should sub-parts of words be simplified as well? For example, `2x-something` would basically check `2` and `something`.
As I see it, there are basically two types of rules here: ones that produce `&str -> &str` and ones that produce `&str -> Vec<&str>`.
@zhiburt take a look at #90 - it implements the first step (more aligned to your proposal), repeated matching should be impl'd as step 2
`0.4.0-alpha.1` just hit the road. It includes a `hunspell`-specific backend quirk: `regex_transform: [ "re1", ... ]` specifies a bunch of regexes which are applied to individual words to remove, e.g., an enclosing `'`; the capture groups are then checked against the dictionary.
Note that this only solves half of the issues, i.e. the dashed suggestions for concatenated words cannot be resolved using this quirk.
Example: `testcase` in a text would be suggested to be `test-case`; we would like an option to avoid this kind of meaningless suggestion.
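One possible shape for such an option, purely as a sketch: filter out every suggestion that differs from the original word only by inserted dashes. Both function names below are hypothetical and nothing like this is confirmed to exist in cargo-spellcheck.

```rust
/// True if `suggestion` is just `word` with one or more dashes inserted,
/// e.g. `test-case` for `testcase`.
fn is_dash_only_variant(word: &str, suggestion: &str) -> bool {
    suggestion.contains('-') && suggestion.replace('-', "") == word
}

/// Drop all suggestions that merely split the word with dashes.
fn drop_dashed<'a>(word: &str, suggestions: Vec<&'a str>) -> Vec<&'a str> {
    suggestions
        .into_iter()
        .filter(|s| !is_dash_only_variant(word, s))
        .collect()
}
```

If this leaves no suggestions at all, the word could then be treated as accepted, but whether that is the desired behaviour is an open design question.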
Not entirely closed.