harper's People

Contributors

dependabot[bot], elijah-potter, lukasmwerner, szabgab

harper's Issues

bug: + not recognized as a valid word

When using the + symbol for math equations inside a markdown file I get the following fixes given the following input:

T(n) = T(n-1) + 1

Fixes:

1. Replace with: "0"
2. Replace with: "CD"
3. Replace with: "0" 

feat: Revise spellcheck suggestion ordering

The current spellcheck suggestions are sorted based on the edit distance to the provided word. This works, but there are a few practical issues.

While there are a lot of possible ways to improve this, I first want to try simply prioritizing longer words. Most spelling errors seem to be omissions, rather than additions or replacements.

Sometime down the line, we can prioritize suggestions based on Google's 1-gram data.
This would not require including the 1-gram data in Harper; we would only need the frequencies of the words that are already in our list.
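
A minimal sketch of the "prioritize longer words" idea, assuming suggestions are plain string slices. The helper names `edit_distance` and `rank_suggestions` are hypothetical and not taken from Harper's code: candidates are sorted by edit distance first, and ties are broken in favor of the longer word.

```rust
use std::cmp::Reverse;

/// Classic Levenshtein edit distance over Unicode scalar values.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the first i-1 chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1).min(curr[j - 1] + 1).min(prev[j - 1] + cost);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Hypothetical helper: order candidates by edit distance, preferring longer words on ties.
fn rank_suggestions<'a>(misspelled: &str, candidates: &[&'a str]) -> Vec<&'a str> {
    let mut ranked = candidates.to_vec();
    // The sort is stable, so candidates with equal keys keep their original order.
    ranked.sort_by_key(|w| (edit_distance(misspelled, w), Reverse(w.chars().count())));
    ranked
}
```

With the misspelling "helo" and the candidates "help", "hello", and "halo" (all one edit away), this ordering puts "hello" first because it is the longest.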

`[lspconfig] Cannot access configuration for harper_ls.`

I am trying to add harper_ls to Neovim, but I am getting this error whenever I start Neovim:

[lspconfig] Cannot access configuration for harper_ls. Ensure this server is listed in `server_configurations.md` or added as a custom server.

Here is the relevant section of my config (using lazy.nvim) (-- ... means omitted code):

-- lsp-zero.lua
return {
    "VonHeikemen/lsp-zero.nvim",
    -- ...
    config = function()
        -- ...
        local lsp_conf = require("lspconfig")
        -- ...
        lsp_conf.harper_ls.setup({
            settings = {
                ["harper-ls"] = {
                    linters = {
                        spell_check = true,
                        spelled_numbers = false,
                        an_a = true,
                        sentence_capitalization = true,
                        unclosed_quotes = true,
                        wrong_quotes = false,
                        long_sentences = true,
                        repeated_words = true,
                        spaces = true,
                        matcher = true
                    }
                }
            }
        })
        -- ...
    end
}

feat: identify emojis as separate token

This is very low priority. We would only want to specifically identify emojis if we were to create a lint around them... which doesn't sound incredibly useful.

feat: Should Parse Markdown Correctly

Currently, Harper only parses plain English properly.

This results in errors similar to the following, where the word "adore" is flagged as an error even though it is spelled correctly.

I __adore__ this cupcake.

In order to parse Markdown properly, we need to generate tokens in the same way we generate tokens for plain English, while ignoring the Markdown additions.

I believe we can do this through the use of pulldown_cmark.

feat: detect missing spaces

If the document contains a large, misspelled "word," like nospaces, we should make a best guess at where we could insert a space to split it into two valid words.
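
One possible way to sketch this, assuming the dictionary is available as a set of string slices: try every split point and keep the splits where both halves are known words. `split_candidates` is a hypothetical name, not an existing Harper function.

```rust
use std::collections::HashSet;

/// Hypothetical helper: return every way to split `word` into two dictionary words.
fn split_candidates<'a>(word: &'a str, dictionary: &HashSet<&str>) -> Vec<(&'a str, &'a str)> {
    word.char_indices()
        .skip(1) // never split before the first character
        .map(|(i, _)| word.split_at(i))
        .filter(|(left, right)| dictionary.contains(*left) && dictionary.contains(*right))
        .collect()
}
```

For "nospaces" and a dictionary containing "no" and "spaces", the only result is the pair ("no", "spaces"). When several splits are valid, some ranking would still be needed to pick the best guess.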

bug: checking go directives

When using Harper on Go codebases, it would be nice to skip Go comments that start with //go:*, as these comments are directives for the compiler. The following is an example:

//go:embed templates/login.html
var LoginPageHTML string

The documentation on this Go feature is here
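
A small sketch of the check, assuming the comment text is handed over starting at the `//`. `is_go_directive` is a hypothetical helper name. Directives have no space between `//` and `go:`, so a plain prefix test is enough here.

```rust
/// Hypothetical helper: true when a Go line comment is a compiler directive such as `//go:embed`.
fn is_go_directive(comment: &str) -> bool {
    // Directives are written without a space after the slashes.
    comment.starts_with("//go:")
}
```

Comments for which this returns true would simply be skipped before linting.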

feat: Anaphora checking

Not sure what we can do here. There are some situations where repetition of the same word at the beginning of a sentence is a literary device (anaphora), and others where it comes off as wordy and wrong.

How can we notify the user when they do the latter?

feat: Implement parsing of Hunspell dictionaries

Overview

There are several open source dictionaries available in the hunspell *.dict and *.aff formats. Notably, there are a good many here.

Why?

Right now, the main problem with the spellchecker is the available word list.
The current one, english_words.txt, has too many words.
Not only that, but the word list also contains a lot of "words" that don't seem to be part of the standard English lexicon (e.g. "aarp").

By enabling Harper to use Hunspell dictionaries, we can lean on the existing curation.

The Formats

Source

*.dict File

The *.dict file is extremely similar in usage to our existing english_words.txt file.
The main difference is the addition of the / separated postfixes that provide additional information about each word.
These postfixes allow Hunspell to ship a relatively small word set, and expand it at runtime.

This file can technically act as a drop-in replacement for the existing word list, but certain words will be marked as invalid, since we wouldn't be processing the postfixes.
For example, "there" would be marked as valid, but "there's" would not.
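
As a rough illustration of the entry format, here is a sketch that separates a word from its / separated postfix flags. `parse_entry` is a hypothetical name, the flag `M` in the usage note is purely illustrative, and the sketch ignores escaped slashes and any trailing fields an entry might carry.

```rust
/// Hypothetical helper: split a dictionary entry such as `there/M` into the word and its flags.
/// Entries without a `/` have no flags.
fn parse_entry(line: &str) -> (&str, &str) {
    match line.split_once('/') {
        Some((word, flags)) => (word, flags),
        None => (line, ""),
    }
}
```

Using only the first element of each pair corresponds to the drop-in usage described above; the second element is what the affix rules would later expand.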

*.aff File

The affix file defines how the postfixes described above should be expanded.
Right now, we do not intend to support the entire *.aff file format, just enough to fit our needs with a specific dictionary. For example, we will ignore the encoding setting and assume all dictionaries are UTF-8.
We will also (at least initially) not support compounding.

bug: Harper lints inline math

Right now Harper throws a hissy fit every time inline math is used. Inline math should be ignored by Harper.

I've been meaning to fix this issue for quite some time now. When inspecting Markdown, Harper currently uses pulldown_cmark, which doesn't presently support math.

However, they recently merged support into the branch for the upcoming release. Before we can include the changes, we need that version of pulldown_cmark to be on crates.io.

The merged request

bug: table cells are not treated as end of sentence

Maybe you can classify it as not a bug, but my understanding of a cell in markdown is that the sentence is implicitly ended whether or not there is a period.

Example: harper-ls will complain that the bottom half of this table is a sentence which is too long

| Key        | Action                                                                 |
| ---------- | ---------------------------------------------------------------------- |
| `j`        | Scroll down                                                            |
| `k`        | Scroll up                                                              |
| `l`        | Scroll one page down                                                   |
| `h`        | Scroll one page up                                                     |
| `r`        | Reload file                                                            |
| `f` or `/` | Search                                                                 |
| `n` or `N` | Jump to next or previous search result                                 |
| `s` or `S` | Enter select link mode. Different selection strategy.                  |
| `Enter`    | Select. Depending on which mode it can: open file, select link, search |
| `Esc`      | Go back to _normal_ mode                                               |
| `t`        | Go back to files                                                       |
| `b`        | Go back to previous file (file tree if no previous file)               |
| `g`        | Go to top of file                                                      |
| `G`        | Go to bottom of the file                                               |
| `d`        | Go down half a page                                                    |
| `u`        | Go up half a page                                                      |
| `q`        | Quit the application                                                   |

feat: "a" vs "an"

As the title says. Harper should be able to check and provide suggestions to fix improper use of "a" vs "an" depending on the succeeding word.

"thread 'main' panicked at [...] when slicing

Hi! Thanks for this awesome project!

I got this panic in a Lua file when using Harper 0.6.2 as a language server:

[ERROR][2024-02-16 20:27:27] .../vim/lsp/rpc.lua:796	"rpc"	"/home/melker/.cargo/bin/harper-ls"	"stderr"	"thread 'main' panicked at /home/melker/.cargo/registry/src/index.crates.io-6f17d22bba15001f/harper-ls-0.6.2/src/tree_sitter_parser.rs:185:32:\nbegin <= end (23 <= 14) when slicing `-----------\n-- Noice --\n-----------\n`\nnote: run with `RUST_BACKTRACE=1` environment variable to display a backtrace\n"

Here's what the file looks like (stripped down, but still causing the error):

-----------
-- Noice --
-----------

add to mason registry

Seeing that nvim-lspconfig already has a config for harper, it should also be added to mason as a common method for installing packages in nvim.

bug: parsing error on words containing non English characters

If I try to write a name of something containing non English characters, harper-ls tries to spellcheck the substring after the special character. I don't expect it to recognise names from other languages since it's meant for English, but it hinders me from adding the whole word to the global/local dictionary using my editor's quick fix.

Example:
I try to write the word Løvetann. harper-ls asks if I meant to spell vetann this way, and the quick fix option is to add vetann to the dictionary.

Requested behaviour:
I would like harper-ls to mark the whole word, and give me the option to add Løvetann to the global/local dictionary.
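
A sketch of the requested behaviour, under the assumption that the problem comes from treating only ASCII letters as word characters. `word_spans` is a hypothetical helper that accepts any Unicode letter and returns byte ranges, so the whole word can be marked.

```rust
/// Hypothetical helper: byte ranges of words, treating any Unicode letter as part of a word.
fn word_spans(text: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in text.char_indices() {
        if c.is_alphabetic() {
            // Remember where the current word began.
            if start.is_none() {
                start = Some(i);
            }
        } else if let Some(s) = start.take() {
            spans.push((s, i));
        }
    }
    // Close a word that runs to the end of the text.
    if let Some(s) = start {
        spans.push((s, text.len()));
    }
    spans
}
```

For "Løvetann is", the first span covers all of "Løvetann" rather than only the part after the ø.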

License

Hi! This repo doesn't have a license, which means that it's technically not FOSS. If your intention is for it to be FOSS, consider adding a license :)

bug: Contractions are marked as a single token

Harper currently marks contractions as a single token, rather than three.

For example: you'll should be marked as you ' ll. Similarly, ain't should be marked as ain ' t.
Where ' is a punctuation token.
This is with the intention of running a special spellchecking linter for contractions.
They should not be handled by the generalized spellchecking linter.

This is related to #6.
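
To make the intended token split concrete, here is a sketch that breaks a contraction at its first ASCII apostrophe. `split_contraction` is a hypothetical name, and typographic apostrophes are not handled.

```rust
/// Hypothetical helper: split `you'll` into ("you", "'", "ll").
/// Returns None when there is no apostrophe or one side would be empty.
fn split_contraction(word: &str) -> Option<(&str, &str, &str)> {
    let i = word.find('\'')?;
    let (head, rest) = word.split_at(i);
    // The ASCII apostrophe is a single byte, so splitting at 1 is safe.
    let (apostrophe, tail) = rest.split_at(1);
    if head.is_empty() || tail.is_empty() {
        None
    } else {
        Some((head, apostrophe, tail))
    }
}
```

The middle element is what would become the punctuation token.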

feat: "a" vs "an"

As the title says. Harper should be able to check and provide suggestions to fix improper use of "a" vs "an" depending on the succeeding word.
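
A deliberately naive sketch of the check, assuming the decision is made from the first letter of the succeeding word. `expected_article` is a hypothetical name. The real rule depends on pronunciation, so spelling-based logic like this gets words such as "hour" or "university" wrong and would need an exception list at minimum.

```rust
/// Hypothetical helper: guess the article from the spelling of the next word.
/// This is a spelling heuristic only; it does not model pronunciation.
fn expected_article(next_word: &str) -> Option<&'static str> {
    let first = next_word.chars().next()?;
    let starts_with_vowel = matches!(first.to_ascii_lowercase(), 'a' | 'e' | 'i' | 'o' | 'u');
    Some(if starts_with_vowel { "an" } else { "a" })
}
```

A lint would compare the article actually written against this guess and offer the other one as a suggestion.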

FR: custom dictionary path

Right now, the location of the custom dictionary is static. It would be useful to be able to specify a custom location, e.g. for syncing purposes.

Build from crates.io fails

Looks to be a semver problem in the deps?
(doesn't occur when building from git)

   Compiling harper-ls v0.8.1
error[E0308]: `match` arms have incompatible types
  --> /home/jayvdb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/harper-ls-0.8.1/src/tree_sitter_parser.rs:30:24
   |
18 |           let language = match file_extension {
   |  ________________________-
19 | |             "rs" => tree_sitter_rust::language(),
20 | |             "tsx" => tree_sitter_typescript::language_tsx(),
21 | |             "ts" => tree_sitter_typescript::language_typescript(),
...  |
29 | |             "rb" => tree_sitter_ruby::language(),
   | |                     ---------------------------- this and all prior arms are found to be of type `tree_sitter::Language`
30 | |             "swift" => tree_sitter_swift::language(),
   | |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `tree_sitter::Language`, found a different `tree_sitter::Language`
...  |
34 | |             _ => return None
35 | |         };
   | |_________- `match` arms have incompatible types
   |
   = note: `tree_sitter::Language` and `tree_sitter::Language` have similar names, but are actually distinct types
note: `tree_sitter::Language` is defined in crate `tree_sitter`
  --> /home/jayvdb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tree-sitter-0.22.1/binding_rust/lib.rs:55:1
   |
55 | pub struct Language(*const ffi::TSLanguage);
   | ^^^^^^^^^^^^^^^^^^^
note: `tree_sitter::Language` is defined in crate `tree_sitter`
  --> /home/jayvdb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tree-sitter-0.20.10/binding_rust/lib.rs:43:1
   |
43 | pub struct Language(*const ffi::TSLanguage);
   | ^^^^^^^^^^^^^^^^^^^
   = note: perhaps two different versions of crate `tree_sitter` are being used?

bug: Sentence parser does not include final quotation mark

The current sentence implementation does not consider that quotation marks can appear after the sentence terminator. We should consider this final quotation as part of the sentence.

For example:

She said, "There is no way this is true."
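
A sketch of one way to find the end of a sentence so that trailing quotation marks are included. `sentence_end` is a hypothetical helper; it treats `.`, `!`, and `?` as terminators and does not try to handle abbreviations.

```rust
/// Hypothetical helper: byte index just past the first sentence in `text`,
/// including any closing quotation marks that directly follow the terminator.
fn sentence_end(text: &str) -> Option<usize> {
    let mut chars = text.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if matches!(c, '.' | '!' | '?') {
            let mut end = i + c.len_utf8();
            // Absorb quotation marks that come right after the terminator.
            while let Some(&(j, q)) = chars.peek() {
                if matches!(q, '"' | '\'' | '”' | '’') {
                    end = j + q.len_utf8();
                    chars.next();
                } else {
                    break;
                }
            }
            return Some(end);
        }
    }
    None
}
```

Slicing the example sentence above up to the returned index keeps the final quotation mark inside the sentence.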

feat: Include spaces after commas

Include a checker that ensures there is exactly one space after every comma. Quotes complicate things, since spaces should come after the quote.

Test cases:

hello world,my friend
"Hello,"my friend said.

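A rough sketch of such a checker, assuming a straight double quote directly after a comma is a closing quote. `comma_spacing_errors` is a hypothetical name. It reports the byte offset of each comma that is not followed by exactly one space, and it makes no attempt to handle numbers like 1,000 or commas at the end of a line.

```rust
/// Hypothetical helper: byte offsets of commas not followed by exactly one space.
/// A double quote right after the comma is skipped, so the space is expected after the quote.
fn comma_spacing_errors(text: &str) -> Vec<usize> {
    let chars: Vec<(usize, char)> = text.char_indices().collect();
    let mut errors = Vec::new();
    for (pos, &(offset, c)) in chars.iter().enumerate() {
        if c != ',' {
            continue;
        }
        let mut next = pos + 1;
        // A closing quote may sit between the comma and the space.
        if matches!(chars.get(next), Some(&(_, '"'))) {
            next += 1;
        }
        // A comma (or comma plus quote) at the very end of the text is accepted.
        if next >= chars.len() {
            continue;
        }
        let spaces = chars[next..].iter().take_while(|&&(_, ch)| ch == ' ').count();
        if spaces != 1 {
            errors.push(offset);
        }
    }
    errors
}
```

Both test cases above are flagged by this sketch, while "Hello, world" passes.
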
bug: improve cross-case spellcheck.

As of right now, these words are marked incorrect (which is valid), but no suggestions are provided due to improper handling of capitalization.

ymca
Ymca

feat: Spell checker should expand search if no words are found

When the spell checker encounters especially long, incorrectly spelled words, it fails to provide any suggestions.

For example: algorithmically gave no suggestions. Algorithmically is now in the dictionary, but before it was added, it was marked as incorrectly spelled and no suggestions were offered.

This can be fixed by gradually expanding beyond the max_edit_dist until a word is found.
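
The expansion loop could look roughly like the sketch below. `suggest_expanding` and `hard_limit` are hypothetical names; the distance metric is passed in as a parameter so that whichever edit-distance function is already in use can be plugged in.

```rust
/// Hypothetical helper: widen the allowed edit distance one step at a time
/// until at least one candidate is found or `hard_limit` is reached.
fn suggest_expanding<'a>(
    word: &str,
    dictionary: &[&'a str],
    distance: impl Fn(&str, &str) -> usize,
    max_edit_dist: usize,
    hard_limit: usize,
) -> Vec<&'a str> {
    for threshold in max_edit_dist..=hard_limit {
        let hits: Vec<&'a str> = dictionary
            .iter()
            .copied()
            .filter(|candidate| distance(word, *candidate) <= threshold)
            .collect();
        if !hits.is_empty() {
            return hits;
        }
    }
    // Nothing was close enough, even at the hard limit.
    Vec::new()
}
```

The hard limit keeps the search from degenerating into "return the whole dictionary" for inputs that are not words at all.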

feat: Capitalize Common Proper Nouns

Harper should detect and repair uncapitalized proper nouns and brand names.

Examples:

youtube -> YouTube
youTube -> YouTube
Youtube -> YouTube
china -> China
united States -> United States
United states -> United States
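
One simple way to sketch this is a lookup table keyed by the lowercased phrase. `proper_noun_fix` and the table layout are assumptions for illustration, not Harper's actual design.

```rust
use std::collections::HashMap;

/// Hypothetical helper: look up the canonical capitalization of a phrase.
/// `canonical` maps lowercased phrases to their preferred spelling.
/// Returns None when the phrase is unknown or already correct.
fn proper_noun_fix<'a>(phrase: &str, canonical: &HashMap<&str, &'a str>) -> Option<&'a str> {
    let fix = canonical.get(phrase.to_lowercase().as_str()).copied()?;
    if fix == phrase {
        None
    } else {
        Some(fix)
    }
}
```

With entries for "youtube" and "united states", all of the examples above map to their repaired form, while "YouTube" itself produces no lint.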

feat: ignore shebangs

Harper should not lint shebangs found at the start of files.

Examples include:

#! /usr/bin/ruby
#! /bin/bash
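
A minimal sketch, assuming the linter can be told at which byte offset to start. `lint_start` is a hypothetical helper. Returning an offset rather than a trimmed string keeps the positions of later lints aligned with the original file.

```rust
/// Hypothetical helper: byte offset at which linting should begin.
/// Skips the first line when the file starts with a shebang.
fn lint_start(source: &str) -> usize {
    if !source.starts_with("#!") {
        return 0;
    }
    // Skip past the newline that ends the shebang line, or the whole file if there is none.
    source.find('\n').map_or(source.len(), |i| i + 1)
}
```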

feat: Detect repetition of common words

Harper should include a linting rule that detects and repairs incorrect repetition of common words.

Examples that should throw a lint:

She lifted the the rock.
I will will do it later.

Examples that should not throw a lint:

This is very very difficult.

While the above example could be grammatically improved by removing the repetition, it is not a grammatical error. This improvement should be a separate lint.
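
The detection side of this rule could be sketched as follows, assuming a caller-supplied list of common words for which repetition is always an error. `repeated_words` is a hypothetical name, and the sketch splits on whitespace only, so punctuation attached to a word is not stripped.

```rust
/// Hypothetical helper: return the second occurrence of each immediately repeated common word.
fn repeated_words<'a>(text: &'a str, common: &[&str]) -> Vec<&'a str> {
    let words: Vec<&'a str> = text.split_whitespace().collect();
    words
        .windows(2)
        .filter(|pair| {
            pair[0].eq_ignore_ascii_case(pair[1])
                && common.iter().any(|c| c.eq_ignore_ascii_case(pair[0]))
        })
        .map(|pair| pair[1])
        .collect()
}
```

With "the" and "will" in the list, the first two examples above are flagged, while "very very" is left alone because "very" is not in the list.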

bug: Spell checker runs on number suffixes

The spell checker should not need to run on number suffixes.

For example:

Ideally, all of them will be completed before August 12th.

Currently, Harper flags the -th as an error.
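
A sketch of a check the spell checker could run before flagging a token, assuming the number and its suffix arrive together as one string. `is_ordinal` is a hypothetical name, and the sketch does not verify that the suffix agrees with the number (it would accept "12st").

```rust
/// Hypothetical helper: true for tokens made of digits followed by an ordinal suffix, such as `12th`.
fn is_ordinal(token: &str) -> bool {
    // Find where the leading run of ASCII digits ends.
    let digits_end = token
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(token.len());
    let (digits, suffix) = token.split_at(digits_end);
    !digits.is_empty() && matches!(suffix, "st" | "nd" | "rd" | "th")
}
```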
