wooorm / dictionaries
Hunspell dictionaries in UTF-8
License: MIT License
Seems this one is missing.
This seems trivial, but because a lot is generated there are multiple ways to do it.
My proposal: add dictionaries/script/template/index.d.ts, and add index.d.ts to the list of requiredFiles in test.js.
Hello,
I'm having some trouble with the French dictionary: most invariable words are not checked correctly.
I opened the .dic file and found lines like:
voici 89
I could not find any documentation about this syntax. In dictionary-en there are no digits after the words.
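For illustration: a .dic entry generally consists of the word (optionally followed by /flags) and then optional whitespace-separated data fields, so a consumer can strip the trailing number. A minimal parsing sketch (parseDicLine is a hypothetical helper, not part of any of these packages):

```javascript
// Split a .dic entry into the bare word, its flags, and any trailing
// whitespace-separated data fields, so "voici 89" still yields "voici".
function parseDicLine(line) {
  const [wordAndFlags, ...fields] = line.split(/\s+/)
  const [word, flags = ''] = wordAndFlags.split('/')
  return {word, flags, fields}
}

console.log(parseDicLine('voici 89'))
// → { word: 'voici', flags: '', fields: [ '89' ] }
```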
While looking at the Hungarian dictionary, I found HTML entities in the .aff file.
REP Angström Ångström
Some of them are not real entities:
dictionaries/dictionaries/hu/index.aff
Line 101 in 5ee9325
I was not able to find any reference to Hunspell supporting HTML entities in .aff files.
By the way, thank you for maintaining these dictionaries.
"przypuszczać - przypuszczający"
https://sjp.pwn.pl/szukaj/przypuszczaj%C4%85cy.html
I have been attempting to convert these dictionaries to QtWebEngine format using Qt's qwebengine_convert_dict tool.
I was unable to convert the file el-polyton/index.bdic due to the following error:
Did not find a space in 'έψ εύσ'.
Most other dictionaries did build with the tool, which leads me to believe the fault may lie with el-polyton/index.aff, but as someone unfamiliar with Hunspell, I cannot tell whether it needs a space instead of the tab.
If so, it seems more useful to report it here than to just fix it at my end.
Could you extend the German dictionaries with those in here: de_dicts.zip? They are all in Hunspell format, but I have never created nor modified these dictionaries.
Note that the archive contains three subfolders:
1901: the so-called Old Rules;
1996: the so-called 1996 Reform;
2006: the so-called 2006 Reform.
Currently, de/index.dic contains only 75,767 words, whereas 2006/de_DE.dic contains 163,202 words.
I'm getting an error when I try to load the Spanish dictionary inside an Electron application:
import dictEs from 'dictionary-es'
import nspell from "nspell"
dictEs(ondictionary)
function ondictionary(err, dict) {
if (err) {
console.log(err);
throw err
}
var spell = nspell(dict);
}
Throws this error:
Uncaught Error: ENOENT, renderer\index.aff not found in C:\Users\LID-Mobile\Development\cccreator-desktop\node_modules\electron\dist\resources\electron.asar
at notFoundError (ELECTRON_ASAR.js:108)
at fs.readFile (ELECTRON_ASAR.js:536)
at one (index.js?e606:15)
at load (index.js?e606:11)
at Object.initDictionary (globalFunc.js?ff72:529)
at Store.updateLanguage (store.js?c0d6:204)
at wrappedMutationHandler (vuex.esm.js?2f62:714)
at commitIterator (vuex.esm.js?2f62:382)
at Array.forEach (<anonymous>)
at eval (vuex.esm.js?2f62:381)
I don't get the same error when loading the dictionary-en-us or dictionary-en-ca dictionaries.
Thoughts?
Hi,
Thanks for this easy-to-use, awesome library!
I've been using the English dictionary without issues, but the Swedish dictionary at dictionary-sv does not detect any errors; it sees all input as correct.
I looked at index.js and it looks the same as the one for English, so I cannot find the problem here and would appreciate any help.
The Swiss German (de-CH) dictionaries should not include Eszett (ß) characters. In Swiss German this special character is replaced by ss, as described here.
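The replacement described above can be sketched as follows (toSwissGerman is a hypothetical helper, shown only to illustrate the ß → ss rule):

```javascript
// Convert standard-German spellings to Swiss German orthography by
// replacing every Eszett (ß) with "ss".
function toSwissGerman(text) {
  return text.replace(/ß/g, 'ss')
}

console.log(toSwissGerman('Strauß')) // "Strauss"
```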
Hunspell reads the affix file byte by byte and decodes UTF-8 on demand. If it's not instructed to do so for flags, it doesn't. So non-ASCII characters like "ý" are treated as several characters, and due to another bug Hunspell silently takes just the first character and ignores the rest. So words can end up with unexpected flags.
Example: pt contains FORBIDDENWORD ý, and the perfectly valid word trabalhar/akYMjLÀÚ is treated as having this flag and thus considered misspelled.
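The failure mode described above can be sketched as follows (an illustration of the reported byte-truncation behavior, not Hunspell's actual source; truncatedFlag is a hypothetical helper):

```javascript
const encoder = new TextEncoder()

// Per the reported bug, a multi-byte UTF-8 flag is cut down to its
// first byte and the remaining bytes are silently ignored.
function truncatedFlag(flag) {
  return encoder.encode(flag)[0]
}

const forbidden = truncatedFlag('ý') // 0xC3, first byte of C3 BD
const flagsOfWord = [...'akYMjLÀÚ'].map(truncatedFlag)

// 'À' (C3 80) also starts with byte 0xC3, so after truncation the
// valid word appears to carry the FORBIDDENWORD flag.
console.log(flagsOfWord.includes(forbidden)) // true
```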
According to https://dictionary.cambridge.org/dictionary/english/onwards, 'onwards' is more common in British English, however:
dictionaries/dictionaries/en-GB/index.dic
Line 34383 in 4099de3
And actually it's got 'afterwards':
dictionaries/dictionaries/en-GB/index.dic
Line 11251 in 4099de3
...so I guess it's better to make them consistent?
I have been attempting to convert these dictionaries to QtWebEngine format using Qt's qwebengine_convert_dict tool.
I was unable to convert the file ko/index.bdic due to the following error:
Word does not match! - Index: 14081 - Expected: 김수한무거북이와두루미삼천갑자동방삭치치카포사리사리센타워리워리세브리캉무드셀라구름위허리케인에담벼락서생원에고양이고양이는바둑이바둑이는돌돌이 - Actual: 김수한무거북이와두루미삼천갑자동방� - ERROR converting, the dictionary does not check out OK.
Most other dictionaries did build with the tool, which leads me to believe the fault may lie with ko/index.bdic, but as someone unfamiliar with Hunspell, I cannot tell whether the Expected string should replace the Actual one.
If so, it seems more useful to report it here than to just fix it at my end.
I think this may be a lot of redundant work. I'm just not clear about the sources; this is a large collection of dictionaries, and I was looking for something like this. This project seems to source from https://extensions.openoffice.org/en/project/polish-dictionary-pack, but those were last edited in '08.
LibreOffice replaces those dictionaries and is another large repository of Hunspell files. It makes more sense to just use that -- that's what I'm doing for my project. It seems like you have similar aims, so it would make more sense to nuke this and pull in from there.
Maybe there is an advantage to the current setup; I'd be glad to know it.
We are seeing a difference in apostrophes between browser input and Word input.
The top picture shows the document as it was opened. You can see that all these properly spelled words are marked incorrect because of the apostrophe, but once I delete the apostrophe and type it back in, they are correct: " ’ " vs " ' ".
Word insert:
Browser insert:
We have attached the affix file: en_US.zip
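A workaround on the consumer side could be to normalize typographic apostrophes (U+2019) to the ASCII apostrophe (U+0027) before checking; a minimal sketch (normalizeApostrophes is a hypothetical helper, not part of the dictionary or spellchecker):

```javascript
// Replace every right single quotation mark (U+2019, as Word inserts)
// with the ASCII apostrophe (U+0027) the dictionary entries use.
function normalizeApostrophes(text) {
  return text.replace(/\u2019/g, "'")
}

console.log(normalizeApostrophes('don’t')) // "don't"
```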
The Hungarian dictionary, despite having UTF-8 encoding, doesn't contain the proper Hungarian characters such as ü, ű, á, í, ö, ó, ő, etc.
Example:
Ăźzenet/1 1
instead of
Üzenet/1 1
I tried to rule out my computer encoding it wrongly, but after re-encoding with Notepad++, and even setting the encoding manually to UTF-8 in Chrome, the issue still persists. Thus, in its current form, this dictionary is unusable by any spellchecker, because the special Latin-2 characters are all wrong.
I have been attempting to convert these dictionaries to QtWebEngine format using Qt's qwebengine_convert_dict tool.
I was unable to convert the file hu/index.bdic due to the following error:
Word does not match! - Index: 35768 - Expected: góóóóóóóóóóóóóóóóóóóóóóóóóóóóóóóól - Actual: góóóóóóóóóóóóóóóóóóóóóóóóóóóóóóóĂl - ERROR converting, the dictionary does not check out OK.
Most other dictionaries did build with the tool, which leads me to believe the fault may lie with hu/index.bdic, but as someone unfamiliar with Hunspell, I cannot tell whether it should be the Expected string.
If so, it seems more useful to report it here than to just fix it at my end.
I'm trying this package, but the console message is never shown.
const nspell = require('nspell')
const dictionaryPt = require('dictionary-pt')
function testSpell(txt){
dictionaryPt((error, pt) => {
if (error) throw error
var spell = nspell(pt)
console.log(spell.suggest(txt))
})
}
testSpell("Maquina")
dictionary-pt: "^3.1.0",
nspell: "^2.1.5"
I checked: the program gets stuck at the var spell = nspell(pt)
line. Any idea what's wrong?
Aloha there,
Can you add an Arabic dictionary, please?
Thanks in advance.
Nice work, thank you for maintaining this repo. The Lithuanian dictionary seems not to be UTF-8 encoded and gives errors. For example, checking each word of "Tęsti Žaidimą", which means "Continue Game", gives an error for every word.
I want to build a very simple translator using dictionaries, my concerned languages are English, French and Arabic.
Imagine the following dictionaries:
French
[bourgeois, brunette, contraire, ]
English
[bourgeois, brunette, contrary, ]
If there is an index between the meanings of terms, then I can map words easily.
Thanks a lot !
Can you add these files to the exports property in package.json? We use direct imports of these files to use them in the browser, but with the latest versions webpack fails with this error:
Module not found: Error: Package path ./index.dic is not exported from package \node_modules\dictionary-en (see exports field in \node_modules\dictionary-en\package.json)
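For reference, a sketch of what the requested exports entries might look like (assuming the package keeps index.js, index.aff, and index.dic at its root; the exact layout should be checked against the published package):

```json
{
  "exports": {
    ".": "./index.js",
    "./index.aff": "./index.aff",
    "./index.dic": "./index.dic",
    "./package.json": "./package.json"
  }
}
```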
I have used the Syncfusion spellchecker with Hunspell in ASP.NET Core, but when I tried to get suggestions for Swedish I could not receive any results, and the spell-checking function is not working properly either. When not using suggestions it works properly, and both the suggestion and checking functions work with other languages like the English variants, Russian, French, etc. So are there any limitations in the Swedish dictionary?
Because 5f2b26a introduced breaking changes, I think now would be the right time to convert to ESM.
The Ukrainian dictionary license situation seems to be a bit confusing.
It states that the dictionary files themselves are under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, and only the scripts building them are GPL-3.0.
I'll be the first to say that I'm not an expert in OSS licensing, but if I'm right, then a change of license in this project should be in order.
I looked for a paper that explains how Hunspell works, but without any success. I would like to know how Hunspell works, and especially how it makes suggestions. Does it use Levenshtein distance to look for the best suggestion?
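For the edit-distance part of the question, here is a minimal Levenshtein sketch. This is an illustration of the metric only, not Hunspell's actual algorithm; as far as I know, Hunspell's suggestion pipeline combines several heuristics (such as the TRY characters and REP pairs from the .aff file) rather than a single edit-distance pass.

```javascript
// Classic dynamic-programming Levenshtein distance: the minimum number
// of insertions, deletions, and substitutions turning string a into b.
function levenshtein(a, b) {
  const dp = Array.from({length: a.length + 1}, (_, i) =>
    Array.from({length: b.length + 1}, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  )
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      )
    }
  }
  return dp[a.length][b.length]
}

console.log(levenshtein('hunspell', 'hunspel')) // 1
```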
Do we have a way to use 'ё' instead of 'е' for ru where needed? E.g. 'актер', which is currently in the .dic file, should not be valid in accent-sensitive (case-sensitive) mode; it should be 'актёр'. I see that Portuguese has this issue figured out.
Hi there!
It looks like the suggest/correct methods don't handle some words in French, like "préavis".
In index.dic we can find: préavis po:nom is:mas is:inv
But console.log(spell.correct("preavis")) returns false.
Any idea how to fix this?
Hello there, and thanks for the work!
Would it be possible to also add Swahili dictionaries?
LibreOffice provides Hunspell dictionaries for Kenyan Swahili and Tanzanian Swahili:
https://extensions.libreoffice.org/en/extensions/show/swahili-dictionary
There are more places that provide these dictionaries, but I suppose they all have the same content:
https://addons.mozilla.org/en-US/firefox/addon/kiswahili-spell-checker/
https://cgit.freedesktop.org/libreoffice/dictionaries/tree/sw_TZ
https://github.com/elastic/hunspell/tree/master/dicts/sw
Best regards
Hi, it looks like some of the dictionaries have had the .dic and .aff files mixed up. Looking through crawl.sh, the generate method calls do look incorrect. I was wondering if this is on purpose or a typo. The affected dictionaries are...
I have tried to create a dictionary that would contain all (or most) Slovak words including all their forms, but I have failed, as I have never done it before.
The Slovak Academy of Sciences (Slovenska akademia vied, SAV) worked on a morphology analyser until approximately 2015, which contains 100 MB of data. Each word has a flag for what part of speech it is and what grammatical case it is in.
Some links (all in Slovak; if needed, I can translate them into English for you):
ma-2015-02-05.txt.xz
I could help you with the translation of the Slovak texts, with Slovak grammar, and with testing.
The current source for the Russian dictionaries was last updated a long time ago and no longer seems to be maintained.
Arch Linux's AUR ships a newer, updated version from the LibreOffice extension.
It would be great to use those dictionaries here too.
Problem:
Solution:
Should the data be exposed both as a .json file ("1mb of data") and as the normal file (1mb of data)?
import aff from './index.aff.json' assert {type: 'json'}
import dic from './index.dic.json' assert {type: 'json'}
export {aff, dic}
export const dictionary = {aff, dic}
export default dictionary
import aff from 'data:application/json,"..."' assert {type: 'json'}
import dic from 'data:application/json,"..."' assert {type: 'json'}
export {aff, dic}
export const dictionary = {aff, dic}
export default dictionary
However, that means:
If you don't mind me asking, where do the Russian dictionaries come from? Who created them and licensed them as LGPL-3.0?
We are facing an issue with the "en-GB" dictionary: we couldn't find the word "ability" in the .aff file, so the issue occurs for all the words related to "ability" through suffixes. Can you please provide the definition for it, or any other alternative solution?
Can I use this library to generate a list of words of a certain type, e.g. get all nouns, then all verbs, then all conjunctions, then all adjectives, ...?
Or is its only purpose spell checking (as presented in the example inside the readme)?
For some reason the Korean dictionary always returns true when used with nspell.
var dictionary = require('dictionary-ko');
var nspell = require('nspell');
dictionary(ondictionary);
function ondictionary(err, dict) {
if (err) {
throw err
}
var spell = nspell(dict);
console.log(spell.correct('hello'));
}
Hey 👋🏻 As you suggested, we should continue our conversation here on GitHub.
Disclaimer: I have almost no clue how Hunspell works, so please forgive me if this is a dumb question.
I'm using ReSpeller together with ReSharper in Visual Studio. The German phrase Zahlung gelöscht gets marked as misspelled, so I thought about adding these words to the German Hunspell dictionary.
In your README I found that the source for the German dictionary is j3e. On his page there is a little online spell checker and when entering my sentence Zahlung gelöscht, the result is:
Spellcheck result:
no errors found
And now I'm confused: within the German dictionary I don't find the word Zahlung, but I do find gelöscht:
I'd really appreciate your feedback!
Code:
var dictionary = require('dictionary-hu')
var nspell = require('nspell')
dictionary(ondictionary)
function ondictionary(err, dict) {
if (err) {
throw err
}
var spell = nspell(dict)
console.log(spell.correct('Szerelem'))
}
Error:
C:\Program Files\nodejs\node.exe .\test.js
Process exited with code 3221225477
When I run the following code
fs.readFileSync(path.join(base, 'index.dic'), 'utf-8');
fs.readFileSync(path.join(base, 'index.aff'), 'utf-8');
I get the following error. Please help:
fs.js:646
return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
^
Error: ENOTDIR: not a directory, open '/home/deeven/Documents/eloquent javascript/work/node_modules/dictionary-en-us/index.js/index.dic'
at Object.fs.openSync (fs.js:646:18)
at Object.fs.readFileSync (fs.js:551:33)
at Object.<anonymous> (/home/deeven/Documents/eloquent javascript/work/nspell.js:7:4)
at Module._compile (module.js:653:30)
at Object.Module._extensions..js (module.js:664:10)
at Module.load (module.js:566:32)
at tryModuleLoad (module.js:506:12)
at Function.Module._load (module.js:498:3)
at Function.Module.runMain (module.js:694:10)
at startup (bootstrap_node.js:204:16)