
ka_ge.spell's Introduction

ქართული ორთოგრაფიული ლექსიკონი - Georgian Spell Checking Dictionary

Contains:

  • Hunspell dictionary (MIT License) (for OpenOffice.org, Mozilla Firefox, Google Chrome, and many more) in dictionaries/
  • Script to generate the dictionary from word lists (GPL3)

Note: The word lists used were created automatically by crawling the internet with various techniques, so many words may be missing or wrong, which leads to false positives and false negatives.

Dictionary installation

In Applications

  • OpenOffice and LibreOffice: An extension package is available here. Install it from the Extension Manager: from the Tools menu, select Extension Manager; in the Extension Manager window, click Add and open ka_GE.spell_.oxt.
  • Firefox, Thunderbird: Install the plugin from AMO
  • Other open source applications: Contact the developers and ask them to include this dictionary (see #3).

System wide

Mac OS X (10.6 and later)

  • Download or clone this repo.
  • In Finder, select Go to Folder from the Go menu, type ~/Library, and click Go (for a system-wide installation, use /Library instead).
  • In the Library directory, locate the Spelling folder, or create it if it does not exist.
  • Copy ka_GE.aff and ka_GE.dic from the repository (in dictionaries/) to the Spelling folder.

Linux

Copy dictionaries/ka_GE.dic and dictionaries/ka_GE.aff to /usr/share/hunspell/
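
The copy step above can be sketched as a small shell function. This is only an illustration: install_hunspell_dict is a hypothetical helper that is not part of the repository, and the destination defaults to the directory named above, which may differ on some distributions.

```shell
# Install both dictionary files with world-readable permissions.
# install_hunspell_dict is a hypothetical helper (an assumption, not a repo script).
install_hunspell_dict() {
    src_dir=$1
    dest_dir=${2:-/usr/share/hunspell}
    # Create the destination if needed, then copy .dic and .aff into it.
    install -d "$dest_dir" &&
    install -m 644 "$src_dir/ka_GE.dic" "$src_dir/ka_GE.aff" "$dest_dir/"
}

# Usage, as root, from the repository root:
#   install_hunspell_dict dictionaries
```

Afterwards, running hunspell -D should list ka_GE among the available dictionaries.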

Data sources

Word lists from the following people and sources are used to generate the dictionary:

Thanks a lot for your awesome work!

Update/build dictionary

You need a bash-compatible shell, GNU tools, hunspell (and hunspell-tools on some systems), and a C++14-compatible compiler installed. xmunch (https://github.com/gamag/xmunch) is included as a submodule, so after cloning this repository, run git submodule update --init, then go to the xmunch subdirectory and run make.

To build the dictionary, run make all

To build the packages for Firefox and OpenOffice, run make bundle afterwards.
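
As a rough sketch, the prerequisite check and the build steps above could be wrapped like this. missing_tools and build_dictionary are hypothetical helper names, and g++ in the usage line is only an assumed compiler command; any C++14-compatible compiler should do.

```shell
# Print every required command that is not found on PATH
# (prints nothing when all of them are available).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Run the documented build steps from the repository root.
build_dictionary() {
    git submodule update --init &&   # fetch the xmunch submodule
    make -C xmunch &&                # build xmunch in its subdirectory
    make all                         # build the dictionary
}

# Usage:
#   missing_tools bash make git hunspell g++   # g++ is an assumption
#   build_dictionary && make bundle
```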

Updating Bumbeishvili's word list

NOTE: The word list is included in words/; you don't need these steps to work on the dictionary.

You need a running mysql server and git.

Clone this repository

Log into mysql and add a user and a database for the word list:

$ mysql -uroot -p
mysql> CREATE DATABASE geowords;
mysql> CREATE USER 'geowords'@'localhost' IDENTIFIED BY 'password';
mysql> GRANT ALL PRIVILEGES ON geowords.* TO 'geowords'@'localhost';
mysql> FLUSH PRIVILEGES;

Create a file called dbaccess in the ka_GE.spell root, containing:

DBNAME=geowords
DBUSER=geowords
DBPASS=password

Run make db.

Remarks

The automatically created dictionary is not very accurate: some words may be wrong and many are missing. To improve it, words from the dictionary can be reviewed, with correct words added to the reviewed dictionary in their final, affix-compressed form. Wrong words can be added to the blacklist.
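
To illustrate the idea of a blacklist, here is a minimal sketch that removes blacklisted words from a plain word list. It assumes both files hold one word per line. filter_blacklisted and the file names in the usage line are hypothetical, and the build scripts may apply the blacklist differently.

```shell
# Print the words from a word list that are not on the blacklist.
# Assumes one word per line in both files.
filter_blacklisted() {
    words_file=$1
    blacklist_file=$2
    # -F: fixed strings, -x: match whole lines only,
    # -v: invert the match, -f: read patterns from a file
    grep -vxFf "$blacklist_file" "$words_file"
}

# Usage (hypothetical file names):
#   filter_blacklisted candidate_words.txt blacklist.txt > filtered_words.txt
```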

Contributing

Any help is very welcome, especially reviewing the dictionary and improving the affix files.

TODO: translate README to Georgian.

ka_ge.spell's People

Contributors

gamag


ka_ge.spell's Issues

Hello

Is spell checking not working in LibreOffice 5.3.0.3? That is, it does not suggest the correct variants. Could you help me?

Add to Software Using Hunspell

To make this more usable, we should add the dictionary to software that supports hunspell:

Feel free to contact developers and ask them to include the dictionary in their software. Point them here or to the OpenOffice extension linked above.

I'm happy to extend make bundle to package plugins for other software - just tell me how you need it.

Consider adding more word lists

Having different sources for the word list might allow us to improve the quality of the dictionary by removing words that appear in only one of them and are therefore more likely to be wrong. This requires, however, that the word lists are really created from different texts.
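
A minimal sketch of this filtering idea, assuming plain word lists with one word per line and no whitespace inside words. words_in_common and the file names in the usage line are hypothetical and not part of the build scripts.

```shell
# Print the words that occur in at least min_lists of the given word lists.
# Each list is de-duplicated first, so a word counts once per list.
# LC_ALL=C forces byte-wise comparison so sort and uniq agree on equality.
words_in_common() {
    min_lists=$1
    shift
    for list in "$@"; do
        LC_ALL=C sort -u "$list"
    done | LC_ALL=C sort | LC_ALL=C uniq -c |
        awk -v n="$min_lists" '$1 >= n { print $2 }'
}

# Usage (hypothetical file names): keep words confirmed by two or more sources
#   words_in_common 2 list_a.txt list_b.txt list_c.txt > confirmed_words.txt
```

Requiring agreement between lists trades coverage for precision: rare but correct words that appear in only one source are dropped along with the errors.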

The following word lists could be analyzed. If they are really from "disjoint" sources and found to improve the quality of our dictionary, we could include them in the default build scripts (if there are no licensing problems).

Different dictionaries

Hello Gabriel Margiani,

I was going to finish my beta project of Georgian dictionary on holidays this month, but you have already finished, so I'll discard my project.

The difference between our dictionaries is the number of words.

Your dictionary has more than 84682 words, but mine, extracted from Google Keyboard, has almost 100,000 words.

You can take my beta dictionary and orthography dictionary: https://www.dropbox.com/sh/g2153cqqfjf2w7l/AADCoTJ8DVgSqwtL71E5oDdda?dl=0

Once you have downloaded them, I'll close my project, since you have already finished yours.

I'll contact the developer of Dictionaries.io and ask him to replace my dictionaries with yours.

Congratulations, Gabriel!

Gustavo
