
wikokit's Introduction

Language is a city to the building of which every human being brought a stone.

Ralph Waldo Emerson

Wikokit - Machine-readable Wiktionary

Stone I. Parser wikokit. This program parses Wiktionaries, then builds and populates machine-readable Wiktionary databases.

Stone II. PHP API (piwidict project) for working with the machine-readable Wiktionary.

Stone III. Dictionary kiwidict. A visual interface to the parsed English Wiktionary and Russian Wiktionary databases.

The goal of this project is to extract semi-structured information from Wiktionary and construct a machine-readable dictionary (database + API + GUI).

Download new Wiktionary parsed databases from this page.

Stone III: Dictionary kiwidict - Android applications

  • kiwidict: an offline multilingual dictionary and thesaurus based on the English Wiktionary.
  • kiwidict-ru: an offline multilingual dictionary and thesaurus based on the Russian Wiktionary.
  • magnetowordik: a word game based on data extracted from the English Wiktionary.

The graphical user interface (kiwidict and kiwidict-ru) supports the following (see release_notes.txt):

  • word filtering by language code (e.g. de, fr);
  • wildcard characters: the percent sign (%) matches zero or more characters, and the underscore (_) matches a single character;
  • todo: list only words that have meanings and/or semantic relations (use checkboxes).
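
The percent sign and underscore are the same wildcard characters that SQL's LIKE operator uses. The following is a minimal sketch of how such patterns behave in SQLite, assuming the search is backed by a LIKE query. The table `words`, its column `text`, and the sample words are hypothetical and are not taken from the kiwidict schema.

```python
import sqlite3

# Hypothetical one-column table; the real kiwidict schema is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (text TEXT)")
conn.executemany(
    "INSERT INTO words (text) VALUES (?)",
    [("cat",), ("cot",), ("coat",), ("cart",), ("dog",)],
)


def search(pattern):
    """Return the words matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT text FROM words WHERE text LIKE ? ORDER BY text", (pattern,)
    )
    return [row[0] for row in rows]


print(search("c_t"))  # '_' stands for exactly one character
print(search("c%t"))  # '%' stands for zero or more characters
```

With this sample data, `c_t` matches only the three-letter words, while `c%t` also matches `coat` and `cart`.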

After installation you can find the parsed Wiktionary database in SQLite format on your phone in the folder SD card/kiwidict/.
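
Once the database file has been copied from the phone to a computer, it can be opened with any SQLite client. The sketch below lists the tables in such a file using Python's standard `sqlite3` module. The function name `list_tables` is made up for this illustration, and no assumption is made about which tables the file contains.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [row[0] for row in rows]
    finally:
        conn.close()
```

Call it as `list_tables(path)`, where `path` points to the copied file. Note that `sqlite3.connect` creates an empty database if the path does not exist, so an empty result may simply mean the path is wrong.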

Stone I: Parser and dictionary description

I) The ultimate goal (in the distant future) is to extract all information (i.e. all sections of every entry) from all Wiktionaries and convert the data to a machine-readable format.

II) Today's result. The machine-readable Wiktionary currently contains the following information extracted from the Russian Wiktionary and the English Wiktionary:

  1. word's language and part of speech;
  2. meanings / definitions;
  3. semantic relations;
  4. translations;
  5. (^) context labels (from definitions);
  6. (^) quotations (text + bibliographic data).

(^) Context labels and quotations were extracted only from the Russian Wiktionary.
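
To make the list above concrete, here is one possible way to model a parsed entry in memory. This is only an illustrative sketch: the class names, field names, and sample values are invented here and do not reflect the actual database schema or the wikokit API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Meaning:
    """One definition together with the data attached to it (hypothetical layout)."""
    definition: str
    context_labels: List[str] = field(default_factory=list)
    quotations: List[str] = field(default_factory=list)
    # Relation type (e.g. "synonyms") mapped to related words.
    relations: Dict[str, List[str]] = field(default_factory=dict)
    # Target language code mapped to translations.
    translations: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Entry:
    """A word in one language with one part of speech (hypothetical layout)."""
    word: str
    language: str
    part_of_speech: str
    meanings: List[Meaning] = field(default_factory=list)


# Made-up sample values, for illustration only.
entry = Entry(
    word="cat",
    language="en",
    part_of_speech="noun",
    meanings=[
        Meaning(
            definition="A domesticated feline.",
            relations={"hypernyms": ["animal"]},
            translations={"ru": ["кошка"]},
        )
    ],
)
```

Grouping relations and translations under each meaning is a design choice of this sketch, not a statement about how the parser stores them.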

Figure: Machine-readable Wiktionary framework

I would like all two hundred Wiktionaries to be parsed by this parser. But I know only Russian and English :)

If you are a developer and are interested in adding modules to parse "your Wiktionary", then

Statistics

The machine-readable dictionary database statistics:

Project structure

Wiki tool kit (wikokit) contains several wiki-related projects:

./common_wiki — common (low-level) functions to handle data of Wikipedia and Wiktionary in MySQL database,

./common_wiki_jdbc — functions to handle data of Wiktionary in MySQL and SQLite databases (JDBC, Java SE) (depends on common_wiki.jar).

./android/common_wiki_alink — Eclipse copy (source link) of ./common_wiki (!NetBeans)

./android/common_wiki_android — functions for access to Wiktionary in Android SQLite version of database (depends on common_wiki.jar).

./android/magnetowordik — Android word game (Wiktionary thesaurus).

./hits_wiki — API for access to Wikipedia in MySQL database, algorithms to search synonyms in Wikipedia (depends on jcfd.jar, common_wiki.jar).

./TGWikiBrowser — visual browser to search for synonyms in local or remote Wikipedia (depends on hits_wiki.jar and common_wiki.jar)

./wikidf — Wiki Index Database (list of lemmas and links to wiki pages, which contain these lemmas).

./wikt_parser — Wiktionary parser that creates a MySQL database (like WordNet) from a Wiktionary MySQL dump file. The project goal is to convert Wiktionary articles to a machine-readable format. (It depends on common_wiki and common_wiki_jdbc.)

./wiwordik — Visualization of parsed Wiktionary database. wiki + word = wiwordik.

Code from the previous project, Synarcher, is used in wikokit.

Further reading

In English

In Russian

See also

License

This program is multi-licensed and may be used under the terms of any of the following licenses:

See documentation.

wikokit's People

Contributors

componavt
