
toponym's Introduction


Toponym

Build grammatical cases for words in Slavic languages from pre-defined recipes.

Documentation: https://toponym.iwpnd.pw/

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Installing

For usage:

pip install toponym

For development:

git clone https://github.com/iwpnd/toponym.git
cd toponym
pip install flit
flit install --symlink

Description

Problem

In Slavic languages a word changes its form depending on how and where it is used in a sentence. The city Moscow (Москва), for example, becomes Москве in the prepositional case. So if you want to know whether:

"Москва" in "В Москве с начала года отремонтировали 3 тысячи подъездов"

>> False

Solution

This is where Toponym comes in. Using pre-defined recipes, it naively builds the grammatical cases of an input word based on that word's ending. A recipe looks as follows:

Recipe

recipe = {
    "а": {  # ending of the input word
        "nominative": [[""], 0],
        "genitive": [  # grammatical case to build
            ["ы", "и"],  # endings of the output word
            1,  # number of characters to delete before the output ending is added
        ],
        "dative": [["е"], 1],
        "accusative": [["у"], 1],
        "instrumental": [...],
    }
}
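To make the recipe format concrete, here is a minimal sketch of how one recipe entry could be applied to a word. It is not the library's implementation: the helper name `apply_recipe` is hypothetical, and it only uses the four cases spelled out above (the elided instrumental entry is left out). For each case it deletes the given number of trailing characters and appends every listed ending.

```python
def apply_recipe(word, rules):
    """Build case forms of `word` from one recipe entry (hypothetical helper).

    `rules` maps a case name to [endings, n_delete]: strip n_delete
    trailing characters from `word`, then append each ending.
    """
    forms = {}
    for case, (endings, n_delete) in rules.items():
        # Slice by length so that n_delete == 0 keeps the whole word
        # (word[:-0] would return an empty string).
        stem = word[: len(word) - n_delete]
        forms[case] = [stem + ending for ending in endings]
    return forms


# The "а" entry from the recipe above, without the elided instrumental case.
rules_for_a = {
    "nominative": [[""], 0],
    "genitive": [["ы", "и"], 1],
    "dative": [["е"], 1],
    "accusative": [["у"], 1],
}

print(apply_recipe("Москва", rules_for_a))
# {'nominative': ['Москва'], 'genitive': ['Москвы', 'Москви'], 'dative': ['Москве'], 'accusative': ['Москву']}
```

The genitive case shows why two endings produce two candidate forms.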

If a case lists multiple endings, one toponym is created per ending. Some of the resulting toponyms are not real words or are not used in practice (for example, Москви in the output below). If you have an idea how to filter out the unreal ones, please contact me.

With the built toponyms you can now check:

from toponym.recipes import Recipes
from toponym.toponym import Toponym

recipes_russian = Recipes()
recipes_russian.load_from_language(language='russian')

city = "Москва"

t = Toponym(input_word=city, recipes=recipes_russian)
t.build()

print(t.list_toponyms())
>> ['Москвой', 'Москвы', 'Москви', 'Москве', 'Москву', 'Москва']

any(word in "В Москве с начала года отремонтировали 3 тысячи подъездов" for word in t.list_toponyms())
>> True
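The `word in sentence` check above is a plain substring test, so a toponym also matches when it is only part of a longer word. A stricter variant could match whole words only. The sketch below is an assumption, not part of toponym: `mentions_any` is a hypothetical helper, and the list of forms is copied from the output above rather than produced by the library.

```python
import re


def mentions_any(sentence, toponyms):
    """Return True if any toponym occurs in `sentence` as a whole word."""
    if not toponyms:
        return False
    # \b marks a word boundary; for str patterns Python's re module treats
    # Cyrillic letters as word characters, so this works for Russian text.
    pattern = r"\b(?:" + "|".join(map(re.escape, toponyms)) + r")\b"
    return re.search(pattern, sentence) is not None


forms = ["Москвой", "Москвы", "Москви", "Москве", "Москву", "Москва"]
print(mentions_any("В Москве с начала года отремонтировали 3 тысячи подъездов", forms))
# True
print(mentions_any("Москвариум", forms))
# False: "Москва" is only a prefix of a longer word here
```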

Supported languages:

full name    iso code
croatian     hr
russian      ru
ukrainian    uk
romanian     ro
latvian      lv
hungarian    hu
greek        el
polish       pl
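The example above passes the full language name to `load_from_language(language='russian')`; the README does not say whether ISO codes are accepted as well. If your input uses ISO codes, a small lookup built from the table could translate them first. This is a hypothetical helper, not part of toponym.

```python
# Languages from the table above, keyed by ISO code.
SUPPORTED_LANGUAGES = {
    "hr": "croatian",
    "ru": "russian",
    "uk": "ukrainian",
    "ro": "romanian",
    "lv": "latvian",
    "hu": "hungarian",
    "el": "greek",
    "pl": "polish",
}


def language_name(code):
    """Map an ISO code to the full language name (hypothetical helper)."""
    try:
        return SUPPORTED_LANGUAGES[code.lower()]
    except KeyError:
        raise ValueError(f"unsupported language code: {code!r}") from None


print(language_name("RU"))  # russian
```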

Running the tests

pytest toponym/tests/

toponym's People

Contributors

iwpnd


toponym's Issues

Loss of some words

Some words are missing from the output when no recipe exists for their ending.

IndexError in the "print_available_languages()" function

Please check whether this function works correctly on your side. When I try to get the list of available languages, I receive an IndexError: list index out of range.

The dictionary in "settings.py" seems to be correct on my local machine.

Or should I also check "TOPODICT_DIR"?

max() arg is an empty sequence

from toponym import toponym, topodict

td = topodict.Topodict(language='russian')
td.load()

word = "Коми"

tn = toponym.Toponym(word, td)
tn.build()

print(tn.topo)

Results in:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-957119cce124> in <module>
      5 
      6 tn = toponym.Toponym(word, td)
----> 7 tn.build()
      8 
      9 print(tn.topo)

c:\br\pypy\topoynm_new\toponym-master\toponym\toponym.py in build(self)
     37         else:
     38             self.recipe = self.topodict[
---> 39                 self._get_longest_word_ending(self.word)
     40             ]
     41 

c:\br\pypy\topoynm_new\toponym-master\toponym\toponym.py in _get_longest_word_ending(self, word)
     57             x for x in possible_endings if x in self.topodict._dict.keys()]
     58 
---> 59         return max(matching_endings, key=len)
     60 
     61 

ValueError: max() arg is an empty sequence

This is probably because the topodictionary contains no entry for that ending.
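The traceback shows `_get_longest_word_ending` calling `max()` on a list of matching endings that can be empty. A minimal sketch of a defensive lookup is shown below. It assumes the recipe keys are plain suffix strings; the function name and the way candidate suffixes are generated are assumptions, not the library's code. Passing `default=None` to `max()` returns `None` instead of raising, so the caller can report the unknown ending or skip the word.

```python
def longest_matching_ending(word, known_endings):
    """Return the longest suffix of `word` found in `known_endings`,
    or None if there is none (hypothetical helper)."""
    # All non-empty suffixes, e.g. "Коми" -> "Коми", "оми", "ми", "и".
    suffixes = [word[i:] for i in range(len(word))]
    matching = [s for s in suffixes if s in known_endings]
    # default=None avoids "ValueError: max() arg is an empty sequence".
    return max(matching, key=len, default=None)


# Hypothetical recipe keys, for illustration only.
endings = {"а", "ва"}
print(longest_matching_ending("Москва", endings))  # ва
print(longest_matching_ending("Коми", endings))    # None
```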
