lemminflect's People

Contributors

bjascob

lemminflect's Issues

Pronouns support

Please add support for pronouns.
Currently lemminflect.getAllLemmas('his') returns {}.
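Until pronouns are covered, a caller could fall back to a small table of their own whenever the regular lookup comes back empty. The sketch below is hypothetical: `PRONOUN_LEMMAS`, `get_all_lemmas_with_pronouns`, the choice of base form for each pronoun, and the `'PRON'` key are all assumptions, and `lookup` stands in for something like `lemminflect.getAllLemmas`.

```python
from typing import Callable, Dict, Tuple

# Same shape as the dict returned by getAllLemmas, e.g. {'VERB': ('install',)}.
LemmaDict = Dict[str, Tuple[str, ...]]

# Hypothetical user-side table; which base form each pronoun maps to is an assumption.
PRONOUN_LEMMAS: Dict[str, str] = {
    "me": "I", "my": "I", "mine": "I",
    "him": "he", "his": "he",
    "her": "she", "hers": "she",
    "us": "we", "our": "we", "ours": "we",
    "them": "they", "their": "they", "theirs": "they",
}


def get_all_lemmas_with_pronouns(word: str, lookup: Callable[[str], LemmaDict]) -> LemmaDict:
    """Try the regular lookup first; fall back to the pronoun table if it returns nothing."""
    result = lookup(word)
    if result:
        return result
    base = PRONOUN_LEMMAS.get(word.lower())
    # 'PRON' (the Universal Dependencies pronoun tag) is chosen here as the key.
    return {"PRON": (base,)} if base is not None else {}
```

For example, `get_all_lemmas_with_pronouns('his', lemminflect.getAllLemmas)` would return `{'PRON': ('he',)}` as long as the library itself still returns `{}` for "his". Note that "her" is ambiguous (possessive vs. object), so a single table entry is a simplification.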

Doc Enhancement

spacy.load("en_core_web_sm")(word)[0]._.inflect("NNS")

gives me "Can't retrieve unregistered extension attribute 'inflect'. Did you forget to call the set_extension method?".

If I follow the message and add spacy.tokens.Token.set_extension('inflect', method=Inflections().spacyGetInfl), I now get "Extension 'inflect' already exists on Token. To overwrite the existing extension, set force=True on Token.set_extension".

If I add force=True, it gives me what I want, but there's no mention of this in the tests.

If you'd like, I can add a test to this effect or a note in the README.

Contractions not in the lookup

Contractions are not in the dictionary lookups.
They show up in the LEXICON and in english_dict.txt, but not in forms_table.csv.gz. They are likely being eliminated by the ASCII checks, and shouldn't be.
words  = ["'d", "'ll", "'m", "'re", "'s", "'ve"]
forms  = ["would", "will", "am", "are", "is", "have"]
lemmas = ["will", "will", "be", "be", "be", "have"]
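The three lists above can double as a regression check. The sketch below is a hypothetical helper (`CONTRACTIONS` and `find_missing_contractions` are not part of lemminflect): it reports every contraction whose expected lemma is not among the lemmas a lookup function returns. `lookup` is assumed to return a POS-keyed dict of lemma tuples, the shape `getAllLemmas` returns.

```python
from typing import Callable, Dict, List, Tuple

# Expected entries, taken from the lists above: contraction -> (full form, lemma).
CONTRACTIONS: Dict[str, Tuple[str, str]] = {
    "'d": ("would", "will"),
    "'ll": ("will", "will"),
    "'m": ("am", "be"),
    "'re": ("are", "be"),
    "'s": ("is", "be"),
    "'ve": ("have", "have"),
}


def find_missing_contractions(lookup: Callable[[str], Dict[str, Tuple[str, ...]]]) -> List[str]:
    """Return the contractions whose expected lemma is not among lookup(contraction)."""
    missing = []
    for contraction, (_form, lemma) in CONTRACTIONS.items():
        # Flatten all lemmas returned for every POS into one set.
        found = {lem for lemmas in lookup(contraction).values() for lem in lemmas}
        if lemma not in found:
            missing.append(contraction)
    return missing
```

Passing `lemminflect.getAllLemmas` as `lookup` would be expected to return an empty list once the contractions make it into forms_table.csv.gz.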

Word casing not preserved in all cases

LemmInflect currently preserves casing for "all lower", "all upper" and "first upper" casing styles. For words like "McDonald", after lemma/inflection the returned word will be "Mcdonald" since capitalization of individual letters is not maintained.
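One possible extension is to copy the case of each letter when the original word matches none of the three supported styles. The sketch below is a hypothetical helper, not lemminflect's code. It assumes the lemma or inflection shares a prefix with the original word, which holds for "McDonalds" → "McDonald" but not for irregular forms, where capitals may land in the wrong place.

```python
def transfer_casing(original: str, new: str) -> str:
    """Hypothetical helper: copy the casing style of `original` onto `new`."""
    if original.islower():
        return new.lower()
    if original.isupper():
        return new.upper()
    if original[:1].isupper() and original[1:].islower():
        return new[:1].upper() + new[1:].lower()
    # Mixed casing such as "McDonald": copy case letter by letter over the
    # shared length; any extra letters in `new` are lowercased.
    chars = []
    for i, ch in enumerate(new):
        if i < len(original) and original[i].isupper():
            chars.append(ch.upper())
        else:
            chars.append(ch.lower())
    return "".join(chars)
```

For instance, `transfer_casing("McDonalds", "mcdonald")` gives "McDonald" rather than "Mcdonald", while the three existing styles behave as before.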

Incorrect base form for "install"

Is it appropriate to report single words that are incorrect as bugs? I realize the dictionary can't be 100% complete.

Lemminflect 0.2.1:
getAllLemmas('install')
{'VERB': ('install', 'instal')}

I think it should just return 'install'.

Lemma model can select rule for wrong pos type

For the test cases 'quilting/NOUN' and 'plastering/NOUN', the words are not in the lemma lookup, so the OOV rules are called.

getAllLemmasOOV('quilting', 'NOUN') returns 'quilt' (it selects rule "ing,,False")
getAllLemmasOOV('plastering', 'NOUN') returns 'plastering' (it selects rule ",,False")

In the case of 'quilting', the model selects a verb rule. To prevent this, consider:

  • Add hard-coded rules to choose the next best if the rule doesn't apply
  • Split the model into 3 parts (verb, noun, adj/adv) and run separately
  • Add contra-cases to training data so it learns not to do this

In addition, the model classes include the ending letters to remove. However, similar to the above, nothing prevents the model from selecting a "remove ing" rule for a word that ends in something else. I'm not aware of this causing issues, but it should be investigated when looking into the first issue.
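The first option (fall back to the next-best rule) can be combined with the ending check from the last paragraph. The sketch below is hypothetical: it assumes the model yields a ranked list of `(rule, pos)` pairs, and that in a rule string like "ing,,False" the first field is the suffix to remove and the second the suffix to add. The third field is parsed but not interpreted, since its meaning isn't stated here. `parse_rule`, `choose_rule` and `apply_rule` are illustrative names only.

```python
from typing import List, Tuple


def parse_rule(rule: str) -> Tuple[str, str, str]:
    """Split a rule string such as "ing,,False" into (remove, add, flag)."""
    remove, add, flag = rule.split(",")
    return remove, add, flag


def choose_rule(word: str, upos: str, ranked: List[Tuple[str, str]]) -> str:
    """Pick the best-ranked rule that matches both the POS and the word ending."""
    for rule, rule_pos in ranked:
        remove, _add, _flag = parse_rule(rule)
        if rule_pos != upos:
            continue  # e.g. don't let a VERB rule lemmatize a NOUN
        if remove and not word.endswith(remove):
            continue  # the rule would strip an ending the word doesn't have
        return rule
    return ",,False"  # fall back to the identity rule


def apply_rule(word: str, rule: str) -> str:
    remove, add, _flag = parse_rule(rule)
    stem = word[: -len(remove)] if remove else word
    return stem + add
```

With `ranked = [("ing,,False", "VERB"), (",,False", "NOUN")]`, 'quilting/NOUN' skips the verb rule and stays 'quilting', while 'quilting/VERB' still becomes 'quilt'.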

Ability to find base word without knowing the POS tag

There are some use cases where users would like to find the base word (aka lemma) but don't know the word's part of speech. This is problematic for words like "painting", which could be "paint" for a verb or "painting" for a noun. Regardless, it may be useful to simply return "paint" for use in neural network sentence classification, etc.

Proposed approach is to use the dictionary to find the shortest word. If the word is not in the dictionary then try OOV for Nouns and Verbs and choose the shortest.
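The proposed approach can be sketched as follows. `shortest_lemma` is a hypothetical name. `lookup` stands in for a dictionary lookup returning a POS-keyed dict (the shape `getAllLemmas` returns), and `oov` is assumed to be a callable that returns a tuple of candidate lemmas for a given POS; adapting `getAllLemmasOOV` to that shape is left to the caller.

```python
from typing import Callable, Dict, Tuple

LemmaDict = Dict[str, Tuple[str, ...]]


def shortest_lemma(word: str,
                   lookup: Callable[[str], LemmaDict],
                   oov: Callable[[str, str], Tuple[str, ...]]) -> str:
    """Return the shortest lemma over all POS tags, per the proposal above."""
    # 1. Dictionary lookup across every POS.
    candidates = [lem for lemmas in lookup(word).values() for lem in lemmas]
    # 2. Not in the dictionary: try the OOV path for nouns and verbs.
    if not candidates:
        candidates = [lem for upos in ("NOUN", "VERB") for lem in oov(word, upos)]
    # min() keeps the first candidate on ties; fall back to the word itself.
    return min(candidates, key=len) if candidates else word
```

For "painting", a lookup returning `{'VERB': ('paint',), 'NOUN': ('painting',)}` yields "paint". Picking the shortest form deliberately favours the verb reading, which suits the classification use case but loses the noun sense.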

inflection tool for other languages

Hi,

Thanks for providing this library.
Have you ever thought of implementing other languages?
If not, I'm currently looking for a tool that does Russian word inflection and I was wondering whether you know of any such tools?

Best,
Eva

Incorrect inflections of special adjectives like beautiful and handsome

Hi, thanks for building this amazing tool!
Currently, it doesn't seem to handle inflections of special adjectives like beautiful and handsome correctly.

Example:

from lemminflect import getLemma, getInflection

lemma = getLemma('beautiful', upos='ADJ')
inflection1 = getInflection(lemma[0], tag='JJR')
inflection2 = getInflection(lemma[0], tag='JJS')
print(inflection1, inflection2)

gives ('beautifuler',) and ('beautifulest',). It'd be great if lemminflect could output something like ('more', 'beautiful',) or ('more beautiful',)!
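A caller-side workaround is to switch to "more"/"most" for long adjectives before asking for a synthetic form. The sketch below uses a rough heuristic that is an assumption, not a rule lemminflect applies: three or more vowel groups (a crude syllable count), or one of a few suffixes. `needs_periphrastic`, `compare`, and `PERIPHRASTIC_SUFFIXES` are hypothetical names. `inflect` stands in for a function like `getInflection`.

```python
import re
from typing import Callable, Tuple

# Rough heuristic (an assumption): these endings usually take "more"/"most".
PERIPHRASTIC_SUFFIXES = ("ful", "less", "ous", "ive", "ing", "ed")


def needs_periphrastic(adj: str) -> bool:
    # Count runs of vowels as a crude stand-in for syllables.
    vowel_groups = re.findall(r"[aeiouy]+", adj.lower())
    return len(vowel_groups) >= 3 or adj.lower().endswith(PERIPHRASTIC_SUFFIXES)


def compare(adj: str, tag: str,
            inflect: Callable[[str, str], Tuple[str, ...]]) -> Tuple[str, ...]:
    """Return a JJR/JJS form, using 'more'/'most' for long adjectives."""
    if tag in ("JJR", "JJS") and needs_periphrastic(adj):
        return (("more " if tag == "JJR" else "most ") + adj,)
    return inflect(adj, tag)
```

This returns ('more beautiful',) and ('most beautiful',) in the single-string shape requested above. The vowel count also treats a silent final e as a syllable, so "handsome" gets "more handsome", one of its two accepted comparatives.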

"Haves"

>>> import lemminflect as li
>>> li.getInflection('have', 'VBZ')
('haves',)

Shouldn't that be has? What am I doing wrong?
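Until the library returns "has", one caller-side guard is an override table consulted before the normal inflection call. The sketch below is hypothetical: `OVERRIDES` and `get_inflection_checked` are illustrative names, and `inflect` stands in for `li.getInflection`.

```python
from typing import Callable, Dict, Tuple

# Hypothetical user-side overrides for irregular forms; extend as needed.
OVERRIDES: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("have", "VBZ"): ("has",),
    ("be", "VBZ"): ("is",),
}


def get_inflection_checked(lemma: str, tag: str,
                           inflect: Callable[[str, str], Tuple[str, ...]]) -> Tuple[str, ...]:
    """Consult the override table before falling back to `inflect`."""
    override = OVERRIDES.get((lemma.lower(), tag))
    return override if override is not None else inflect(lemma, tag)
```

`get_inflection_checked('have', 'VBZ', li.getInflection)` would then give ('has',), and every other (lemma, tag) pair passes through unchanged.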

Make spaCy integration explicit

import spacy
import lemminflect

results in the last import being reported as unused by linters.

import spacy
import lemminflect

lemminflect.extend_spacy()

or something similar would've been much better.
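A minimal sketch of what an explicit entry point might look like, assuming it receives the Token class and a name-to-method mapping. `extend_spacy` here is the hypothetical name suggested above, not an existing lemminflect function. It relies only on spaCy's `Token.has_extension` and `Token.set_extension(name, method=...)` class methods. Checking `has_extension` first also avoids the "Extension 'inflect' already exists" error from the Doc Enhancement issue without resorting to `force=True`.

```python
from typing import Callable, Dict, List


def extend_spacy(token_cls, methods: Dict[str, Callable]) -> List[str]:
    """Explicitly register method extensions on a spaCy-style Token class.

    Names that are already registered are skipped, so repeated calls do not
    raise an "already exists" error. Returns the newly registered names.
    """
    registered = []
    for name, method in methods.items():
        if token_cls.has_extension(name):
            continue  # already registered, e.g. as a side effect of an import
        token_cls.set_extension(name, method=method)
        registered.append(name)
    return registered
```

With spaCy installed, a call might look like `extend_spacy(spacy.tokens.Token, {"inflect": Inflections().spacyGetInfl})`. It would be a no-op if importing lemminflect had already registered `inflect`, and linters see an explicit use of the import.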

Incorrect inflections

[['somewhat', ####], ['somew', 'ADJ']],

[['his', ####], ['hi', 'PROPN'], ['hi', 'ADJ'], ['hi', 'ADV']],

[['her', ####], ['he', 'ADJ'], ['h', 'ADV']],

[['could', ####], ['coul', 'ADV']],

[['another', ####], ['anoth', 'ADJ'], ['anoth', 'ADV']],

[['question', ####], ['quest', 'ADV']],

[['vs', ####], ['v', 'NOUN'], ['v', 'PROPN'], ['v', 'VERB'], ['v', 'ADJ'], ['v', 'ADV']],

Citation

Hi, thanks for the amazing tool!

I'm using lemminflect in my project at uni (which may become a paper later). Do you have a paper on it which I could cite?

Converting between adverbs and adjectives

When using getAllInflections() on some adjectives, the adverb forms given are exactly the same as the adjective, and vice versa. Is there a possible fix for this? I'm trying to generate the adjective form of adverbs and vice versa.
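Adjective/adverb pairs are derivations rather than inflections, so one option is a separate suffix-based step. The sketch below is a heuristic that is not part of lemminflect. `adjective_to_adverb`, `adverb_to_adjective_candidates`, and `adverb_to_adjective` are hypothetical names, and `vocabulary` is assumed to be a set of known adjectives from whatever word list is available. Irregular pairs such as good/well are not handled. The reverse direction is ambiguous, so it generates candidates and keeps the first one found in the vocabulary.

```python
from typing import Iterable, List, Optional


def adjective_to_adverb(adj: str) -> str:
    """Heuristic -ly derivation; irregular cases (e.g. good -> well) are not handled."""
    if adj.endswith("ic"):
        return adj + "ally"                      # basic -> basically
    if adj.endswith("le") and len(adj) > 2 and adj[-3] not in "aeiou":
        return adj[:-1] + "y"                    # gentle -> gently
    if adj.endswith("y") and len(adj) > 1 and adj[-2] not in "aeiou":
        return adj[:-1] + "ily"                  # happy -> happily
    return adj + "ly"                            # quick -> quickly


def adverb_to_adjective_candidates(adv: str) -> List[str]:
    """Undo the rules above; several candidates may be produced."""
    candidates = []
    if adv.endswith("ically"):
        candidates.append(adv[:-4])              # basically -> basic
    if adv.endswith("ily"):
        candidates.append(adv[:-3] + "y")        # happily -> happy
    if adv.endswith("ly"):
        candidates.append(adv[:-2])              # quickly -> quick
        candidates.append(adv[:-1] + "e")        # gently -> gentle
    return candidates


def adverb_to_adjective(adv: str, vocabulary: Iterable[str]) -> Optional[str]:
    """Return the first candidate that is a known adjective, if any."""
    known = set(vocabulary)
    for candidate in adverb_to_adjective_candidates(adv):
        if candidate in known:
            return candidate
    return None
```

The vocabulary filter does most of the work in the reverse direction. Without it, "gently" would produce the non-word "gent" as its first candidate.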

Extension to other languages

Is it possible to extend the functionality of the inflection to other languages, and if so, what would be required to do so?

Derivational Morphemes

Hi all,

It would be great to get arrival from arrive. I checked the following function, but it did not return arrival:

getAllInflections('arrive')
{'VBD': ('arrived',), 'VBG': ('arriving',), 'VBZ': ('arrives',), 'VB': ('arrive',), 'VBP': ('arrive',)}
