pywords's People

Contributors

lumip

pywords's Issues

Implement Normalization Routine for TransformationSequence

A transformation sequence contains several EditTransformations, each with a prefix and a pattern that will be replaced (edited). However, some of these EditTransformations may end up without a prefix or without a replacement pattern. These could be absorbed by the following or preceding EditTransformations.
Implement a normalization routine for TransformationSequence which enforces that each contained EditTransformation has a prefix and a replacement pattern, with two exceptions: the first one needs no prefix, and the last one needs no replacement if it just skips to the end.
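
One possible shape for such a routine is sketched below. Everything in it is an assumption made for illustration: the `Edit` dataclass is a hypothetical stand-in for EditTransformation with `prefix`, `pattern` and `replacement` fields, and `normalize` is not an existing pywords function. The sketch merges an edit without a prefix into the preceding edit and hands the prefix of an edit that replaces nothing to the following edit. Pure insertions (empty pattern, non-empty replacement) that do have a prefix are left as they are, which the real routine may want to handle differently.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical stand-in for EditTransformation."""
    prefix: str       # text skipped unchanged before the edit
    pattern: str      # text that gets replaced
    replacement: str  # text that replaces the pattern


def normalize(edits):
    """Return a new list in which empty edits have been absorbed."""
    result = []
    carried_prefix = ""  # prefix handed over by edits that replace nothing
    for edit in edits:
        prefix = carried_prefix + edit.prefix
        carried_prefix = ""
        if not edit.pattern and not edit.replacement:
            # Nothing is replaced: the following edit absorbs the prefix.
            carried_prefix = prefix
        elif not prefix and result:
            # No prefix: the edit directly follows the previous pattern,
            # so the preceding edit absorbs it.
            previous = result[-1]
            previous.pattern += edit.pattern
            previous.replacement += edit.replacement
        else:
            result.append(Edit(prefix, edit.pattern, edit.replacement))
    if carried_prefix:
        # Only the last entry may skip towards the end without replacing.
        result.append(Edit(carried_prefix, "", ""))
    return result


sequence = [
    Edit("", "", ""),
    Edit("l", "i", ""),
    Edit("", "e", "a"),
    Edit("g", "", ""),
    Edit("en", "", ""),
]
normalized = normalize(sequence)
```

For the sample input this yields two entries: one that skips "l" and replaces "ie" with "a", and a final one that only skips "gen".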

Complete Test Cases for HangeulDecomposer Class

The current test cases for this class are quite limited: they only check that it behaves correctly when composing/decomposing actual composed syllables/runes. However, the class should also deal gracefully with non-composed or non-Hangeul characters (i.e., leave them unchanged). This must be confirmed by tests.

Is Always Prioritizing Delete Steps in Rule Inference a Good Idea?

When inferring the specific transformation rules for a training instance, the common subsequence extraction algorithm (somewhat arbitrarily) favors delete steps over insert or edit steps whenever there is a score draw, as this gave more desirable results in the instances seen so far.
E.g., this splits "liegen" and "gelegen" into
|  |l|i|egen|
|ge|l| |egen|
instead of
| li|egen|
|gel|egen|

It is not clear, however, that this will always be the case. Some thinking is needed here.
If the above really is preferable, should the costs for steps in the LCS computation be adjusted (instead of imposing an order on how to deal with draws)?
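
To make the cost question concrete, here is a small sketch that is independent of the actual extraction code. It assumes a plain edit-distance formulation with per-step costs, and it assumes the second split corresponds to two edit steps plus one insert; both the function names and the step encoding are made up for illustration.

```python
def edit_distance(source, target, insert=1, delete=1, edit=1):
    """Minimal total step cost to turn source into target (keeping is free)."""
    prev = [j * insert for j in range(len(target) + 1)]
    for i, s in enumerate(source, 1):
        curr = [i * delete]
        for j, t in enumerate(target, 1):
            diagonal = prev[j - 1] + (0 if s == t else edit)
            curr.append(min(diagonal, prev[j] + delete, curr[j - 1] + insert))
        prev = curr
    return prev[-1]


def script_cost(steps, insert=1, delete=1, edit=1):
    """Total cost of an explicit list of steps."""
    costs = {"keep": 0, "insert": insert, "delete": delete, "edit": edit}
    return sum(costs[step] for step in steps)


# The two splits of "liegen" -> "gelegen", written as one step per character.
with_delete = ["insert", "insert", "keep", "delete"] + ["keep"] * 4
with_edits = ["edit", "edit", "insert"] + ["keep"] * 4

# Unit costs: both scripts cost 3, which is also the optimum, hence the draw.
draw = (script_cost(with_delete), script_cost(with_edits))

# Making an edit step cost 2 breaks the draw without any ordering rule.
adjusted = (script_cost(with_delete, edit=2), script_cost(with_edits, edit=2))
```

Under these assumptions, pricing an edit step like a delete plus an insert makes the first split the unique cheaper one of the two (3 versus 5), so no tie-breaking order would be needed for this instance. Whether that holds for other instances is exactly the open question.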

Employ ML to Infer Which Words Adhere to Similar Rules

The WordTransformation rule classes are implemented so that the system first infers specific rules for each verb and then tries to generalize these rules so that the same rule applies to several seen training instances, which are then deemed equivalent, i.e., adhering to the same rule(s). This has been hardcoded so that it "works sufficiently well".

Can some machine learning approach be used to better infer which training instances actually adhere to the same rules, just from the specific rules directly inferred from the instances?

Consider Implementing Custom Decision Tree Learning Procedure

The project currently relies on the sklearn library for decision tree learning. However, as that is a general-purpose solution, it is not optimized for the discrete variables in play here. It also does not support incremental learning/growing of the decision tree, which might be a desirable feature in the future.
Hence, a custom implementation of decision tree learning might be beneficial.
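
As a starting point for discussion, a custom learner for purely categorical features can be quite small. The sketch below is an ID3-style tree that splits on the feature with the lowest remaining entropy. It is not what the project currently does, it does not cover incremental growing, and the feature names and labels in the toy data are invented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def build_tree(rows, labels, features):
    """rows: list of dicts mapping feature name -> categorical value."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf node

    def remaining_entropy(feature):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    # Lowest remaining entropy is the same as highest information gain.
    best = min(features, key=remaining_entropy)
    rest = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree(
            [r for r, _ in pairs], [l for _, l in pairs], rest
        )
    return {"feature": best, "branches": branches, "default": majority}


def predict(tree, row):
    while isinstance(tree, dict):
        # Unseen values fall back to the majority label at that node.
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


# Invented toy data: two categorical features, two rule labels.
rows = [
    {"suffix": "en", "vowel": "ie"},
    {"suffix": "en", "vowel": "a"},
    {"suffix": "ln", "vowel": "ie"},
    {"suffix": "ln", "vowel": "a"},
]
labels = ["rule_A", "rule_B", "rule_A", "rule_B"]
tree = build_tree(rows, labels, ["suffix", "vowel"])
```

On the toy data the tree splits on "vowel" only, since "suffix" carries no information about the label. Because the tree is a plain nested dict, growing it incrementally later would mean replacing a leaf with a new subtree.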
