lumip / pywords

License: GNU Affero General Public License v3.0

Python 100.00%

pywords's Issues

Implement Normalization Routine for TransformationSequence

The general idea of a transformation sequence is that it contains several EditTransformations, each with a prefix and a pattern that will be replaced (edited). However, some of these EditTransformations may end up without a prefix or without a replacement pattern. These could be absorbed by the following or preceding EditTransformations.
Implement a normalization routine for TransformationSequence which enforces that each contained EditTransformation has both a prefix and a replacement pattern. The exceptions are the first one (no prefix required) and the last one (no replacement required if it just skips towards the end).
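
A minimal sketch of what such a normalization could look like. It does not use the project's classes: `Step`, `normalize` and `apply_steps` are hypothetical stand-ins, and the semantics (each step skips a literal prefix, then swaps a pattern for a replacement) are an assumption based on the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """Hypothetical stand-in for an EditTransformation."""
    prefix: str = ""       # literal text that is skipped unchanged
    pattern: str = ""      # text that gets replaced
    replacement: str = ""  # text that replaces the pattern


def normalize(steps: List[Step]) -> List[Step]:
    """Absorb steps lacking a prefix or an edit into their neighbours."""
    result: List[Step] = []
    carried = ""  # prefix collected from steps that do not edit anything
    for step in steps:
        prefix = carried + step.prefix
        carried = ""
        if not step.pattern and not step.replacement:
            # No edit: hand the prefix on to the following step.
            carried = prefix
        elif not prefix and result:
            # No prefix: extend the edit of the preceding step.
            prev = result[-1]
            result[-1] = Step(prev.prefix,
                              prev.pattern + step.pattern,
                              prev.replacement + step.replacement)
        else:
            result.append(Step(prefix, step.pattern, step.replacement))
    if carried:
        # Trailing skip-only step (allowed as the last element).
        result.append(Step(prefix=carried))
    return result


def apply_steps(steps: List[Step], word: str) -> str:
    """Apply the steps left to right; used to check that behaviour is kept."""
    out, pos = [], 0
    for step in steps:
        if not word.startswith(step.prefix + step.pattern, pos):
            raise ValueError("sequence does not match word")
        out.append(step.prefix + step.replacement)
        pos += len(step.prefix) + len(step.pattern)
    return "".join(out) + word[pos:]


raw = [Step("a", "b", "B"), Step("", "c", "C"), Step("d"),
       Step("e", "f", "F"), Step("g")]
normalized = normalize(raw)
```

Under these assumptions, the five raw steps collapse to three, and applying either sequence to "abcdefg" gives the same result.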

Complete Test Cases for HangeulDecomposer Class

Current test cases for this class are quite limited: they only cover composing/decomposing actual composed syllables/runes. However, the class should also deal gracefully with non-composed or non-Hangeul characters (i.e., leave them unchanged). This must be confirmed by tests.
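
The kind of tests meant here could look roughly like the following. Since the interface of HangeulDecomposer is not shown in this issue, the sketch tests two hypothetical stand-in functions, `decompose` and `compose`, which use the standard Unicode arithmetic for precomposed Hangul syllables (U+AC00 to U+D7A3). The real tests would call the class instead.

```python
import unittest

# Constants of the Unicode Hangul syllable composition scheme.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT


def decompose(text: str) -> str:
    """Split precomposed syllables into jamo; leave everything else alone."""
    out = []
    for ch in text:
        index = ord(ch) - S_BASE
        if 0 <= index < S_COUNT:
            lead, rest = divmod(index, V_COUNT * T_COUNT)
            vowel, tail = divmod(rest, T_COUNT)
            out.append(chr(L_BASE + lead))
            out.append(chr(V_BASE + vowel))
            if tail:  # tail index 0 means "no final consonant"
                out.append(chr(T_BASE + tail))
        else:
            out.append(ch)
    return "".join(out)


def compose(text: str) -> str:
    """Join lead+vowel(+tail) jamo runs; leave everything else alone."""
    out, i = [], 0
    while i < len(text):
        lead = ord(text[i]) - L_BASE
        vowel = ord(text[i + 1]) - V_BASE if i + 1 < len(text) else -1
        if 0 <= lead < L_COUNT and 0 <= vowel < V_COUNT:
            tail = ord(text[i + 2]) - T_BASE if i + 2 < len(text) else 0
            if not 0 < tail < T_COUNT:
                tail = 0
            out.append(chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail))
            i += 3 if tail else 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


class GracefulHandlingTests(unittest.TestCase):
    def test_non_hangeul_is_unchanged(self):
        self.assertEqual(decompose("abc 123!"), "abc 123!")
        self.assertEqual(compose("abc 123!"), "abc 123!")

    def test_lone_jamo_is_unchanged(self):
        # A single lead consonant cannot form a syllable on its own.
        self.assertEqual(decompose("\u1112"), "\u1112")
        self.assertEqual(compose("\u1112"), "\u1112")

    def test_mixed_text_round_trips(self):
        self.assertEqual(compose(decompose("한글 ok")), "한글 ok")
```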

Is Always Prioritizing Delete Steps in Rule Inferring A Good Idea?

When inferring the specific transformation rules for a training instance, the common subsequence extraction algorithm (somewhat arbitrarily) favors delete steps over insert or edit steps whenever scores are tied, as this gave more desirable results in the instances seen so far.
E.g., this splits "liegen" and "gelegen" into
|  |l|i|egen|
|ge|l| |egen|
instead of
| li|egen|
|gel|egen|

It is not clear, however, that this will always be the case. Some thinking is needed here.
If the above really is preferable, should the costs for steps in the LCS computation be adjusted (instead of imposing an order on how to deal with draws)?
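
To make the two options concrete, here is a small sketch that is independent of the project's code. It uses a plain edit-distance alignment (an assumption, the actual algorithm may differ), and the names `align`, `preference` and `costs` are hypothetical. Ties between equally cheap steps are resolved by the `preference` order. Alternatively, the step costs can be changed so that no tie occurs.

```python
DEFAULT_COSTS = {"keep": 0, "delete": 1, "insert": 1, "edit": 1}


def _moves(source, target, i, j, costs):
    """Yield (op, cost, next_i, next_j) for every step possible at (i, j)."""
    if i < len(source) and j < len(target):
        op = "keep" if source[i] == target[j] else "edit"
        yield op, costs[op], i + 1, j + 1
    if i < len(source):
        yield "delete", costs["delete"], i + 1, j
    if j < len(target):
        yield "insert", costs["insert"], i, j + 1


def align(source, target, preference, costs=None):
    """Return (total cost, steps); draws are broken by `preference`."""
    costs = {**DEFAULT_COSTS, **(costs or {})}
    n, m = len(source), len(target)
    # dist[i][j]: cheapest way to turn source[i:] into target[j:]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            dist[i][j] = min(cost + dist[a][b]
                             for _, cost, a, b in _moves(source, target, i, j, costs))
    steps, i, j = [], 0, 0
    while i < n or j < m:
        # All steps that still lead to an optimal alignment ...
        optimal = [mv for mv in _moves(source, target, i, j, costs)
                   if mv[1] + dist[mv[2]][mv[3]] == dist[i][j]]
        # ... of which the most preferred one is taken.
        op, _, a, b = min(optimal, key=lambda mv: preference.index(mv[0]))
        steps.append((op, source[i:a], target[j:b]))
        i, j = a, b
    return dist[0][0], steps


delete_first = ("keep", "delete", "insert", "edit")
edit_first = ("keep", "edit", "insert", "delete")

cost_a, steps_a = align("liegen", "gelegen", delete_first)
cost_b, steps_b = align("liegen", "gelegen", edit_first)
# Same preference as steps_b, but edits are made more expensive instead.
cost_c, steps_c = align("liegen", "gelegen", edit_first, costs={"edit": 2})
```

In this sketch both preference orders reach the same total cost, but `steps_a` keeps "l" as a separate common piece, while `steps_b` rewrites "li" as one block. Raising the edit cost (`steps_c`) reproduces the first alignment without relying on the tie order, which is one way to read the question above.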

Implement Incremental Tree Training

Employ ML to Infer Which Words Adhere to Similar Rules

The WordTransformation rule classes are implemented so that the system first infers specific rules for each verb and then tries to generalize these rules so that the same rule applies to several seen training instances, which are then deemed equivalent, i.e., adhering to the same rule(s). This has been hardcoded so that it "works sufficiently well".

Can some machine learning approach be used to better infer which training instances actually adhere to the same rules, just by looking at the specific rules directly inferred from the instances?

Consider Implementing Custom Decision Tree Learning Procedure

The project currently relies on the sklearn library for decision tree learning. However, as a general-purpose solution, it is not optimized for the discrete variables in play here. It also does not support incremental learning/growing of the decision tree, which might be a desirable feature in the future.
Hence, a custom implementation of decision tree learning might be beneficial.
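
As a starting point, the core of a tree learner for purely categorical features is small. The sketch below picks the split feature by information gain (as in ID3). This is only one possible choice, not something the project has settled on, and the feature encoding and rule labels in the example are made up for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on one categorical feature."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    remainder = sum(len(group) / len(labels) * entropy(group)
                    for group in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels):
    """Index of the feature with the highest information gain."""
    return max(range(len(rows[0])),
               key=lambda feature: information_gain(rows, labels, feature))


# Hypothetical data: two categorical features per word, one rule label each.
rows = [("n", "e"), ("n", "l"), ("t", "e"), ("t", "l")]
labels = ["rule_A", "rule_A", "rule_B", "rule_B"]
split_on = best_feature(rows, labels)
```

Here the first feature separates the two rule labels perfectly and the second one not at all, so the first is chosen. Growing a tree would repeat this choice recursively on each group. Incremental growth would additionally need per-node counts that can be updated when new instances arrive, which this sketch does not cover.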

Implement Symmetric Test Case to test_maybe_joinable_single_elements of TransformationSequenceTests in EditTransformationTests

Complete Test Cases for ClusterSet