lumip / pywords
License: GNU Affero General Public License v3.0
A transformation sequence is meant to contain several EditTransformations, each with a prefix and a pattern that will be replaced (edited). However, some of these EditTransformations may end up without a prefix or without a replacement pattern. These could be absorbed by the following or preceding EditTransformations.
Implement a normalization routine for TransformationSequence which enforces that each contained EditTransformation has a prefix and a replacement pattern. The exceptions are the first one (no prefix required) and the last one (no replacement required if it just skips to the end).
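A minimal sketch of what such a normalization could look like. It does not use the project's actual classes: `Edit`, `apply_edits` and `normalize` are hypothetical stand-ins, and the sketch assumes that an edit skips over `prefix` unchanged and then replaces `pattern` with `replacement`. It also assumes that an edit with neither pattern nor replacement is a pure skip whose prefix can be handed to the following edit, and that an edit without a prefix can be merged into the preceding one.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical stand-in for EditTransformation (not the project's class)."""
    prefix: str       # text skipped unchanged before the edit
    pattern: str      # text that gets replaced
    replacement: str  # text it is replaced with


def apply_edits(edits, word):
    """Apply the edits left to right; text after the last edit is kept as-is."""
    out, pos = [], 0
    for e in edits:
        if not word.startswith(e.prefix + e.pattern, pos):
            raise ValueError(f"{e} does not match {word!r} at position {pos}")
        out.append(e.prefix + e.replacement)
        pos += len(e.prefix) + len(e.pattern)
    return "".join(out) + word[pos:]


def normalize(edits):
    """Merge edits so that only the first may lack a prefix and only the
    last may be a pure skip (assumed semantics, see lead-in)."""
    result = []
    carry = ""  # prefix collected from pure-skip edits
    for e in edits:
        prefix = carry + e.prefix
        carry = ""
        if not e.pattern and not e.replacement:
            # Pure skip: hand its prefix to the following edit.
            carry = prefix
        elif not prefix and result:
            # No prefix: this edit directly follows the previous one, merge them.
            prev = result[-1]
            result[-1] = Edit(prev.prefix,
                              prev.pattern + e.pattern,
                              prev.replacement + e.replacement)
        else:
            result.append(Edit(prefix, e.pattern, e.replacement))
    if carry:
        # A trailing skip is allowed as the last element.
        result.append(Edit(carry, "", ""))
    return result


raw = [Edit("", "ge", ""), Edit("l", "", ""), Edit("", "", "i"), Edit("egen", "", "")]
normalized = normalize(raw)
```

Under these assumptions, `normalized` has three entries instead of four, and both sequences turn "gelegen" into "liegen" when passed to `apply_edits`.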
Current test cases for this class are quite limited and only cover that it behaves correctly in composing/decomposing actual composed syllables/runes. However, it should also deal gracefully with non-composed or non-hangeul characters (i.e., leave them unchanged). This must be confirmed by tests.
When inferring the specific transformation rules for a training instance, delete steps are (somewhat arbitrarily) favored over insert or edit steps in the common subsequence extraction algorithm whenever scores are tied, as this gave more desirable results in the instances seen so far.
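The kind of tests meant here could look roughly like the following sketch. The `decompose` and `compose` functions are hypothetical stand-ins written for this example, not the project's class; they use the standard Unicode arithmetic for Hangul syllables (U+AC00 to U+D7A3) and conjoining jamo, and pass every other character through untouched.

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # syllables per leading consonant (588)
S_COUNT = L_COUNT * N_COUNT   # number of precomposed syllables (11172)


def decompose(text):
    """Split precomposed Hangul syllables into jamo; leave the rest unchanged."""
    out = []
    for ch in text:
        index = ord(ch) - S_BASE
        if 0 <= index < S_COUNT:
            out.append(chr(L_BASE + index // N_COUNT))
            out.append(chr(V_BASE + (index % N_COUNT) // T_COUNT))
            trailing = index % T_COUNT
            if trailing:
                out.append(chr(T_BASE + trailing))
        else:
            out.append(ch)
    return "".join(out)


def compose(text):
    """Join leading + vowel (+ trailing) jamo into syllables; leave the rest unchanged."""
    out, i = [], 0
    while i < len(text):
        lead = ord(text[i]) - L_BASE
        if 0 <= lead < L_COUNT and i + 1 < len(text):
            vowel = ord(text[i + 1]) - V_BASE
            if 0 <= vowel < V_COUNT:
                code = S_BASE + (lead * V_COUNT + vowel) * T_COUNT
                i += 2
                if i < len(text):
                    trailing = ord(text[i]) - T_BASE
                    if 0 < trailing < T_COUNT:
                        code += trailing
                        i += 1
                out.append(chr(code))
                continue
        out.append(text[i])
        i += 1
    return "".join(out)


# Composed syllables round-trip.
assert decompose("한") == "\u1112\u1161\u11ab"
assert compose(decompose("한글")) == "한글"
# Non-hangeul text and lone jamo are left unchanged.
assert decompose("abc 123") == "abc 123"
assert compose("abc 123") == "abc 123"
assert compose("\u1100") == "\u1100"
# Mixed input only touches the Hangul part.
assert compose(decompose("한글 test")) == "한글 test"
```

The last three groups of assertions are the cases the current tests do not cover: plain non-hangeul text, isolated jamo, and mixed strings.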
E.g.
This splits "liegen" and "gelegen" into
| |l|i|egen|
|ge|l| |egen|
instead of
| li|egen|
|gel|egen|
It is not clear, however, that this will always be the case. Some thinking is needed here.
If the above really is preferable, should the costs for steps in the LCS computation be adjusted (instead of imposing an order on how to deal with draws)?
The WordTransformation rules classes are implemented so that the system first infers specific rules for each verb and then tries to generalize them so that the same rule applies to several seen training instances, which are then deemed equivalent, i.e., adhering to the same rule(s). This has been hardcoded so that it "works sufficiently well".
Can some machine learning approach be used to better infer which training instances actually adhere to the same rules, just from the specific rules directly inferred from the instances?
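To make the two options concrete, here is a small sketch of a character alignment with configurable step costs and a configurable tie order. It is not the project's LCS code: `align` and its parameters are hypothetical, and the sketch assumes that keeping a matching character is free and that the remaining steps are delete, insert, and edit (substitution).

```python
def align(source, target, delete_cost=1.0, insert_cost=1.0, edit_cost=1.0,
          tie_order=("delete", "insert", "edit")):
    """Return (steps, total_cost) for turning source into target.

    tie_order must list "delete", "insert" and "edit"; earlier entries win
    when several steps are equally cheap. Matching characters are kept first.
    """
    n, m = len(source), len(target)
    # cost[i][j]: cheapest way to turn source[i:] into target[j:]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        cost[i][m] = cost[i + 1][m] + delete_cost
    for j in range(m - 1, -1, -1):
        cost[n][j] = cost[n][j + 1] + insert_cost

    def options(i, j):
        opts = {}
        if i < n and j < m:
            if source[i] == target[j]:
                opts["keep"] = cost[i + 1][j + 1]
            else:
                opts["edit"] = cost[i + 1][j + 1] + edit_cost
        if i < n:
            opts["delete"] = cost[i + 1][j] + delete_cost
        if j < m:
            opts["insert"] = cost[i][j + 1] + insert_cost
        return opts

    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            cost[i][j] = min(options(i, j).values())

    # Walk from the start, breaking draws according to tie_order.
    steps, i, j = [], 0, 0
    while i < n or j < m:
        opts = options(i, j)
        best = min(opts.values())
        step = next(s for s in ("keep",) + tuple(tie_order) if opts.get(s) == best)
        src = "" if step == "insert" else source[i]
        tgt = "" if step == "delete" else target[j]
        steps.append((step, src, tgt))
        if step != "insert":
            i += 1
        if step != "delete":
            j += 1
    return steps, cost[0][0]


by_order, _ = align("liegen", "gelegen")
by_cost, _ = align("liegen", "gelegen", edit_cost=2.5,
                   tie_order=("edit", "insert", "delete"))
```

In this sketch, `by_order` relies on the tie order (delete and insert before edit) to insert "ge", keep "l", delete "i" and keep "egen". `by_cost` reaches the same alignment with the opposite tie order, because an edit is priced above a delete plus an insert, so no draw involving an edit arises. Whether that pricing is desirable for all verbs is exactly the open question.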
The project currently relies on the sklearn library for decision tree learning. However, as that is a general solution, it is not optimized for the very discrete variables in play here. It also does not support incremental learning/growing of the decision tree, which might be a desirable feature in the future.
Hence, a custom implementation of decision tree learning might be beneficial.
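As a rough starting point, a decision tree over purely categorical features can be sketched in a few lines. The following is a plain ID3-style learner (entropy-based splits, one branch per observed value), named here as a swapped-in textbook technique rather than anything the project already contains. The feature names, rule labels, `Node`, `build_tree` and `predict` are all made up for illustration, and incremental growing is not covered.

```python
from collections import Counter
from dataclasses import dataclass
from math import log2


@dataclass
class Node:
    feature: str    # feature tested at this node
    children: dict  # feature value -> subtree (Node) or leaf label
    default: str    # majority label, used for unseen feature values


def entropy(labels):
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """rows: list of dicts mapping feature name -> categorical value."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf

    def remaining_entropy(feature):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    # Lowest remaining entropy is the same as highest information gain.
    best = min(features, key=remaining_entropy)
    rest = [f for f in features if f != best]
    children = {}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        children[value] = build_tree([r for r, _ in subset],
                                     [l for _, l in subset], rest)
    return Node(best, children, majority)


def predict(tree, row):
    while isinstance(tree, Node):
        tree = tree.children.get(row.get(tree.feature), tree.default)
    return tree


# Hypothetical training data: which rule a verb follows, by two discrete features.
rows = [
    {"ending": "en", "stem_vowel": "ie"},
    {"ending": "en", "stem_vowel": "a"},
    {"ending": "n", "stem_vowel": "ie"},
    {"ending": "n", "stem_vowel": "a"},
    {"ending": "en", "stem_vowel": "ie"},
]
labels = ["rule_A", "rule_B", "rule_A", "rule_B", "rule_A"]
tree = build_tree(rows, labels, ["ending", "stem_vowel"])
```

With this toy data the tree splits on `stem_vowel` only, and an unseen vowel falls back to the majority label stored in the node.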