jankrepl / mltype
Command line tool for improving typing skills (programmers friendly)
Home Page: https://mltype.readthedocs.io
License: MIT License
Testing curses might be tricky, though.
It is necessary to provide some nice pretrained models. Ideally, one could just write mlt download model_name url and it would do it automatically.
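mltype does not ship such a command today; below is a minimal sketch of what a hypothetical download_model helper could do, using only the standard library. The ~/.mltype/languages target folder and the helper's name are assumptions, not the real layout.

```python
import pathlib
import urllib.request


def download_model(model_name, url, models_dir="~/.mltype/languages"):
    """Hypothetical helper: fetch a pretrained model file into the models folder.

    ``models_dir`` is an assumption based on mltype keeping its files
    under ``~/.mltype``; the real directory layout may differ.
    """
    target_dir = pathlib.Path(models_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / model_name
    # urlretrieve has no progress bar; kept minimal on purpose.
    urllib.request.urlretrieve(url, target)
    return target
```

A real implementation would likely add a progress indicator and checksum verification on top of this.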
Not everybody wants to use it for logging, and additionally it does not seem to be well supported on Windows. Ideally, one would specify via an mlt train option whether they need to use it.
Make sure that eval mode is active when we run inference.
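In PyTorch this amounts to calling model.eval() before the forward pass (and ideally wrapping it in torch.no_grad()); a small sketch with a stand-in model, not mltype's actual network:

```python
import torch

# Stand-in model with a dropout layer, so eval mode actually matters.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Dropout(p=0.5))

model.eval()  # disables dropout/batchnorm training behaviour
with torch.no_grad():  # skip autograd bookkeeping during inference
    out = model(torch.ones(1, 4))

assert not model.training  # eval mode is active
```

Forgetting eval() means dropout stays active at inference time, so sampled text would be noticeably noisier.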
It is created whenever training
Currently, the statistics are very basic. It might be cool to add some more indicators (bashplotlib?).

In other words mlt train ... -t 1
Probably just another entry in the pickled dictionary. Additionally, one should make this description visible in mlt list. Finally, one should be able to optionally provide this description in mlt train.
Would be nice if we could autocomplete the names of the existing models in mlt sample.

mlt train does not allow overwriting.
The first time trying to run mlt sample
in a fresh venv takes 20 seconds. What is extremely confusing is that there is no progress bar and it just seems to be hanging.
One can easily check whether the vocabularies agree in mlt train
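Assuming a vocabulary is just a collection of characters, the check is a one-liner; a hypothetical sketch (the function names are not mltype's):

```python
def vocabs_agree(vocab_a, vocab_b):
    """Return True if two vocabularies contain exactly the same characters."""
    return set(vocab_a) == set(vocab_b)


def vocab_diff(vocab_a, vocab_b):
    """Characters present in one vocabulary but not the other, for error messages."""
    return set(vocab_a) ^ set(vocab_b)
```

Reporting the symmetric difference makes the resulting error message actionable rather than a bare mismatch.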
When creating the one-hot encoding of characters we copy. However, there is no need to do that. But maybe it is a good thing in case somebody tries to manually change the features.
For some reason, the mlflow mlruns folder is generated just by running mlt.
One could control things that are not controllable via CLI:
Currently, the only metric is the loss itself, and one has no idea how good the model is until the training is done and one can run mlt sample. However, one can sample a few texts at the end of each epoch and store them as mlflow artifacts.
Currently, it takes around 3 seconds to load. Note that it might be related to #4.

sample_char and sample_text are broken when network parameters lie on a GPU. In load_model, assume that we want to do CPU inference.

It seems like most of the time is spent on importing pytorch_lightning and then mlflow.
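For the GPU issue above: assuming the checkpoint is an ordinary torch save file, CPU placement can be forced at load time with map_location; a sketch (load_model_cpu is a hypothetical name, not mltype's API):

```python
import torch


def load_model_cpu(checkpoint_path):
    """Load a checkpoint onto the CPU even if it was saved from a GPU.

    map_location="cpu" remaps any CUDA tensors in the file to CPU memory,
    so inference works on machines without a GPU.
    """
    return torch.load(checkpoint_path, map_location="cpu")
```

Without map_location, loading a GPU-saved checkpoint on a CPU-only machine raises an error about CUDA being unavailable.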
Currently, the sample_text function requires a window_size since for each new character it starts from scratch - the complexity is O(window_size). However, we should also return the hidden states of the LSTM; prediction of the next character then becomes O(1). There are 3 benefits, including speed (especially for a large window_size) and not needing the window_size hyperparameter at inference time.

Currently, the train test split might be totally separate files.
Since we use pytorch-lightning it should be trivial.
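The hidden-state caching idea above can be sketched as follows; the embedding, linear head, and greedy decoding are simplified stand-ins, not mltype's actual architecture:

```python
import torch


class CharSampler:
    """Sample characters one at a time in O(1) per step by carrying the
    LSTM hidden state forward instead of re-encoding the whole window."""

    def __init__(self, vocab_size, hidden_size=32):
        self.embedding = torch.nn.Embedding(vocab_size, hidden_size)
        self.lstm = torch.nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.head = torch.nn.Linear(hidden_size, vocab_size)
        self.state = None  # (h, c), carried across calls

    @torch.no_grad()
    def step(self, char_ix):
        """Feed one character index; return the greedily decoded next index."""
        x = self.embedding(torch.tensor([[char_ix]]))
        out, self.state = self.lstm(x, self.state)  # reuse cached state
        logits = self.head(out[:, -1])
        return int(logits.argmax())
```

Each call processes a single character, so sampling n characters costs O(n) total instead of O(n * window_size).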
Currently, we just readlines, strip and then join them all on spaces.
With the current setup.py one gets:

ERROR: pytorch-lightning 1.0.2 has requirement future>=0.17.1, but you'll have future 0.16.0 which is incompatible.

Ideally, this should be fixed at install time.
Currently, it is not very readable
It seems to be too low when making a mistake
Would be really cool for languages like Python and any other programming language.
Not sure what is done by curses currently when it is asked to addch a new line in all of the below cases:
np.bool is actually taking one byte. So if we use a pure bitarray we can fit 8 times more samples in RAM. Also, it would be nice to investigate sparse matrices. But the simplest is to just store integers representing the positions and then only create the features in the dataloader __getitem__.
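A sketch of the lazy variant: keep only integer character indices in memory and build the boolean features on demand in __getitem__ (NumPy-only illustration; the real dataset class will differ, and note np.bool is deprecated in favour of plain bool):

```python
import numpy as np


class LazyOneHotDataset:
    """Keep integer indices in RAM; materialize one-hot rows only on access."""

    def __init__(self, indices, vocab_size):
        self.indices = np.asarray(indices, dtype=np.int64)  # 8 bytes/sample
        self.vocab_size = vocab_size

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        features = np.zeros(self.vocab_size, dtype=bool)  # built on the fly
        features[self.indices[i]] = True
        return features
```

Memory then scales with the number of samples rather than samples times vocabulary size, and the per-item construction cost is hidden inside the dataloader workers.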
Whenever we race an opponent, the whole screen waits for an input character and only then does it refresh; it only updates once we type the character.
Currently, ENTER is not seen as a special character and one might get confused at the end of the line. Why? Pressing ENTER will result in an error (unless ENTER is in the dictionary).
Currently, a lot of the trained models contain characters that are not even available on the English keyboard. :D mlt train .... -f "~$@" could explicitly disallow characters when training.
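Such filtering could look like the following sketch, assuming the forbidden characters arrive as a plain string (the -f flag above is only a proposal, not an existing option):

```python
def filter_text(text, forbidden):
    """Drop forbidden characters from the training text.

    ``forbidden`` mirrors the proposed -f option, e.g. '~$@'.
    """
    banned = set(forbidden)
    return "".join(ch for ch in text if ch not in banned)
```

Filtering before vocabulary construction also keeps the rejected characters out of the model's output space entirely.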
By default it is ~/.mltype. However, Google Colab does not display hidden folders, which makes it impossible to see what is going on during the training.
Legacy code
If the terminal is set up in this way, competing against some opponent is way more visible.