jankrepl / mltype
Command line tool for improving typing skills (programmers friendly)
Home Page: https://mltype.readthedocs.io
License: MIT License
Testing curses might be tricky, though.
It is necessary to provide some nice pretrained models. Ideally, one could just write mlt download model_name url and it would do it automatically.
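mltype does not ship such a command today; below is a minimal sketch of what a hypothetical download_model helper could do, using only the standard library. The ~/.mltype/languages target folder and the helper's name are assumptions, not the real layout.

```python
import pathlib
import urllib.request


def download_model(model_name, url, models_dir="~/.mltype/languages"):
    """Hypothetical helper: fetch a pretrained model file into the models folder.

    ``models_dir`` is an assumption based on mltype keeping its files
    under ``~/.mltype``; the real directory layout may differ.
    """
    target_dir = pathlib.Path(models_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / model_name
    # urlretrieve has no progress bar; kept minimal on purpose.
    urllib.request.urlretrieve(url, target)
    return target
```

A real implementation would likely add a progress indicator and checksum verification on top of this.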
Not everybody wants to use it for logging, and additionally it does not seem to be well supported on Windows. Ideally, one would specify via an mlt train option whether they need to use it.
Make sure that eval mode is active when we run inference.
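In PyTorch this amounts to calling model.eval() before the forward pass (and ideally wrapping it in torch.no_grad()); a small sketch with a stand-in model, not mltype's actual network:

```python
import torch

# Stand-in model with a dropout layer, so eval mode actually matters.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Dropout(p=0.5))

model.eval()  # disables dropout/batchnorm training behaviour
with torch.no_grad():  # skip autograd bookkeeping during inference
    out = model(torch.ones(1, 4))

assert not model.training  # eval mode is active
```

Forgetting eval() means dropout stays active at inference time, so sampled text would be noticeably noisier.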
It is created whenever training
Currently, the statistics are very basic. It might be cool to add some more indicators (bashplotlib?).

In other words mlt train ... -t 1
Probably just another entry in the pickled dictionary. Additionally, one should make this description visible in mlt list. Finally, one should be able to optionally provide this description in mlt train.
Would be nice if we could autocomplete the names of the existing models in mlt sample.

mlt train does not allow overwriting.
The first time trying to run mlt sample
in a fresh venv takes 20 seconds. What is extremely confusing is that there is no progress bar and it just seems to be hanging.
One can easily check whether the vocabularies agree in mlt train
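Assuming a vocabulary is just a collection of characters, the check is a one-liner; a hypothetical sketch (the function names are not mltype's):

```python
def vocabs_agree(vocab_a, vocab_b):
    """Return True if two vocabularies contain exactly the same characters."""
    return set(vocab_a) == set(vocab_b)


def vocab_diff(vocab_a, vocab_b):
    """Characters present in one vocabulary but not the other, for error messages."""
    return set(vocab_a) ^ set(vocab_b)
```

Reporting the symmetric difference makes the resulting error message actionable rather than a bare mismatch.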
When creating the one-hot encoding of characters we copy. However, there is no need to do that. But maybe it is a good thing in case somebody tries to manually change the features.
For some reason, the mlflow mlruns folder is generated just by running mlt.
One could control things that are not controllable via CLI:
Currently, the only metric is the loss itself, and one has no idea how good the model is until the training is done and one can run mlt sample. However, one can sample a few texts at the end of each epoch and store them as mlflow artifacts.
Currently, it takes around 3 seconds to load. Note that it might be related to #4.

sample_char and sample_text are broken when network parameters lie on a GPU. In load_model, assume that we want to do CPU inference.

It seems like most of the time is spent on importing pytorch_lightning and then mlflow.
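For the GPU issue above: assuming the checkpoint is an ordinary torch save file, CPU placement can be forced at load time with map_location; a sketch (load_model_cpu is a hypothetical name, not mltype's API):

```python
import torch


def load_model_cpu(checkpoint_path):
    """Load a checkpoint onto the CPU even if it was saved from a GPU.

    map_location="cpu" remaps any CUDA tensors in the file to CPU memory,
    so inference works on machines without a GPU.
    """
    return torch.load(checkpoint_path, map_location="cpu")
```

Without map_location, loading a GPU-saved checkpoint on a CPU-only machine raises an error about CUDA being unavailable.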
Currently, the sample_text function requires a window_size since for each new character it starts from scratch - the complexity is O(window_size). However, we should also return the hidden states of the LSTM; prediction of the next character then becomes O(1). There are 3 benefits, including speed (especially for a large window_size) and not needing the window_size hyperparameter at inference time.

Currently, the train test split might be totally separate files.
Since we use pytorch-lightning it should be trivial.
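The hidden-state caching idea above can be sketched as follows; the embedding, linear head, and greedy decoding are simplified stand-ins, not mltype's actual architecture:

```python
import torch


class CharSampler:
    """Sample characters one at a time in O(1) per step by carrying the
    LSTM hidden state forward instead of re-encoding the whole window."""

    def __init__(self, vocab_size, hidden_size=32):
        self.embedding = torch.nn.Embedding(vocab_size, hidden_size)
        self.lstm = torch.nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.head = torch.nn.Linear(hidden_size, vocab_size)
        self.state = None  # (h, c), carried across calls

    @torch.no_grad()
    def step(self, char_ix):
        """Feed one character index; return the greedily decoded next index."""
        x = self.embedding(torch.tensor([[char_ix]]))
        out, self.state = self.lstm(x, self.state)  # reuse cached state
        logits = self.head(out[:, -1])
        return int(logits.argmax())
```

Each call processes a single character, so sampling n characters costs O(n) total instead of O(n * window_size).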
Currently, we just readlines, strip and then join them all on spaces.
With the current setup.py one gets:

ERROR: pytorch-lightning 1.0.2 has requirement future>=0.17.1, but you'll have future 0.16.0 which is incompatible.

Ideally, this should be fixed at install time.
Currently, it is not very readable
It seems to be too low when making a mistake
Would be really cool for languages like Python and any other programming language.
Not sure what is done by curses currently when it is asked to addch a new line in all of the below cases:
np.bool is actually taking one byte. So if we use a pure bitarray we can fit 8 times more samples in RAM. Also, it would be nice to investigate sparse matrices. But the simplest is to just store integers representing the positions and then only create the features in the dataloader __getitem__.
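A sketch of the lazy variant: keep only integer character indices in memory and build the boolean features on demand in __getitem__ (NumPy-only illustration; the real dataset class will differ, and note np.bool is deprecated in favour of plain bool):

```python
import numpy as np


class LazyOneHotDataset:
    """Keep integer indices in RAM; materialize one-hot rows only on access."""

    def __init__(self, indices, vocab_size):
        self.indices = np.asarray(indices, dtype=np.int64)  # 8 bytes/sample
        self.vocab_size = vocab_size

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        features = np.zeros(self.vocab_size, dtype=bool)  # built on the fly
        features[self.indices[i]] = True
        return features
```

Memory then scales with the number of samples rather than samples times vocabulary size, and the per-item construction cost is hidden inside the dataloader workers.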
Whenever we race an opponent, the whole screen waits for an input character and only then does it refresh; it only updates once we type the character.
Currently, ENTER is not seen as a special character and one might get confused at the end of the line. Why? Pressing ENTER will result in an error (unless ENTER is in the dictionary).
Currently, a lot of the trained models contain characters that are not even available on the English keyboard. :D mlt train .... -f "~$@" could explicitly disallow characters when training.
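Such filtering could look like the following sketch, assuming the forbidden characters arrive as a plain string (the -f flag above is only a proposal, not an existing option):

```python
def filter_text(text, forbidden):
    """Drop forbidden characters from the training text.

    ``forbidden`` mirrors the proposed -f option, e.g. '~$@'.
    """
    banned = set(forbidden)
    return "".join(ch for ch in text if ch not in banned)
```

Filtering before vocabulary construction also keeps the rejected characters out of the model's output space entirely.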
By default it is ~/.mltype. However, Google Colab does not display hidden folders, which makes it impossible to see what is going on during the training.
Legacy code
If the terminal is set up in this way, competing against some opponent is way more visible.