mushin's Introduction

Mushin

Mushin is a Japanese term used in martial arts that refers to the state of mind obtained through practice. At that point, a person relies not on what they think the next move should be, but on their trained natural reaction (or instinct).

Description

Mushin is a pure Rust, no-unsafe library for computing gradients on dynamic computational graphs using reverse-mode automatic differentiation. In other words, Mushin is to Rust what PyTorch is to Python.

Internally it uses the arrayfire crate to provide parallel computations on specialized hardware, such as Nvidia CUDA GPUs or Intel MKL CPUs. For details on which devices are available and installation instructions for your OS, please take a look at the arrayfire crate documentation. The arrayfire binaries must be installed for Mushin to work.

One clear benefit of this crate over PyTorch is Rust's strong type system. All operations performed on tensors while building the graph are checked at compile time for mathematical soundness, which means no runtime errors after an hour of model training. If it compiles, it works. If at some point you make a mistake while building your hellishly nested computational graph, for example in the shape of a tensor, you'll be stopped even before you can start feeling stupid.

Moreover, because constant and variable tensors are actually different types, the developer always has a clear view of which resulting tensors contribute to the gradients and which do not. On top of that, the compiler will stop you from trying to compute the gradient of, or with respect to, a constant!
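
As a rough illustration, here is a sketch reusing the constructors from the Usage example below (the exact compiler diagnostics depend on Mushin's tensor types, so take this as illustrative rather than authoritative):

use mushin as mu;

let x = mu::eye::<1, 1, 2, 3>(3.0).freeze(); // constant: excluded from gradient tracking
let w = mu::randn::<1, 1, 3, 2>();           // variable: tracked by the graph

let z = mu::mm(&w, &x); // shapes (3, 2) x (2, 3) -> (3, 3), checked at compile time
z.backward();

let dz_dw = w.grad(); // fine: w is a variable
// let dz_dx = x.grad();     // rejected by the compiler: x is a constant
// let bad = mu::mm(&w, &w); // rejected by the compiler: inner dimensions 2 and 3 do not match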

Another benefit compared to similar libraries is that the computation graph is eagerly evaluated, which means the graph is truly dynamic. In other words, your next operations can be conditioned on the results of previous ones, so you can have conditional branching while building your graph.
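
For instance (a sketch: use_bias is just a hypothetical runtime flag, and the tensors are built as in the Usage example below), ordinary Rust control flow decides which operation becomes the next node of the graph:

use mushin as mu;

let x = mu::eye::<1, 1, 2, 3>(3.0).freeze();
let w = mu::randn::<1, 1, 3, 2>();
let b = mu::fill::<1, 1, 3, 3>(0.0);

// Operations are evaluated eagerly, so the structure of the graph can depend
// on values that are only known at runtime.
let use_bias = true; // hypothetical runtime condition
let z = if use_bias {
    mu::add(&mu::mm(&w, &x), &b)
} else {
    mu::mm(&w, &x)
};
z.backward();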

Usage

First, install the arrayfire binaries as indicated by the arrayfire crate.

Then, add mushin as one of your dependencies:

[dependencies]
mushin = "0.5"

The following is quite a self-explanatory example of the basic usage of Mushin to build computation graphs and get the derivatives back:

use mushin as mu;

fn main() {
    // Constant tensor (frozen): it does not contribute to the gradients
    let x = mu::eye::<1, 1, 2, 3>(3.0).freeze();
    // Variable tensors: tracked by the computation graph
    let w = mu::randn::<1, 1, 3, 2>();
    let b = mu::fill::<1, 1, 3, 3>(0.0);

    // z = w·x + b, evaluated eagerly while the graph is built
    let z = mu::add(&mu::mm(&w, &x), &b);
    // Reverse-mode pass: compute gradients for all variables in the graph
    z.backward();

    let dz_dw = w.grad();
    let dz_db = b.grad();
}

By default, this library enables the nn feature, which gives access to the nn module. It builds upon the auto-grad foundation of Mushin to deliver a set of Deep Learning utilities, such as activation functions, layers, losses and optimizers. If you don't really need that part and are only interested in the pure auto-grad functionality of this library, the nn module can be disabled with default-features = false (see the Cargo.toml sketch after the example below). Here follows a brief example of how it works:

use mushin as mu;
use mu::nn::{layers::Linear, activations::relu, losses::mse, optimizers::SGD};

let x = mu::eye::<16, 1, 1, 3>(1.0).freeze(); // input batch (constant)
let y = mu::eye::<16, 1, 1, 5>(3.0).freeze(); // target batch (constant)

let linear = Linear::<16, 3, 5>::new();             // fully connected layer, 3 inputs to 5 outputs
let optim = SGD::new(&[linear.parameters()], 0.01); // stochastic gradient descent, learning rate 0.01

for _ in 0..5 {
    let z = relu(&linear.forward(&x)); // forward pass through the layer and ReLU activation
    let loss = mse(&z, &y);            // mean squared error against the targets

    loss.backward(); // compute the gradients of the loss
    optim.step();    // update the layer parameters
    loss.reset();    // reset before the next iteration
}
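
As noted above, if you only need the auto-grad part, the nn feature can be turned off with the standard Cargo syntax for disabling default features, for example:

[dependencies]
mushin = { version = "0.5", default-features = false }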

Contributing

  • If you find a vulnerability, bug or miss something, please open a new issue
  • To introduce your changes into the codebase, submit a pull request
  • To discuss possible improvements, suggestions and new features, join us in Slack!

Many thanks!

License

Mushin is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT, and COPYRIGHT for details.

mushin's People

Contributors

c0dearm, yonip23


mushin's Issues

Implementing the softmax activation function

I saw that I will need the softmax activation function for my basic word2vec neural network. The problem is that softmax needs the whole list of output values, while the Activation trait only knows about the current output value. I would say I have to change the trait to somehow pass in the whole list of values, but calling it on each output value and recomputing the exponential sum every time does not seem performant enough.
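
For reference, here is a plain-Rust sketch of softmax over a full output vector (independent of Mushin's tensor types) that shows why a per-value activation interface is awkward here: every element needs the sum of exponentials over the whole vector.

// Softmax over a slice of outputs: each element depends on all the others.
fn softmax(outputs: &[f32]) -> Vec<f32> {
    // Subtract the maximum for numerical stability before exponentiating
    let max = outputs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = outputs.iter().map(|v| (v - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}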

ML examples

Hi there!

This is a really cool library and I'm excited to try it out (especially after spending several hours fighting my environment to get tch-rs working >_> - still haven't, suspect it's easier to rewrite in mushin instead!)

That being said, I'm still a bit of a novice at ML, and would appreciate some applied examples to see how you might port e.g. one of the PyTorch examples over. (I'm particularly interested in making a GAN or a VAE, but don't let that influence your choice!)

No rush on this, but it'd be nice to have, especially for other people who'd like to better understand how to use the library 🙂

(Unrelated: is there any kind of roadmap for what you're planning on implementing next / would like to see implemented?)

Create discussion forum / discord / matrix

Hi! Would it be possible to create a forum to discuss the API and the philosophy of this project? The idea behind this project is really nice, but there may be people who want to contribute code or share their opinions on the current state of the project and its future.

This way it'll be much easier to discuss what kind of API works best for the general consumers of the library.

Note for followers!

Mushin is undergoing a major refactor in order to provide a better API (and overall experience) for users creating their own Deep Learning models. While implementing the MNIST digit recognition example I ran into some difficulties expressing the convolutional model ergonomically. You might not see commits often until I have iterated over a few different implementation possibilities. Until then, just letting you know that I am working on it!

Thank you all for your patience ❤️

Making Arrayfire optional

Do you think it would be possible to make arrayfire optional? The benefits of arrayfire are enormous, but it also seems like a huge handicap that anyone using the crate is forced to download and install the arrayfire binaries. It also means that if I use mushin in my crate, my crate will also depend on arrayfire, which is kind of a turn-off.

I guess making it optional would require writing several versions of all operations and complicating the code with some abstraction over arrayfire and an alternative backend, which would be a lot of work.
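
One possible shape for such an abstraction (purely hypothetical, not Mushin's actual code) would be a small backend trait that both an arrayfire-backed implementation and a pure-Rust fallback could satisfy:

// Hypothetical sketch: a minimal backend trait the tensor operations could be
// written against, so that arrayfire becomes just one optional implementation.
trait Backend {
    type Storage;

    fn matmul(a: &Self::Storage, b: &Self::Storage) -> Self::Storage;
    fn add(a: &Self::Storage, b: &Self::Storage) -> Self::Storage;
}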

RUSTSEC-2020-0071: Potential segfault in the time crate

Potential segfault in the time crate

Details

  • Package: time
  • Version: 0.1.44
  • URL: time-rs/time#293
  • Date: 2020-11-18
  • Patched versions: >=0.2.23
  • Unaffected versions: =0.2.0, =0.2.1, =0.2.2, =0.2.3, =0.2.4, =0.2.5, =0.2.6

Impact

Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.

The affected functions from time 0.2.7 through 0.2.22 are:

  • time::UtcOffset::local_offset_at
  • time::UtcOffset::try_local_offset_at
  • time::UtcOffset::current_local_offset
  • time::UtcOffset::try_current_local_offset
  • time::OffsetDateTime::now_local
  • time::OffsetDateTime::try_now_local

The affected functions in time 0.1 (all versions) are:

  • at
  • at_utc
  • now

Non-Unix targets (including Windows and wasm) are unaffected.

Patches

Pending a proper fix, the internal method that determines the local offset has been modified to always return None on the affected operating systems. This has the effect of returning an Err on the try_* methods and UTC on the non-try_* methods.

Users and library authors with time in their dependency tree should perform cargo update, which will pull in the updated, unaffected code.

Users of time 0.1 do not have a patch and should upgrade to an unaffected version: time 0.2.23 or greater or the 0.3 series.

Workarounds

No workarounds are known.

References

time-rs/time#293

See advisory page for additional details.
