
mygrad's Introduction


Introducing mygrad

MyGrad is a lightweight library that adds automatic differentiation to NumPy – its only dependency is NumPy! It is specifically able to compute gradients of scalar-valued functions via backpropagation (i.e. reverse-mode automatic differentiation).

>>> import mygrad as mg
>>> import numpy as np

>>> x = mg.tensor([1., 2., 3.])  # like numpy.array, but supports backprop!
>>> f = np.sum(x * x)  # tensors work with numpy functions!
>>> f.backward() # triggers automatic differentiation 
>>> x.grad  # stores [df/dx0, df/dx1, df/dx2]
array([2., 4., 6.])

MyGrad's primary goal is to make automatic differentiation accessible and easy to use across the Python/NumPy ecosystem. As such, it strives to behave and feel exactly like NumPy so that users need not learn yet another array-based math library. Of the various modes and flavors of auto-diff, MyGrad supports backpropagation from a scalar quantity.

Installing MyGrad:

pip install mygrad

NumPy's ufuncs are richly supported; e.g. we can autodiff through in-place targets and boolean masks:

>>> x = mg.tensor([1., 2., 3.])
>>> y = mg.zeros_like(x)
>>> np.multiply(x, x, where=[True, False, True], out=y)
>>> y.backward()
>>> x.grad
array([2., 0., 6.])

NumPy's view semantics are also mirrored with high fidelity:

>>> x = mg.arange(9.).reshape(3, 3)
>>> diag_view = np.einsum("ii->i", x)
>>> x, diag_view
(Tensor([[0., 1., 2.],
         [3., 4., 5.],
         [6., 7., 8.]]),
 Tensor([0., 4., 8.]))

# views share memory
>>> np.shares_memory(x, diag_view)
True

# mutating a view affects its base (and all other views)
>>> diag_view *= -1  # mutates x in-place
>>> x
Tensor([[-0.,  1.,  2.],
        [ 3., -4.,  5.],
        [ 6.,  7., -8.]])

>>> (x ** 2).backward()
>>> x.grad, diag_view.grad
(array([[ -0.,   2.,   4.],
        [  6.,  -8.,  10.],
        [ 12.,  14., -16.]]),
 array([ -0.,  -8., -16.]))

# the gradients have the same view relationship!
>>> np.shares_memory(x.grad, diag_view.grad)
True

Basic and advanced indexing is fully supported

>>> (x[x < 4] ** 2).backward()
>>> x.grad
array([[0., 2., 4.],
       [6., 0., 0.],
       [0., 0., 0.]])

NumPy arrays and other array-likes play nicely with MyGrad's tensors. These behave like constants during automatic differentiation:

>>> x = mg.tensor([1., 2., 3.])
>>> y = np.array([-1., 0., 10])
>>> (x * y).backward()  # y is treated as a constant
>>> x.grad
array([-1.,  0., 10.])

mygrad.nnet supplies essential functions for typical machine-learning workflows (activations, losses, convolution and pooling operations, and the like).
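
For example, the multiclass hinge loss referenced in the documentation to-do list further down this page lives in mygrad.nnet; the import path and call signature shown here are assumptions, so treat this as a sketch rather than the definitive API:

>>> import mygrad as mg
>>> from mygrad.nnet.losses import multiclass_hinge  # path/signature assumed

>>> scores = mg.tensor([[2.0, 1.0, 0.1],
...                     [0.3, 2.5, 0.2]])    # shape-(N, C) class scores
>>> labels = [0, 1]                          # correct class index for each row

>>> loss = multiclass_hinge(scores, labels)  # mean hinge loss over the batch
>>> loss.backward()                          # gradients populate scores.grad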

Advanced Example

The following is an example of using mygrad to compute the hinge loss of classification scores and to "backpropagate" through (compute the gradient of) this loss. This example demonstrates some of mygrad's ability to perform backpropagation through broadcasted operations, basic indexing, advanced indexing, and in-place assignments.

>>> import mygrad as mg
>>> import numpy as np
>>> class_scores = 10 * mg.random.rand(100, 10) # 100 samples, 10 possible classes for each
>>> class_labels = np.random.randint(low=0, high=10, size=100)  # correct label for each datum
>>> class_labels = (range(len(class_labels)), class_labels)
>>> correct_class_scores = class_scores[class_labels]

>>> Lij = class_scores - correct_class_scores[:, np.newaxis] + 1.  # 100x10 margins
>>> Lij[Lij <= 0] = 0      # scores within the hinge incur no loss
>>> Lij[class_labels] = 0  # the score corresponding to the correct label incurs no loss

>>> loss = Lij.sum() / class_scores.shape[0]  # compute mean hinge loss
>>> loss.backward()    # compute gradient of loss w.r.t all dependent tensors
>>> class_scores.grad  # d(loss)/d(class_scores)
array([[ 0.  ,  0.01,  0.  , -0.04,  0.  ,  0.  ,  0.01,  0.  ,  0.01, 0.01], ...])

Computational Graph Visualization

mygrad uses Graphviz and a Python interface for Graphviz to render the computational graphs built using tensors. These graphs can be rendered in Jupyter notebooks, allowing for quick checks of graph structure, or can be saved to file for later reference.

The dependencies can be installed with:

conda install graphviz
conda install python-graphviz
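
For example, a graph can be rendered with the build_graph utility referenced in the issues below; the import path and keyword usage here are assumptions, offered as a minimal sketch:

>>> import mygrad as mg
>>> from mygrad.computational_graph import build_graph  # path assumed

>>> x = mg.tensor(2.0)
>>> y = mg.tensor(3.0)
>>> f = x * y + x                   # builds a small computational graph

>>> build_graph(f, names=locals())  # renders the graph in a Jupyter notebook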

mygrad's People

Contributors

afederici, aslvrstn, darshankrishnaswamy, davidmascharka, dependabot[bot], kw-0, mkhan45, nickstanisha, petarmhg, rsokl, samaocarpenter, zac-hd


mygrad's Issues

argmax/argmin

They should exist as non-differentiable methods on Tensor. These should merely be wrappers around the numpy functions so that they can also accept tensors, and they should return numpy arrays.
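
A minimal sketch of what such wrappers could look like (assuming tensors expose their underlying array via .data, as elsewhere on this page; these standalone functions could then be bound to Tensor as methods):

import numpy as np

def argmax(a, axis=None):
    """Accept a mygrad Tensor or array-like; always return a numpy result."""
    data = a.data if hasattr(a, "data") else a  # unwrap a Tensor if given one
    return np.argmax(np.asarray(data), axis=axis)

def argmin(a, axis=None):
    data = a.data if hasattr(a, "data") else a
    return np.argmin(np.asarray(data), axis=axis)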

updated choices hypothesis strategy

import hypothesis.strategies as st

def choices(seq, size, replace=True):
    """Randomly choose elements from `seq`, producing a tuple of length `size`."""
    if size > len(seq) and not replace:
        raise ValueError("`size` must not exceed the length of `seq` when `replace` is `False`")
    if size > len(seq) and not seq:
        raise ValueError("`size` must be 0, given an empty `seq`")
    inds = list(range(len(seq)))
    if replace:
        strat = st.tuples(*[st.sampled_from(inds)]*size)
    else:
        strat = st.permutations(inds)
    return strat.map(lambda x: tuple(seq[i] for i in x[:size]))

hype train

@nickstanisha

Check it!

def sliding_window_view(arr, window_shape, steps):
    """ Produce a view from a sliding, striding window over `arr`.
        The window is only placed in 'valid' positions - no overlapping
        over the boundary.

        Parameters
        ----------
        arr : numpy.ndarray, shape=(...,[x, (...), z])
            The array to slide the window over.

        window_shape : Sequence[int]
            The shape of the window to raster: [Wx, (...), Wz],
            determines the shape of [x, (...), z]

        steps : Sequence[int]
            The step size used when applying the window
            along the [x, (...), z] directions: [Sx, (...), Sz]

        Returns
        -------
        view of `arr`, shape=([X, (...), Z], ..., [Sx, (...), Sz])
            Where X = (x - Wx) // Sx + 1

        Notes
        -----
        In general, given
          `out` = sliding_window_view(arr,
                                      window_shape=[Wx, (...), Wz],
                                      steps=[Sx, (...), Sz])

           out[ix, (...), iz] = arr[..., ix*Sx:ix*Sx+Wx,  (...), iz*Sz:iz*Sz+Wz]

         Examples
         --------
         >>> import numpy as np
         >>> x = np.arange(9).reshape(3,3)
         >>> x
         array([[0, 1, 2],
                [3, 4, 5],
                [6, 7, 8]])

         >>> y = sliding_window_view(x, window_shape=(2, 2), steps=(1, 1))
         >>> y
         array([[[[0, 1],
                  [3, 4]],

                 [[1, 2],
                  [4, 5]]],


                [[[3, 4],
                  [6, 7]],

                 [[4, 5],
                  [7, 8]]]])
        >>> np.shares_memory(x, y)
         True

        # Performing a neural net style 2D conv (correlation)
        # placing a 4x4 filter with stride-1
        >>> data = np.random.rand(10, 3, 16, 16)  # (N, C, H, W)
        >>> filters = np.random.rand(5, 3, 4, 4)  # (F, C, Hf, Wf)
        >>> windowed_data = sliding_window_view(data,
        ...                                     window_shape=(4, 4),
        ...                                     steps=(1, 1))

        >>> conv_out = np.tensordot(filters,
        ...                         windowed_data,
        ...                         axes=[[1,2,3], [3,4,5]])

        # (F, H', W', N) -> (N, F, H', W')
        >>> conv_out = conv_out.transpose([3,0,1,2])
         """
    import numpy as np
    from numpy.lib.stride_tricks import as_strided
    in_shape = np.array(arr.shape[-len(steps):])  # [x, (...), z]
    window_shape = np.array(window_shape)  # [Wx, (...), Wz]
    steps = np.array(steps)  # [Sx, (...), Sz]
    nbytes = arr.strides[-1]  # size (bytes) of an element in `arr`

    # number of per-byte steps to take to fill window
    window_strides = tuple(np.cumprod(arr.shape[:0:-1])[::-1]) + (1,)
    # number of per-byte steps to take to place window
    step_strides = tuple(window_strides[-len(steps):] * steps)
    # number of bytes to step to populate sliding window view
    strides = tuple(int(i) * nbytes for i in step_strides + window_strides)

    outshape = tuple((in_shape - window_shape) // steps + 1)
    # outshape: ([X, (...), Z], ..., [Sx, (...), Sz])
    outshape = outshape + arr.shape[:-len(steps)] + tuple(window_shape)
    return as_strided(arr, shape=outshape, strides=strides, writeable=False)

Graph Diagrams

Add a construct_graph function to Tensors that creates a visual graph of a network.

Example:

"
 /‾‾‾\
| out |
 \___/
   Λ
   |
   |
   |
 |‾‾‾|
 | + |
 |___|
   Λ
   |
   |‾‾‾‾‾‾‾|
   |       |
   |       |
 /‾‾‾\   /‾‾‾\
|  Z  | |  K  |
 \___/   \___/
           Λ
           |
           |
           |
         |‾‾‾|
         | * |
         |___|
           Λ
           |
      |‾‾‾‾‾‾‾‾‾|
      |         |
      |         |
    /‾‾‾\    /‾‾‾\
   |  X  |  |  Y  |
    \___/    \___/
"

Create documentation for MyGrad

I have begun working on using sphinx to create a docs page for mygrad. Progress is being made on the docs

To-do

  • Create a docs README.md so people know how to build the docs
  • Add example section to every function's docstring (in-progress)
  • A basic introduction to MyGrad, including its purpose as a simple but optimized autograd library.
  • Demonstrate the basic usage of mygrad.Tensor, and its numpy-array-esque behavior (including basic and advanced indexing)
  • Summarize the library's neural network tools and discuss MyNN
  • Demonstrate ability to visualize computational graph
  • Provide simple gradient descent example
  • Document how MyGrad's back-propagation system works
  • Add Latex equations for neural network functions (multiclass hinge still needs docs)

Bitwise Operators

The bitwise operators should be supported for Tensors. These are

  • & (bitwise and)
  • | (bitwise or)
  • ^ (bitwise xor)
  • ~ (bitwise not/invert) [maybe support?]

Define InvalidBackprop exception

Currently Tensor.backprop and Operation.backprop raise a bare Exception when raising for invalid backprop states. We should raise more descriptive exceptions so that people don't have to use a bare try-except to catch these.
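
A minimal sketch of what this could look like (the exception name, message, and usage are placeholders, not MyGrad's actual internals):

class InvalidBackprop(Exception):
    """Raised when back-propagation is invoked from an invalid state."""

# hypothetical usage inside Tensor.backward:
#     if grad is None and self.data.size > 1:
#         raise InvalidBackprop("`backward` must be seeded with a gradient "
#                               "when invoked on a non-scalar tensor")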

Tensor methods

I want to start discussing how we should handle the structuring of the Tensor class. There are core methods, like _op and __init__, that obviously need to be explicitly defined on Tensor, and then there is a whole slew of methods, like sum, reshape, and many others, which are also available as standalone functions but still need to be bound to the tensor as methods.

Should Tensor only include explicit definitions for its essential methods, and then we provide an automated way of binding those auxiliary methods to it upon construction?

Something along these lines?

class Tensor:
    def __init__(self):
        ...

    def _op(self):
        ...

for op in [sum, reshape, ...]:
    setattr(Tensor, op.__name__, op)

The point here is to make sure Tensor is manageable and readable to users. This also reduces manual labor and the chance for mistakes when mirroring new ops as tensor methods.

I've tried to see how numpy handles this for their arrays, but I inevitably get caught in the eternal cycle of:

from .array import _array

def array(arr):
    return _array(arr)

and I can't figure out where anything actually gets created 😣

Add import tests

from typing import Tuple
from unittest import mock
import sys
from importlib import import_module
from pkgutil import walk_packages

import pytest


def get_submodule_names(package_name: str) -> Tuple[str]:
    """ Get the names of all submodules of a module, recursively

    Parameters
    ----------
    package_name: str

    Return
    ------
    Tuple[str]
        The names of the module and its submodules
    """
    import_module(package_name)
    package = sys.modules[package_name]
    module_names = [package_name]
    module_names += [f'{package_name}.{name}' for _, name, _
                     in walk_packages(package.__path__)]
    return tuple(module_names)


@pytest.mark.parametrize("module", get_submodule_names('mygrad'))
@pytest.mark.parametrize(
    "extra", ["numba", "graphviz", "python-graphviz"])
def test_imports_do_not_require_extras(module: str, extra: str):
    with mock.patch.dict(sys.modules, {extra: None}):
        import_module(name=module)

Update `build_graph` docs

Update README with example of build_graph usage, and add brief example section to build_graph docstring

Reorganize unit tests, and create log of tested/not-tested functions

Organize into folders unit tests based on:

  • tests for Tensor
  • tests for base operations (including broadcasted)
  • tests for math functions
  • tests for neural network operations

(all folders should have the name test in them)

and create a log of all the untested functionality in the library - this will be turned into an issue so that we can have complete unit test coverage.

MyGrad 0.6: Refactoring back-propagation system

I'd like to discuss the back-prop refactor that MyGrad will undergo. I hope to arrive at a clean, simple, and efficient implementation design before beginning to restructure the code. I've already implemented a hacky branch with the relevant fundamental changes, and it works as a proof-of-concept.

The following reflects my current thoughts on the refactor and some barriers that I am anticipating. I'd love to get other perspectives on approaches for implementing breadth-first back-prop.

Motivation

MyGrad's current back-propagation system, although elegant in its simplicity, fails catastrophically - in terms of computational efficiency - for branching graphs such as:

[figure: a branching graph in which x feeds y, and y reaches the terminal node L via two branches; a red equation shows the depth-first gradient accumulation and a green equation the breadth-first one]

Currently, we back-propagate depth-first. This means that y simply back-props each of its incoming gradients to x. As a result, dy/dx is computed twice. This current procedure is represented by the red equation.

A breadth-first approach, conveyed by the green equation, would entail first accumulating all of y's partial derivatives, constructing the total derivative of the terminal node (L) w.r.t y, and back-propagating to x a single time.

While the breadth-first approach is at most twice as fast for this simple example, back-propagation through a computational graph for a simple gated RNN is intractable without this method.

Originally, we had written MyGrad's back-prop system in such a way that students could easily trace through it or even implement it on their own. Moving forward, students can still be asked to construct their own back-prop systems in this way. An additional lesson will need to be provided to present the additional scaffolding needed to support breadth-first back-propagation.

Implementation Considerations

A breadth-first back-propagation system, in which a variable only back-propagates total derivatives, is substantially more cumbersome than the current depth-first approach.

Each variable in the graph must know about all of its down-stream usages in the computational graph, distinguishing between those operations that do and do not contribute to the terminal node, which invoked the back-propagation; this is required in order for a variable to know when it has finished constructing the total derivative.

The following graph reveals a couple of the complexities that arise when accounting for these things:

[figure: a graph in which x feeds y; y contributes to L directly, contributes twice to z = y * y (which feeds L), and also feeds w, which does not contribute to L]

Here, invoking back-prop from L requires y to accumulate its derivatives from:

  • its direct contribution to L
  • both of its contributions to z (e.g. z = y * y)

and then propagate this derivative to x. Note that y should not wait for a derivative from w, as w does not contribute to the value of L.

Implementation Ideas

  • backward needs to be refactored such that there is a public method, which distinguishes the terminal node of the computational graph. This will invoke _backward on all subsequent tensors, which will impose the constraint that a tensor will not back-prop until it has received derivatives from all relevant down-stream operations. It would also be used to signal that a "new" back-propagation has begun; this is necessary in order for people to be able to do:
    loss.backward()
    loss.null_gradients()
    loss.backward()
    and get the same results. That is, invoking the public backward should cause each variable to again
    expect all relevant downstream derivatives before itself back-propping (a toy sketch of this bookkeeping follows this list).
  • Although a tensor already holds a list, _ops of the operation-instances that it is involved in, it may be preferable to work with hashable IDs for these operations, so that we can leverage set-comparisons for distinguishing spurious branches in the graph. That being said, we must also accommodate graphs like those above where a single tensor serves as multiple inputs to a single operation. Using a set would remove this information. I'm not sure what the right approach is for this.
  • Currently, null_gradients also clears the computational graph by clearing each tensor's _ops list. Now that we have to explicitly leverage this information, calling null_gradients prior to back-prop would be problematic. We may need to have a separate clear_graph function instead, so that people can use the popular workflow:
    loss.null_gradients()
    loss.backward()
    optim.step()
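
The toy sketch below (all names hypothetical, and far simpler than what MyGrad would actually need) illustrates the core bookkeeping: each node knows how many relevant downstream operations will report a gradient, accumulates until that count is met, and only then back-propagates a single total derivative.

class Node:
    """Toy breadth-first back-prop bookkeeping; not MyGrad's implementation."""

    def __init__(self):
        self.grad = None
        self.parents = []    # (parent_node, local_derivative) pairs
        self._expected = 0   # would be set by a traversal from the terminal node:
                             # how many relevant downstream ops will report a gradient
        self._received = 0

    def _backward(self, incoming_grad):
        # accumulate one downstream contribution
        self.grad = incoming_grad if self.grad is None else self.grad + incoming_grad
        self._received += 1
        if self._received == self._expected:
            # the total derivative is complete: propagate upstream exactly once
            for parent, local_deriv in self.parents:
                parent._backward(self.grad * local_deriv)

    def backward(self):
        """Public entry point; marks this node as the terminal node."""
        self._expected = 1
        self._received = 0
        self.grad = None
        self._backward(1.0)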

Write descriptive error when Operation.__call__ returns None

The following is a common stumbling block:

from mygrad.operations import Operation
from mygrad import Tensor

class Crap(Operation):
    def __call__(self, outputs, targets, weights=None):
        pass

    def backward_var(self, grad, index, **kwargs):
        pass

def crap(x, y, weights=None):
    return Tensor._op(Crap, x, op_args=(y, weights))

# in a separate script:
from crap import crap
from mygrad import Tensor
import numpy as np

num_samples = np.random.randint(100)+1
num_targets = np.random.randint(5)+1
weights = np.array(np.random.random(num_targets))
a = np.random.rand(num_samples, num_targets)
b = np.random.randint(0, num_targets, num_samples)
mygrad_a = Tensor(a)

crap(a, b)

# Output:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-4b64b5870bb5> in <module>()
----> 1 crap(a, b)

~/MyNN/mynn/losses/crap.py in crap(x, y, weights)
     10 
     11 def crap(x, y, weights=None):
---> 12     return Tensor._op(Crap, x, op_args=(y, weights))

~/MyGrad/mygrad/tensor_base.py in _op(cls, Op, op_args, op_kwargs, *input_vars)
    107             scalar_only = scalar_only or (var.scalar_only and not var.constant)
    108 
--> 109         return cls(op_out, constant=is_const, _creator=f, _scalar_only=scalar_only)
    110 
    111     def backward(self, grad=None, *, _broadcastable=False):

~/MyGrad/mygrad/tensor_base.py in __init__(self, x, constant, _scalar_only, _creator)
     40         else:
     41             self.data = np.asarray(x)
---> 42             self._check_valid_dtype(self.data.dtype)
     43 
     44         self.grad = None

~/MyGrad/mygrad/tensor_base.py in _check_valid_dtype(dtype)
     51     def _check_valid_dtype(dtype):
     52         if not np.issubdtype(dtype, np.number):
---> 53             raise TypeError("Tensor data must be a numeric type")
     54 
     55     @classmethod

TypeError: Tensor data must be a numeric type
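
A sketch of the kind of guard this issue is asking for (placement, helper name, and wording are assumptions): validate the value returned by Operation.__call__ before it is handed to Tensor's constructor, so the failure names the offending operation instead of surfacing as a dtype error.

import numpy as np

def _validated_op_output(op_instance, *inputs):
    """Hypothetical helper: fail with a clear message if __call__ returns None."""
    out = op_instance(*inputs)  # Operation.__call__
    if out is None:
        raise TypeError(
            f"{type(op_instance).__name__}.__call__ returned None; an Operation "
            "must return the array of values it computes"
        )
    return np.asarray(out)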

test_subtract.py

    stt = s2
    all_s = [s0.data]
    ls2 = 0
    for n, x in enumerate(X2):
        stt = tanh(dense(x, U2) + dense(stt, W2))
        all_s.append(stt)
        o = dense(stt, V2)
        ls2 += o.sum()
    ls2.backward()

Missing math functions

The following math functions need implementations in mygrad.math. Please refer to the contents of that module for the implementations of functions like mygrad.exp, mygrad.sqrt, and mygrad.arctan for guidance.
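
For reference, the general pattern mirrors the Operation subclass shown in the issue above; attribute names such as variables are assumptions here, so defer to the existing ops in mygrad.math for the authoritative structure:

import numpy as np
from mygrad.operations import Operation  # import path as used earlier on this page
from mygrad import Tensor

class Log10(Operation):
    def __call__(self, a):
        self.variables = (a,)            # record the input tensor (name assumed)
        return np.log10(a.data)          # forward pass on the raw array

    def backward_var(self, grad, index, **kwargs):
        (a,) = self.variables
        return grad / (a.data * np.log(10))  # d(log10 x)/dx = 1 / (x ln 10)

def log10(a):
    return Tensor._op(Log10, a)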

Tensor creation routines

The following no-op (i.e. no-backprop) tensor-creation functions should be implemented, mirroring the behavior of the corresponding numpy functions:

These should exist in: mygrad/tensor_creation/funcs.py

  • empty
  • empty_like
  • eye
  • identity
  • ones
  • ones_like
  • zeros
  • zeros_like
  • full
  • full_like
  • arange
  • linspace
  • logspace
  • geomspace

These should simply call the underlying numpy functions, without exposing order or subok arguments. There should also be a constant=False argument, which permits users to produce constant-valued tensors.

It might be a good idea to simply write a wrapper that converts these numpy functions to mygrad ones (returning tensors). The return statement in the docstring should be updated as such. We may want to use custom_inherit for updating the documentation in a manageable way. <- (maybe not...let's keep things simple)
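
A sketch of such a wrapper (the helper name is hypothetical): each creation routine forwards to numpy and wraps the result in a Tensor, exposing a constant flag while hiding order/subok.

import numpy as np
from mygrad import Tensor

def _wrap_creation_func(numpy_func):
    """Hypothetical helper: forward to numpy, wrap the result in a Tensor."""
    def creator(*args, constant=False, **kwargs):
        # `order`/`subok` are deliberately not exposed
        return Tensor(np.asarray(numpy_func(*args, **kwargs)), constant=constant)
    creator.__name__ = numpy_func.__name__
    return creator

ones = _wrap_creation_func(np.ones)
zeros = _wrap_creation_func(np.zeros)
arange = _wrap_creation_func(np.arange)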

MyGrad operations need a constant=False keyword arg

Currently, involving a non-constant Tensor in repeated operations without ever calling null_gradients results in an ever-growing list of ops. This is a problem when using a model for evaluation.

One solution to this, and a desirable option to have otherwise, is to let each function be called with a constant=False keyword argument, which controls whether or not the output is a constant.

Random sampling `mygrad.random`

These no-backprop functions should simply wrap the corresponding numpy functions, but return tensors. constant=False should be included as a default argument. Simply write a wrapper that converts these numpy functions to mygrad ones (returning tensors). The return statement in the docstring should be updated as such.

mygrad/random/funcs.py

  • rand
  • randn
  • randint
  • random_integers
  • random_sample
  • random
  • ranf
  • sample

Implement 'astype' function/method

We need to implement the method mygrad.Tensor.astype(), where the resulting tensor has the same creator as its parent but the specified datatype.

Neural Network Wrappers

This can be an ongoing discussion on what to include, but convenient wrappers for the neural network functionality would be really nice. Things like optimizers to handle the bookkeeping of all your variable updates, layers that register parameters, and so on. Think PyTorch but documented and clean (so not really PyTorch at all 😛). Feel free to add to the list below, or comment on anything. This can serve as a discussion and a board of ideas.

Not all of these are probably critical (for example, who really needs all those optimizers?), but I think they'd be nice to have to call this fully-fledged.

We should certainly discuss how adding these may change the design of MyGrad and keep in mind design decisions while implementing these.

Optimizers

These should take the parameters of a model and perform optimization over those parameters with respect to some loss. Learning rate schedulers may be included under here. (A bare-bones SGD sketch follows the list below.)

  • (Batch) SGD [with momentum, maybe with Nesterov]
  • Adam
  • Adadelta
  • Adagrad
  • (L-)BFGS
  • rmsprop
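
A bare-bones SGD sketch of the kind of interface being proposed (an illustration, not an existing MyGrad API): hold references to the parameter tensors and update their underlying arrays in-place using the gradients left behind by backward().

class SGD:
    """Toy optimizer: updates each parameter's underlying array in-place."""

    def __init__(self, params, lr=0.1, momentum=0.0):
        self.params = list(params)  # mygrad Tensors to be optimized
        self.lr = lr
        self.momentum = momentum
        self._velocities = [0.0] * len(self.params)

    def step(self):
        for i, p in enumerate(self.params):
            if p.grad is None:      # parameter did not contribute to the loss
                continue
            self._velocities[i] = (self.momentum * self._velocities[i]
                                   - self.lr * p.grad)
            p.data += self._velocities[i]  # mutate the raw numpy array in-place

This would pair with the loss.null_gradients(); loss.backward(); optim.step() workflow mentioned in the back-propagation refactor discussion above.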

Convenience Layers

My thought here is to have classes that handle all of the parameter registration necessary for each layer in a network. For example, a Conv2D layer may take K, R, C and create the necessary weight Tensor, then register its parameters with an optimizer for easy updates.

  • ConvNd (N ∈ {1, 2, 3} probably at the least)
  • BatchNormNd
  • Dense layer
  • Any activations that need a dedicated layer (adaptive layers like PReLU)
  • Recurrent layers (plain RNN, LSTM, GRU)
  • Dropout?

More Losses

Should be self-explanatory

  • L{1,2}
  • Negative log-likelihood
  • KL divergence

Initializers

Very handy to be able to pass a Tensor to a function that will initialize it according to some method.

Backprop optimization: leveraging function symmetries

I wish I could label this with "good first issue", but this issue is a bit meatier than the label would suggest. That being said, this has the potential to be a really fun issue for an eager developer to take on. It:

  • will speed some things up, which is about the most exciting thing you can do in mygrad 😜
  • requires a simple, but potentially elegant revision of Operation
  • entails writing some slick automated tests

If there is anyone, or multiple people, who would like to participate in this, please let me know. I think it would be a great learning experience. I will happily provide guidance ranging from: "very hands on", to: "give general insights and review", depending on your needs/preferences.

Obviously if there is not any interest, I will end up taking this on myself in a few weeks or so. However, if you do want to take this on but at a later date, just let me know.

Math

f(x, y) is symmetric if f(x, y) = f(y, x). Thus the following is true for the derivatives for a symmetric function f:

$\left.\frac{\partial f}{\partial x}\right|_{(x,\,y)} = \left.\frac{\partial f}{\partial y}\right|_{(y,\,x)}$   (1)

Suppose we want to compute the total derivative of a symmetric function with identical inputs. I.e.:

$\frac{d}{dx} f(x, x) = \left.\frac{\partial f}{\partial x}\right|_{(x,\,x)} + \left.\frac{\partial f}{\partial y}\right|_{(x,\,x)}$   (2)

Given the relationship deduced above, this reduces to a single partial derivative:

$\frac{d}{dx} f(x, x) = 2\left.\frac{\partial f}{\partial x}\right|_{(x,\,x)}$   (3)

Obviously, this reduction extends trivially to symmetric functions of N inputs, where the factor of 2 becomes a factor of N.

Current State of MyGrad

Presently, MyGrad will always compute its derivatives in long-form (equation 2), even in the instance that it is dealing with a symmetric function that may receive identical inputs.

An exception to this is EinSum, which implements its own backprop so that common optimized sum-reduction cases like einsum("..., ...", x, x) don't drag during backprop; it implements the logic of equation 3 when it has a symmetric reduction case and identical inputs.

Proposal

Operation should have a symmetries attribute that allows individual operations to identify symmetry relationships among their inputs. This would mean that those operations with symmetries would check for identical inputs (as enforced by is, not ==), and would compute the total derivative using the reduced form (equation 3) where possible.

The outcome of this is some nice, simple optimizations so that users can freely write things like logaddexp(x, x) without incurring redundant computations during backprop.
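
A toy illustration of the proposed check (the class and method names are hypothetical, not MyGrad's Operation interface): when a symmetric binary operation receives the same tensor for both inputs, it computes one partial derivative and scales it, instead of back-propagating twice.

import numpy as np

class LogAddExp:
    """Toy symmetric binary op: f(x, y) = log(exp(x) + exp(y))."""
    symmetric = True                  # proposed: declare the input symmetry

    def partial_wrt_first(self, x, y):
        return np.exp(x) / (np.exp(x) + np.exp(y))

    def input_grads(self, x, y, grad):
        """Return the gradient contribution(s) to send upstream."""
        if self.symmetric and x is y:  # identity check, per the proposal
            # equation (3): one partial derivative, scaled by 2
            return (2 * grad * self.partial_wrt_first(x, x),)
        # general case: one partial per input
        return (grad * self.partial_wrt_first(x, y),
                grad * self.partial_wrt_first(y, x))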

Can't set `constant`

I'd like to be able to set tensors to constant, but I cannot:

from mygrad import Tensor
weights = Tensor([1])
weights.constant = True

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-43-58d8e7751d9c> in <module>
----> 1 weights.constant = True

AttributeError: can't set attribute

A workaround is to set weights._constant; is this behavior intentional?

Add numba check

Numba isn't necessary for anything except the GRU (I believe), so I propose adding a check so that an exception isn't thrown if a user doesn't have numba and is just importing MyGrad. It's super tedious having to run the import cell twice since I haven't rebuilt numba for 3.7.
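
A sketch of such a guard (the module placement and helper name are placeholders): import numba lazily so that importing MyGrad works without it, and only raise when the GRU code path actually needs it.

try:
    import numba
except ImportError:   # numba is optional; only the GRU needs it
    numba = None

def _require_numba():
    """Hypothetical helper, called only by the numba-accelerated GRU code path."""
    if numba is None:
        raise ImportError("this feature requires numba; please install it")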

Add asserts for any functions that use arrays as indices

E.g. cross-entropy needs several sanity checks to ensure that people aren't accidentally passing in objects that can serve as indices (bool-arrays, broadcast-compatible-shaped arrays, etc.). This can lead to extremely hard-to-find bugs and to models that simply train poorly.

Implement mean-pooling neural network operation

Okay, so this might not exactly be a "good first issue" - it is a little more advanced, but is still very much accessible to newcomers.

Similar to the mygrad.nnet.max_pool function, I would like there to be a mean-pooling layer. That is, a convolution-style window is strided over the input, and the mean is computed for each window. E.g. the following shows how mean-pooling should work on a shape-(3, 3) tensor, using a shape-(2, 2) pooling window strided with a step-size of 1 (both along the rows and the columns).

>>> import mygrad as mg
>>> x = mg.Tensor([[0., 1.,  2.],
...                [3., 4.,  5.],
...                [6., 7., 8.]])

# Forward Pass
>>> out = mean_pool(x, pool=(2, 2), stride=1)
>>> out
Tensor([[2., 3.],
        [5., 6.]])

# Backprop
>>> out.sum().backward()  # must backprop from a scalar, thus we sum `out`
>>> x.grad
array([[0.25, 0.5 , 0.25],
       [0.5 , 1.  , 0.5 ],
       [0.25, 0.5 , 0.25]])

Like max_pool, this function should accommodate N-dimensional tensors. mygrad.sliding_window_view makes short work of this. This function basically boils down to taking the appropriate sliding-window view of the underlying numpy array of the input tensor, and using numpy.mean to take the average over the trailing N dimensions that you want to pool over. This is much easier than max-pooling, since numpy.mean is able to accept multiple axes.

Try starting with the forward pass for the 1D and 2D cases only. I can help you generalize to N-dimensions if you get stuck. I am also happy to help derive the proper back-propagation for this.
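
A sketch of the 2D forward pass under this approach (operating on a raw numpy array and assuming the sliding_window_view signature posted earlier on this page):

import numpy as np
from mygrad import sliding_window_view

def mean_pool_2d_forward(x, pool=(2, 2), stride=1):
    """Forward pass only: average each pooling window of a 2D array."""
    windowed = sliding_window_view(np.asarray(x),
                                   window_shape=pool,
                                   steps=(stride, stride))
    # the window dimensions are trailing, so average over the last two axes
    return windowed.mean(axis=(-2, -1))

Applied to the shape-(3, 3) example above, this reproduces the [[2., 3.], [5., 6.]] forward output.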

Add mirrors to numpy objects like np.newaxis and np.float

Create exact aliases for common numpy objects, e.g. mygrad.newaxis (this is done already) as a drop-in replacement for np.newaxis, mygrad.float32 for numpy.float32, etc. This is meant to allow people to search and replace numpy with mygrad and have their code still work.

TODO: make a list of all these objects

logic routines

The following no-backprop numpy functions should be wrapped so that they can accept tensors or numpy arrays (or both). These should all still return numpy arrays.

mygrad.logic

  • logical_and (via numpy override)
  • logical_or (via numpy override)
  • logical_not (via numpy override)
  • logical_xor (via numpy override)
  • allclose (via numpy override)
  • isclose
  • greater (via numpy override)
  • greater_equal (via numpy override)
  • less (via numpy override)
  • less_equal (via numpy override)
  • equal (via numpy override)
  • not_equal

A wrapper might be useful here so that any tensors passed to the functions simply have their underlying numpy-arrays passed instead.

We ought not expose sophisticated arguments, like out, casting, order, signature, subok, or extobj.
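
A sketch of the wrapper described above (the helper names are hypothetical): unwrap any tensors to their underlying numpy arrays, call the numpy function, and return a plain numpy array.

import numpy as np

def _to_array(x):
    """Unwrap a mygrad Tensor to its underlying numpy array; pass arrays through."""
    return x.data if hasattr(x, "data") else np.asarray(x)

def _wrap_logic_func(numpy_func):
    def logic_func(*args, **kwargs):
        # only simple arguments are forwarded; `out`, `casting`, etc. are not exposed
        return numpy_func(*(_to_array(a) for a in args), **kwargs)
    logic_func.__name__ = numpy_func.__name__
    return logic_func

isclose = _wrap_logic_func(np.isclose)
not_equal = _wrap_logic_func(np.not_equal)
logical_and = _wrap_logic_func(np.logical_and)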

Implement multi-matmul

Similar to numpy.linalg.multi_dot, we need an implementation for multi-matmul (mygrad only implements matmul and not dot for the time being, but this has no real bearing on this function). The point of this function is to order a chain of matrix multiplications to minimize the total cost of the computation. This pays off doubly when we do backprop.

The source reveals a simple algorithm under the hood for determining the optimal ordering of the chain of matrix multiplications; numpy then simply calls numpy.dot in that order. For us, once the appropriate multiplication order is found, we can simply call mygrad.matmul in that order and not worry about implementing the back-prop 😁.

We need to take some care with making the tensors be at least 2D.

We should be able to test our result against:

import functools
import mygrad as mg
def multi_matmul_slow(arrays): return functools.reduce(mg.matmul, arrays)

Reshape *axes

The numpy ndarray object has a method that takes in *axes rather than axes; MyGrad should mirror this.

Complete `mygrad.math`

Create tensor ops and mygrad.math functions for the following:

  • Trigonometric functions (cos, cosh, arccos, cot, etc)
  • math.log10, math.log2
  • math.abs (with piece-wise derivative)
  • math.sqrt & math.cbrt (cube-root)
  • math.logaddexp see this numpy function

Provide basic unit tests that provide sanity checks that the forward and backward prop works.
