
micrograd's Introduction

I like deep neural nets.

micrograd's People

Contributors

bpesquet, karpathy


micrograd's Issues

Ensure backward() is idempotent

Hi Andrej,

Many thanks for micrograd & its accompanying video; they deepened my understanding of backprop considerably!

I notice that in the current implementation, calling backward() repeatedly is non-idempotent, because the grads just keep accumulating. This seems like something people are likely to trip over. The fix is simple: in the def of backward(), just above

        # go one variable at a time and apply the chain rule to get its gradient
        self.grad = 1

add

        # reset gradients to ensure they don't get repeatedly accumulated
        for v in reversed(topo):
            v.grad = 0

I've just submitted PR 54 for your consideration; it makes only that one change.

Example of non-idempotence with current master branch: given a simple tree where a = 3, b = 2, c = a + b, d = 1, e = c * d (all leaves as Values of course):

>>> print_grads()
a: 0, b: 0, c: 0, d: 0, e: 0
>>> e.backward()
>>> print_grads()
a: 1.0, b: 1.0, c: 1.0, d: 5.0, e: 1
>>> e.backward()
>>> print_grads()
a: 3.0, b: 3.0, c: 2.0, d: 10.0, e: 1
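
For reference, a sketch of what the full backward() would look like with that reset applied, assuming the topological-sort implementation from the repo's engine.py:

    def backward(self):
        # build the topological order of the graph (as in engine.py)
        topo = []
        visited = set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topo.append(v)
        build_topo(self)

        # reset gradients to ensure they don't get repeatedly accumulated
        for v in reversed(topo):
            v.grad = 0

        # go one variable at a time and apply the chain rule to get its gradient
        self.grad = 1
        for v in reversed(topo):
            v._backward()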

`other` should have a gradient in `__pow__` (?)

Hey Andrej -- just want to say thanks so much for your YouTube video on micrograd. The video has been absolutely enlightening.

Quick question -- while re-implementing micrograd on my end, I noticed that __pow__ (in Value) was missing a back-propagation definition for other. Is this expected?

def _backward():
    self.grad += (other * self.data**(other-1)) * out.grad
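
For context, a hedged sketch of what the missing term could look like if other were itself a Value (if I read the repo correctly, __pow__ currently restricts the exponent to a plain int/float, so there is no other.grad to update), using d(a**b)/db = a**b * ln(a):

import math

def _backward():
    # existing term: gradient with respect to the base
    self.grad += (other.data * self.data**(other.data - 1)) * out.grad
    # hypothetical extra term: gradient with respect to the exponent
    # (only valid for self.data > 0, where the log is defined)
    other.grad += (self.data**other.data) * math.log(self.data) * out.grad
out._backward = _backward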

Incorrect gradient when non-leaf Values are re-used

Thank you @evcu for raising this. My little 2D toy problem converged, and instead of going on to proper tests and double-checking through the recursion I got all trigger-happy and amused with puppies. The core issue is that if variables are re-used, their gradient will be accumulated once per path. Do you think this reference-counting idea could work as a simpler solution? The idea is to suppress backward() calls until the very last one.

(Love your Stylized puppy in your branch btw! :D)

class Value:
    """ stores a single scalar value and its gradient """

    def __init__(self, data):
        self.data = data
        self.grad = 0
        self.backward = lambda: None
        self.refs = 0

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data)
        self.refs += 1
        other.refs += 1
        
        def backward():
            if out.refs > 1:
                out.refs -= 1
                return
            self.grad += out.grad
            other.grad += out.grad
            self.backward()
            other.backward()
        out.backward = backward

        return out

    def __radd__(self, other):
        return self.__add__(other)

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data)
        self.refs += 1
        other.refs += 1
        
        def backward():
            if out.refs > 1:
                out.refs -= 1
                return
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
            self.backward()
            other.backward()
        out.backward = backward

        return out

    def __rmul__(self, other):
        return self.__mul__(other)

    def relu(self):
        out = Value(0 if self.data < 0 else self.data)
        self.refs += 1
        def backward():
            if out.refs > 1:
                out.refs -= 1
                return
            self.grad += (out.data > 0) * out.grad
            self.backward()
        out.backward = backward
        return out

    def __repr__(self):
        return f"Value(data={self.data}, grad={self.grad})"

Homework Assignment Error with softmax activation function

Hi @karpathy
I was solving the assignment mentioned in the YouTube video. In the softmax function, I was getting the following error: TypeError: unsupported operand type(s) for +: 'int' and 'Value'

This is the line where I am getting the error

def softmax(logits):
  counts = [logit.exp() for logit in logits]
  denominator = sum(counts) # Here I am getting the TypeError
  out = [c / denominator for c in counts]
  return out

And, my add function in Value Class is the following

def __add__(self, other): # exactly as in the video
    other = other if isinstance(other, Value) else Value(other)
    out = Value(self.data + other.data, (self, other), '+')
    
    def _backward():
      self.grad += 1.0 * out.grad
      other.grad += 1.0 * out.grad
    out._backward = _backward
    
    return out

So my question is about summing the list. I assumed sum() behaves roughly like counts[i].__add__(counts[i+1]) applied repeatedly along the list, so this __add__ should work fine. But I am not sure why it is not working; am I missing something?
Thanks in advance
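
In case it helps others hitting the same thing, a hedged note: Python's built-in sum() starts from the integer 0, so the first operation is 0 + Value. That falls back to Value.__radd__, and without it the int + Value combination raises exactly this TypeError. A minimal sketch of the reflected-add method, matching the style of the __add__ above:

def __radd__(self, other): # other + self
    return self + other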

Regarding the gradient update of the __sub__ operation


The sub operation implemented here reuses the _backward method of the add operation. I believe this is wrong, because the _backward method for the add operation accumulates out.grad into both operands, whereas for the sub operation it should accumulate out.grad into the positive operand and -out.grad into the negative operand.


for example:
a = b + c
d(a)/db = 1
d(a)/dc = 1

a = b - c
d(a)/db = 1
d(a)/dc = -1

So I think we need to add a separate _backward function for the sub operation, or modify the _backward method of the add operation.
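
A sketch of what a standalone sub operation with its own _backward could look like (hypothetical, mirroring the structure of the other ops rather than the repo's current composition of __add__ and __neg__):

def __sub__(self, other):
    other = other if isinstance(other, Value) else Value(other)
    out = Value(self.data - other.data, (self, other), '-')

    def _backward():
        self.grad += out.grad        # d(self - other)/d(self) = 1
        other.grad += -out.grad      # d(self - other)/d(other) = -1
    out._backward = _backward

    return out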

Adjusting parameters by sign and magnitude of gradient

https://github.com/karpathy/micrograd/blame/c911406e5ace8742e5841a7e0df113ecb5d54685/demo.ipynb#L271C13-L271C45

I really appreciate your videos! Such a gift to all of us.

When adjusting parameters after computing the loss, the example multiplies the step size by the sign and magnitude of the gradient. In the case of a steep gradient near a local minimum, a large gradient value will jump the parameter far from the desired solution. In the case of a shallow gradient, the parameter will struggle to reach its local minimum in the given number of iterations.

Thus, I think the adjustment should be a step size times the sign of the gradient.

What are your thoughts?
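
A minimal sketch of the sign-only update being proposed (the step size and parameter loop here are assumptions for illustration, not the notebook's code):

step_size = 0.01
for p in model.parameters():
    if p.grad > 0:
        p.data -= step_size
    elif p.grad < 0:
        p.data += step_size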

Sequential MLP implementation

Maybe not PR-worthy, but I guess one can abstract the MLP implementation even further by passing the layers themselves instead of the numbers of inputs and outputs yet again, since each individual layer already knows its dimensions.

As such, I wrote it as:

class MLP:

  def __init__(self, layers):
    self.layers = layers

  def __call__(self, x):
    for l in self.layers:
      x = l(x)
    return x

  def parameters(self):
      return [p for layer in self.layers for p in layer.parameters()]

by which you can define a network more intuitively, much like PyTorch's Sequential:

n = MLP([Layer(3, 6), Layer(6, 3), Layer(3, 1)])

To be even more rigorous, a dimension assertion can be added in the __init__:

class MLP:

  def __init__(self, layers):
    self.layers = layers
    for i in range(1, len(layers)):
      assert layers[i-1].nout == layers[i].nin

for which I would also have to store nin and nout on the Layer class:

class Layer:

  def __init__(self, nin, nout):
    self.nin = nin
    self.nout = nout
    self.neurons = [Neuron(nin) for _ in range(nout)]
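
A brief usage sketch of the assertion variant, assuming the Layer and Neuron classes above:

n = MLP([Layer(3, 6), Layer(6, 3), Layer(3, 1)])   # dimensions chain correctly
# MLP([Layer(3, 6), Layer(4, 1)])                  # would trip the assert (6 != 4)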

Noob question about backprop implementation

Hello,
I came across this from your YT video tutorials, thank you for making these!

In engine.py, you implement back propagation using explicit topological order computation.
Are there any reasons why we would not recursively call _backward for every child?
e.g. implement the backward function in Value like this:

    def backward(self):
        self._backward()
        for v in self._prev:
            v.backward()

Does it have something to do with how backprop is implemented in actual NN libraries? Is recursion harder to parallelise in practice compared to using topological ordering?

Thank you
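
For illustration, a small standalone sketch (not the repo's code) of why plain recursion is tricky once a node is shared by several paths: its _backward runs once per path, before its own gradient has been fully accumulated, and shared subgraphs get revisited repeatedly.

class Node:
    def __init__(self, children=()):
        self.children = list(children)
        self.calls = 0            # how many times _backward ran on this node

    def _backward(self):
        self.calls += 1

    def backward(self):           # the recursive scheme from the question
        self._backward()
        for c in self.children:
            c.backward()

shared = Node()                   # one node feeding into two parents
left, right = Node([shared]), Node([shared])
root = Node([left, right])
root.backward()
print(shared.calls)               # 2: processed once per path through the graph

The topological sort avoids this: each node's _backward runs exactly once, and only after all of its consumers have contributed to its grad.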

Issue with zero_grad?

Hi, unless I'm misunderstanding something, zero_grad in nn.py is zeroing out the gradients on the parameter nodes, but shouldn't it do it on all the nodes in the graph?
Otherwise the inner nodes will keep accumulating them.

Rename engine.py to value.py

I suggest you rename engine.py to value.py.

Reasoning:

  • The name engine is misleading: the file doesn't contain any framework or engine-level logic.
  • engine.py contains a single class named Value, so value.py is the most fitting name for it.

PyPI package

Feature

  • Convert Micrograd into a PyPI package.

Need

  • It would be easier for institutions or bootcamps to adopt this for their students.
  • As an organizer of the Data Science Club at SJSU, I would love to introduce the fundamentals with this library.
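
A minimal packaging sketch (every piece of metadata here is a placeholder, not the repo's actual configuration): a setup.py that would let pip install . and a later PyPI upload work.

from setuptools import setup, find_packages

setup(
    name="micrograd",                 # placeholder; the PyPI name would need to be checked
    version="0.1.0",                  # placeholder version
    description="A tiny scalar-valued autograd engine and a small neural net library",
    packages=find_packages(),         # picks up the micrograd/ package directory
    python_requires=">=3.6",
)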

Resetting the grad of weights and biases is not enough

In the video "The spelled-out intro to neural networks and backpropagation: building micrograd" you present the following code:

n = MLP(3, [4, 4, 1])
xs = [
  [2.0, 3.0, -1.0],
  [3.0, -1.0, 0.5],
  [0.5, 1.0, 1.0],
  [1.0, 1.0, -1.0],
]
ys = [1.0, -1.0, -1.0, 1.0] # desired targets
for k in range(20):
  
  # forward pass
  ypred = [n(x) for x in xs]
  loss = sum((yout - ygt)**2 for ygt, yout in zip(ys, ypred))
  
  # backward pass
  for p in n.parameters():
    p.grad = 0.0
  loss.backward()
  
  # update
  for p in n.parameters():
    p.data += -0.1 * p.grad
  
  print(k, loss.data) 

However, before calling loss.backward() we should reset the grad for ALL Values, not just for n.parameters(), because every call to loss.backward() accumulates (+=) into the grad of every node in the graph.
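
A sketch of the broader reset this issue asks for (not the repo's code), walking every node reachable from the loss through _prev:

def zero_all_grads(root):
    # reset the grad of every Value reachable from `root`, parameters included
    seen = set()
    def visit(v):
        if v in seen:
            return
        seen.add(v)
        v.grad = 0.0
        for child in v._prev:
            visit(child)
    visit(root)

# usage inside the training loop, before loss.backward():
# zero_all_grads(loss)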

backward member implementation question

Why can't this function simply be implemented as follows? Am I missing something? We are dealing with a composite structure.

  def backward(self, is_first=True):
    if is_first:
      self.grad = 1.0

    self._backward()

    for c in self._prev:
      c.backward(False)

Topological sort - bug

It's a nit that won't matter most of the time, but the topo sort implementation doesn't work if the graph has cycles.

i.e. there is a hard assumption you're operating over a DAG.

_backward as lambdas?

Hi @karpathy,

congratulations on this repo/talk. The educational value is truly immense. Good job!

Can you please explain the main motivation for _backward methods implemented as lambdas, as opposed to (one) regular method that starts with a hypothetical switch (self._op) and contains implementation for all arithmetic cases?
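
For comparison, a sketch of the single-method alternative the question describes (not the repo's code), assuming each Value stores its operand Values in self._prev and its op string in self._op, and ignoring edge cases such as the same operand appearing twice:

def _backward(self):
    if self._op == '+':
        a, b = self._prev
        a.grad += self.grad
        b.grad += self.grad
    elif self._op == '*':
        a, b = self._prev
        a.grad += b.data * self.grad
        b.grad += a.data * self.grad
    elif self._op == 'ReLU':
        (a,) = self._prev
        a.grad += (self.data > 0) * self.grad
    # ...one branch per remaining op

One apparent advantage of the closure style is that each local rule sits next to the forward code that produced the node and can capture extra operands (such as the int exponent in __pow__) without storing them on the Value itself.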
