
tinygrad's Introduction

tiny corp logo

tinygrad: For something between PyTorch and karpathy/micrograd. Maintained by tiny corp.



This may not be the best deep learning framework, but it is a deep learning framework.

Due to its extreme simplicity, it aims to be the easiest framework to add new accelerators to, with support for both inference and training. If XLA is CISC, tinygrad is RISC.

tinygrad is still alpha software, but we raised some money to make it good. Someday, we will tape out chips.

Features

LLaMA and Stable Diffusion

tinygrad can run LLaMA and Stable Diffusion!

Laziness

Try a matmul. See how, despite the style, it is fused into one kernel with the power of laziness.

DEBUG=3 python3 -c "from tinygrad import Tensor;
N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);
c = (a.reshape(N, 1, N) * b.T.reshape(1, N, N)).sum(axis=2);
print((c.numpy() - (a.numpy() @ b.numpy())).mean())"

And we can change DEBUG to 4 to see the generated code.
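
For comparison, the reshape/multiply/sum above computes the same result as the built-in matmul; a minimal sketch:

from tinygrad import Tensor

N = 1024
a, b = Tensor.rand(N, N), Tensor.rand(N, N)
c = a.matmul(b)  # same result; run under DEBUG=3 to compare the generated kernels
print(c.shape)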

Neural networks

As it turns out, 90% of what you need for neural networks is a decent autograd/tensor library. Throw in an optimizer, a data loader, and some compute, and you have all you need.

from tinygrad import Tensor, nn

class LinearNet:
  def __init__(self):
    self.l1 = Tensor.kaiming_uniform(784, 128)
    self.l2 = Tensor.kaiming_uniform(128, 10)
  def __call__(self, x:Tensor) -> Tensor:
    return x.flatten(1).dot(self.l1).relu().dot(self.l2)

model = LinearNet()
optim = nn.optim.Adam([model.l1, model.l2], lr=0.001)

x, y = Tensor.rand(4, 1, 28, 28), Tensor([2,4,3,7])  # replace with real mnist dataloader

for i in range(10):
  optim.zero_grad()
  loss = model(x).sparse_categorical_crossentropy(y).backward()
  optim.step()
  print(i, loss.item())

See examples/beautiful_mnist.py for the full version, which gets 98% accuracy in ~5 seconds.
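
Depending on the tinygrad version, optim.step() may assert that training mode is enabled; a hedged variant of the loop for such versions:

with Tensor.train():  # some versions require training mode for optimizer steps
  for i in range(10):
    optim.zero_grad()
    loss = model(x).sparse_categorical_crossentropy(y).backward()
    optim.step()
    print(i, loss.item())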

Accelerators

tinygrad already supports numerous accelerators, including GPU (OpenCL), CLANG (C code), LLVM, METAL, and CUDA.

And it is easy to add more! Your accelerator of choice only needs to support a total of ~25 low level ops.
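
A quick way to check which backend tinygrad selected (assuming a version that exposes Device; backends can also be forced with environment variables such as CPU=1 or CUDA=1):

from tinygrad import Device
print(Device.DEFAULT)  # e.g. METAL, CUDA, or CPU depending on your machine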

Installation

The current recommended way to install tinygrad is from source.

From source

git clone https://github.com/tinygrad/tinygrad.git
cd tinygrad
python3 -m pip install -e .

Direct (master)

python3 -m pip install git+https://github.com/tinygrad/tinygrad.git

Documentation

Documentation along with a quick start guide can be found in the docs/ directory.

Quick example comparing to PyTorch

from tinygrad import Tensor

x = Tensor.eye(3, requires_grad=True)
y = Tensor([[2.0,0,-2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()

print(x.grad.numpy())  # dz/dx
print(y.grad.numpy())  # dz/dy

The same thing but in PyTorch:

import torch

x = torch.eye(3, requires_grad=True)
y = torch.tensor([[2.0,0,-2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()

print(x.grad.numpy())  # dz/dx
print(y.grad.numpy())  # dz/dy
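
Both versions should print the same values. Since z = sum_ij(y_i * x_ij), we have dz/dx_ij = y_i and dz/dy_i = sum_j(x_ij), so with x the identity:

dz/dx = [[ 2.  2.  2.]
         [ 0.  0.  0.]
         [-2. -2. -2.]]
dz/dy = [[1. 1. 1.]]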

Contributing

There has been a lot of interest in tinygrad lately. Following these guidelines will help your PR get accepted.

We'll start with what will get your PR closed with a pointer to this section:

  • No code golf! While low line count is a guiding light of this project, anything that remotely looks like code golf will be closed. The true goal is reducing complexity and increasing readability, and deleting \ns does nothing to help with that.
  • All docs and whitespace changes will be closed unless you are a well-known contributor. The people writing the docs should be those who know the codebase the absolute best. People who have not demonstrated that shouldn't be messing with docs. Whitespace changes are both useless and carry a risk of introducing bugs.
  • Anything you claim is a "speedup" must be benchmarked. In general, the goal is simplicity, so even if your PR makes things marginally faster, you have to consider the tradeoff with maintainability and readability.
  • In general, the code outside the core tinygrad/ folder is not well tested, so unless the current code there is broken, you shouldn't be changing it.
  • If your PR looks "complex", is a big diff, or adds lots of lines, it won't be reviewed or merged. Consider breaking it up into smaller PRs that are individually clear wins. A common pattern I see is prerequisite refactors before adding new functionality. If you can (cleanly) refactor to the point that the feature is a 3 line change, this is great, and something easy for us to review.

Now, what we want:

  • Bug fixes (with a regression test) are great! This library isn't 1.0 yet, so if you stumble upon a bug, fix it, write a test, and submit a PR, this is valuable work.
  • Solving bounties! tinygrad offers cash bounties for certain improvements to the library. All new code should be high quality and well tested.
  • Features. However, if you are adding a feature, consider the line tradeoff: a 3 line feature has a much lower usefulness bar to clear than something that's 30 or 300 lines. All features must have regression tests. In general, with no other constraints, your feature's API should match torch or numpy.
  • Refactors that are clear wins. In general, if your refactor isn't a clear win it will be closed. But some refactors are amazing! Think about readability in a deep core sense. A whitespace change or moving a few functions around is useless, but if you realize that two 100 line functions can actually use the same 110 line function with arguments while also improving readability, this is a big win.
  • Tests/fuzzers. If you can add tests that are non-brittle, they are welcome. We have some fuzzers in here too, and there's a plethora of bugs that can be found with them and by improving them. Finding bugs, even writing broken tests (that should pass) with @unittest.expectedFailure, is great; a minimal sketch of this pattern appears after this list. This is how we make progress.
  • Dead code removal from core tinygrad/ folder. We don't care about the code in extra, but removing dead code from the core library is great. Less for new people to read and be confused by.
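
A minimal sketch of the expectedFailure pattern mentioned above. The second test uses a deliberately wrong assertion as a stand-in for a real, not-yet-fixed repro:

import unittest
from tinygrad import Tensor

class TestRegressions(unittest.TestCase):
  def test_fixed_bug(self):
    # a normal regression test: pins down behavior that was once broken
    self.assertEqual(Tensor([1.0, 2.0, 3.0]).sum().item(), 6.0)

  @unittest.expectedFailure  # remove once the underlying bug is fixed
  def test_open_bug(self):
    # encodes the correct answer for an open bug; the suite stays green now
    # and reports "unexpected success" when the bug gets fixed
    self.assertEqual(Tensor([2.0]).item(), 3.0)  # stand-in for a real repro

if __name__ == "__main__":
  unittest.main()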

Running tests

You should install the pre-commit hooks with pre-commit install. This will run the linter, mypy, and a subset of the tests on every commit.

For more examples on how to run the full test suite please refer to the CI workflow.

Some examples of running tests locally:

python3 -m pip install -e '.[testing]'  # install extra deps for testing
python3 test/test_ops.py                # just the ops tests
python3 -m pytest test/                 # whole test suite
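
A couple more invocations that can be handy (standard pytest flags, nothing tinygrad-specific):

python3 -m pytest test/test_ops.py -k "add"   # only the ops tests matching a keyword
python3 -m pytest test/ -x -q                 # stop at the first failure, quieter output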

tinygrad's People

Contributors

adamritter, adriangb, chaosagent, chenyuxyz, cloud11665, dc-dc-dc, dosier, eichenroth, eliulm, flammit, g1y5x3, geohot, geohotstan, jla524, kartik4949, liamdoult, marcelbischoff, mmmkkaaayy, nimlgen, patosai, python273, qazalin, roelofvandijk, ryanneph, stevenandersonz, szymonozog, uuuvn, wozeparrot, wpmed92, zenginu


tinygrad's Issues

Backward Error Running on Windows Anaconda Environment

torch forward pass: 20.993 ms
torch backward pass: 210.071 ms


                                  Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls

                          aten::addmm_        19.93%      51.973ms        19.93%      51.973ms      30.217us          1720
              aten::threshold_backward        11.94%      31.134ms        12.04%      31.387ms       3.139ms            10
                       aten::threshold        11.57%      30.184ms        11.65%      30.384ms       3.038ms            10
            aten::thnn_conv2d_backward         9.75%      25.432ms        40.07%     104.499ms      10.450ms            10
                           aten::fill_         9.45%      24.652ms         9.45%      24.652ms      50.310us           490
         aten::max_pool2d_with_indices         8.99%      23.435ms         9.10%      23.720ms       2.372ms            10
                          aten::select         7.79%      20.309ms         9.17%      23.919ms       6.165us          3880
             aten::thnn_conv2d_forward         5.39%      14.059ms        14.23%      37.100ms       3.710ms            10
aten::max_pool2d_with_indices_backward         2.61%       6.814ms         6.66%      17.368ms       1.737ms            10
                              aten::mm         2.47%       6.444ms         2.51%       6.540ms     435.980us            15

Self CPU time total: 260.772ms

E

ERROR: test_mnist (__main__.TestConvSpeed)

Traceback (most recent call last):
File "c:\Users\Nehad Hirmiz\Documents\Programming\Python\Tutorials\tinygrad\test_speedynet.py", line 83, in test_mnist
out.backward()
File "c:\ProgramData\Anaconda3\envs\deeptorch\lib\site-packages\tinygrad\tensor.py", line 68, in backward
t.backward(False)
File "c:\ProgramData\Anaconda3\envs\deeptorch\lib\site-packages\tinygrad\tensor.py", line 68, in backward
t.backward(False)
File "c:\ProgramData\Anaconda3\envs\deeptorch\lib\site-packages\tinygrad\tensor.py", line 68, in backward
t.backward(False)
[Previous line repeated 1 more time]
File "c:\ProgramData\Anaconda3\envs\deeptorch\lib\site-packages\tinygrad\tensor.py", line 63, in backward
if g.shape != t.data.shape:
AttributeError: 'tuple' object has no attribute 'shape'

Error while trying to run examples/efficientnet.py

Traceback (most recent call last):
File "examples/efficientnet.py", line 14, in
from extra.efficientnet import EfficientNet
ModuleNotFoundError: No module named 'extra.efficientnet'

How do I get module 'extra.efficientnet'??

tinygrad is growing! should be tiny! add CI test for < 1000 lines

I deleted all the fastconv crap because it wasn't tiny. Output from sloccount:

SLOC    Directory       SLOC-by-Language (Sorted)
325     tinygrad        python=325
260     test            python=260


Totals grouped by language (dominant language first):
python:         585 (100.00%)

Can someone add sloccount to CI and have it fail if it ever gets above 1000?

Also, no code golf, but refactors that reduce complexity are very welcome.
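
One way to sketch such a check (a hypothetical CI script, not the repo's actual workflow; it counts non-blank lines rather than sloccount's exact SLOC metric):

# check_linecount.py -- hypothetical guard: fail CI if core exceeds 1000 lines
import pathlib, sys
n = sum(len([l for l in p.read_text().splitlines() if l.strip()])
        for p in pathlib.Path("tinygrad").rglob("*.py"))
print(f"tinygrad/ has {n} non-blank lines")
sys.exit(0 if n <= 1000 else 1)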

cannot access tinygrad from examples/

In examples/, objects from the files under tinygrad/ are imported. However, running python3 examples/efficientnet.py is not able to access tinygrad, and an ImportError is thrown.

Can't import fetch from utils

ImportError: cannot import name 'fetch' from 'tinygrad.utils' (/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/tinygrad/utils.py)

This occurs when I'm running the example. (I'm not a Python developer, though.)

Add ensemble wrapper, i.e. TTA and other ensembling techniques

I'm thinking of adding a wrapper for ensembling models, e.g.:

from tinygrad import ensemble
# example 1
ensembled_model_object = ensemble(models=[efficientnet_model_object], type='tta', aug=['original', 'fliplr'])
out = ensembled_model_object.forward(image)
# TTA improves results by 2-3 mAP units.


# example 2
ensembled_model_object = ensemble(models=[efficientnet_model_object, other_model_obj], type='parallel_ensemble', aug=None)
# the two models would run in parallel processes (maybe using ray) to save time,
# but this might have memory constraints; if type='ensemble' is used instead,
# a sequential graph is built internally
out = ensembled_model_object.forward(image)

@geohot thoughts?

EfficientNet runs slower on GPU than CPU

EfficientNet in examples/efficientnet.py runs slower on the GPU than on the CPU for some reason. Benchmarks:

PYTHONPATH=. GPU=1 python3.8 examples/efficientnet.py https://image.shutterstock.com/image-illustration/compact-white-car-3d-render-260nw-405716083.jpg
Output (GPU):

656 7.561172 minivan
did inference in 1.13 s

PYTHONPATH=. python3.8 examples/efficientnet.py https://image.shutterstock.com/image-illustration/compact-white-car-3d-render-260nw-405716083.jpg
Output (CPU):

656 7.5611706 minivan
did inference in 0.71 s

What could be causing this? I'm running this with Python 3.8 on a 2018 MacBook Pro with Intel Iris Plus Graphics 1536 MB, running macOS Catalina.
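
One plausible factor (an assumption, not a confirmed diagnosis): GPU backends typically pay one-time kernel compilation and data transfer costs, so the first run is not representative. A sketch that times a warmed-up second run (model and img stand in for the objects set up in examples/efficientnet.py):

import time
out = model.forward(img)   # warmup: triggers compilation and uploads
st = time.time()
out = model.forward(img)   # steady-state timing
print(f"did inference in {time.time() - st:.2f} s")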

TinyGrad core for respecting 1K loc limit

How about separating the core logic (tensor, ops, opsgpu, nn, utils, etc.) into a tinygrad-core project and creating another repo for extensions (models, examples, notebooks, etc.)? IMO the core code should be high quality (lol) but complete.

GPU EfficientNet is weirdly slow

did inference in 0.28 s
                 Mul : 163       29.18 ms
                 Add : 140       25.53 ms
                 Pow :  98       18.43 ms
               Pad2D :  17       16.97 ms
              Conv2D :  81       14.49 ms
             Sigmoid :  65       10.23 ms
             Reshape : 230        9.94 ms
                 Sub :  49        9.75 ms
           AvgPool2D :  17        5.93 ms
                 Dot :   1        1.06 ms

Run with DEBUG=1 for profiling. Conv2D isn't even close to the top in time users.

No module named 'tinygrad' when running test from terminal.

Running the command:

$ python3.8 test/test_mnist.py TestMNIST.test_sgd_gpu

Outputs following error:

Traceback (most recent call last):
  File "test/test_mnist.py", line 5, in <module>
    from tinygrad.tensor import Tensor, GPU
ModuleNotFoundError: No module named 'tinygrad'

This is the result of importing in the following manner:

from tinygrad.tensor import Tensor, GPU                                         
from tinygrad.utils import layer_init_uniform, fetch

Shouldn't we use relative path imports here instead? Shouldn't the code be able to run without installing with:

pip3 install git+https://github.com/geohot/tinygrad.git --upgrade
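
Two standard ways around this, consistent with the install instructions above and the PYTHONPATH usage seen in other reports here:

PYTHONPATH=. python3.8 test/test_mnist.py TestMNIST.test_sgd_gpu   # run from the repo root
python3 -m pip install -e .                                        # or install the checkout in editable mode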

Python version support?

@geohot What versions of python do you want to support?

If you can specify this, I will add it to automation and documentation.

[Not a bug] Warning for Python 3.9.0

As mentioned in setup.py, tinygrad requires Python >= 3.8.
PyTorch installation fails on Python 3.9.0, so the tinygrad examples will not work on Python 3.9.0.

See: pytorch/pytorch#47354 for more details.

I am using Python 3.8.6 and it works fine, but I was not able to install requirements.txt on Python 3.9.0 due to torch and torchvision, most likely because PyPI has no wheels (ready-to-install binaries) for Python 3.9 yet, as it's still quite new.

backward pass in pow seems to have issues...

and should not be computed if requires_grad = False.

tinygrad/ops_cpu.py:58: RuntimeWarning: invalid value encountered in log
  unbroadcast((x**y) * np.log(x) * grad_output, y.shape)
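
A minimal sketch of one common guard (an assumption about a possible fix, not tinygrad's actual code): only evaluate log(x) where x > 0, so the backward pass neither warns nor propagates NaNs for non-positive bases:

import numpy as np

def pow_backward_y(x, y, grad_output):
  # inner where keeps log() away from non-positive values; outer where zeroes those grads
  safe_log = np.where(x > 0, np.log(np.where(x > 0, x, 1.0)), 0.0)
  return (x ** y) * safe_log * grad_output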

lr tensor shape mismatch when GPU is on

# %%
from tinygrad.tensor import Tensor
from tinygrad.utils import layer_init_uniform
from tinygrad.optim import SGD

from pprint import pprint


class MLP:
    def __init__(
        self,
        input_size,
        network_size,
        gpu=True,
        learning_rate=0.001,
    ):
        self.gpu = gpu
        self.input_size = input_size
        self.network_size = network_size

        layer_sizes = zip([input_size, *network_size[:-1]], network_size)

        self.layers = [
            Tensor(layer_init_uniform(in_size, out_size), gpu=self.gpu)
            for (in_size, out_size) in layer_sizes
        ]
        self.optimizer = SGD(self.layers, lr=learning_rate)

    def __call__(self, x):
        output = Tensor(x, gpu=self.gpu)
        for i, layer in enumerate(self.layers):
            output = output.dot(layer)
            if i != len(self.layers) - 1:
                output = output.relu()
        return output

    def learn(self, x, y):

        _y = Tensor(y, gpu=self.gpu)
        two = Tensor([[2]], gpu=self.gpu)

        output = self.__call__(x)
        loss = (output - _y).pow(two).mean()

        loss.backward()
        # self.optimizer.step()


mlp = MLP(3, [2, 1], gpu=True)

x = [[1.0, 1.0, 1.0]]
y = [[1.0]]


mlp.learn(x, y)
# pprint([layer for layer in mlp.layers])

for layer in mlp.optimizer.params:
    print(layer.grad)
    print((layer.grad * Tensor([[0.01]], gpu=True)).cpu())
    print((layer.grad * mlp.optimizer.lr).cpu())
print((layer.grad * Tensor([[0.01]], gpu=True)).cpu())

prints out the grad correctly for me:

Tensor array([[ 0.        , -0.00866722],
       [ 0.        , -0.00866722],
       [ 0.        , -0.00866722]], dtype=float32) with grad None

while

print((layer.grad * mlp.optimizer.lr).cpu())

reports

shape mismatch in binop a*b: (3, 2) (1,)

If GPU is off, this issue doesn't occur.

Not sure if there's something I've missed here.
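
A workaround sketch grounded in the report above: the (1, 1)-shaped literal broadcasts fine on GPU, so wrapping the scalar lr the same way sidesteps the (1,)-shaped binop:

lr_t = Tensor([[0.001]], gpu=True)   # same value as mlp.optimizer.lr, but shaped (1, 1)
print((layer.grad * lr_t).cpu())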
