
tinynn's People

Contributors

borgwang, dependabot[bot], goodbobobo, w32zhong


tinynn's Issues

Could transformer, seq2seq and attention be added?

I found this project on Zhihu and really like it. It is very friendly for beginners; compared with using a big deep learning framework directly, you can learn many more of the details here.

I have read some material on transformers but still don't quite understand how they are implemented. The transformer implementations I found online are all based on PyTorch or TF; none of them are implemented from scratch like this project.
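For what it's worth, the core of attention is short enough to sketch in plain NumPy. This is a generic scaled dot-product attention, not code from tinynn, and the shapes and names are illustrative only:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(q, k, v):
        # q, k: (seq_len, d_k), v: (seq_len, d_v)
        scores = q @ k.T / np.sqrt(k.shape[-1])  # similarity of every query with every key
        weights = softmax(scores, axis=-1)       # each row sums to 1
        return weights @ v                       # weighted sum of the values

    q = k = v = np.random.randn(5, 8)
    out = scaled_dot_product_attention(q, k, v)
    print(out.shape)  # (5, 8)

A transformer layer is essentially several of these attention blocks (with learned projections for q, k, v) plus feed-forward layers, residual connections and layer normalization.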

Step should average the gradients by batch size.

It seems to me that the optimizer methods (taking SGD as an example) multiply the learning rate directly by the sum of the gradients over a batch:

def _compute_step(self, grad):
    return - self.lr * grad

I suggest averaging the gradients by batch size; the benefits of doing this are listed in this post. Basically, you do not have to re-tune the learning rate when changing the batch size.

If you agree, I can create a pull request that adds an option to use mean gradients while keeping the plain sum of gradients available for compatibility (and for efficiency).
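A minimal sketch of what such an option could look like. The class and method names follow the snippet above, but the average_grads flag and the batch_size argument are hypothetical and not part of tinynn's current API:

    import numpy as np

    class SGD:
        def __init__(self, lr=0.01, average_grads=True):
            self.lr = lr
            self.average_grads = average_grads  # hypothetical flag

        def _compute_step(self, grad, batch_size):
            # grad is assumed to be the *sum* of per-sample gradients over the batch
            if self.average_grads:
                grad = grad / batch_size  # mean gradient: step size no longer scales with batch size
            return -self.lr * grad

    opt = SGD(lr=0.1, average_grads=True)
    print(opt._compute_step(np.ones(3) * 32, batch_size=32))  # [-0.1 -0.1 -0.1]

With the mean gradient, the effective step size stays roughly comparable when the batch size changes, which is the point made in the post.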

SOFTPLUS

Shouldn't softplus be Softplus(x) = log(1 + exp(x))? The code looks like Softplus(x) = log(1 + exp(-abs(x))) + max(x, 0.0). I computed both and they are equivalent, but why use the more complicated form?
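Assuming the code indeed uses the -abs(x) form (which is what makes the two expressions equivalent), the reason is numerical stability: the naive log(1 + exp(x)) overflows in floating point for large positive x, while the rewritten form never exponentiates a positive number. A small sketch:

    import numpy as np

    def softplus_naive(x):
        # overflows: exp(1000.0) is inf in float64, so the result becomes inf
        return np.log(1.0 + np.exp(x))

    def softplus_stable(x):
        # exp is only ever applied to a non-positive argument, so it stays in [0, 1]
        return np.log(1.0 + np.exp(-np.abs(x))) + np.maximum(x, 0.0)

    x = np.array([-1000.0, 0.0, 1000.0])
    print(softplus_naive(x))   # [0.      0.6931  inf]  (with an overflow warning)
    print(softplus_stable(x))  # [0.      0.6931  1000.]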

Avoid download dataset programmatically.

The MNIST example works like torchvision: it downloads the dataset programmatically when the data file is not in a specific directory. I personally had a very bad experience with torchvision because it does not use a CDN link and shows no progress bar; in China it is painfully slow. Moreover, it hard-codes the link in the code, which adds maintenance cost. I think tinynn is doing the same thing.

I imagine that if I want to contribute another example, then by this convention I will need to import urllib and write try/except and mkdir logic in my code just to help the user download the dataset. Why not keep it old-fashioned and simpler: provide a list of download links as text, which makes life easier for both users and contributors?

I love the cleanness of tinynn and think I can learn a lot from this code. But I suggest keeping the example code small and instead providing documentation or a text description that asks the user to download the dataset themselves from a list of links (official link, Baidu Yun, etc.).
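A minimal sketch of what the proposed convention could look like in an example script. The file path and the message are illustrative only, not tinynn's actual layout:

    import os
    import sys

    DATA_PATH = "./examples/data/mnist.npz"  # hypothetical location

    if not os.path.exists(DATA_PATH):
        sys.exit(
            "MNIST data not found at %s.\n"
            "Please download it manually (links are listed in the README) "
            "and place it there before running this example." % DATA_PATH
        )

The example then stays focused on the model code, and all download instructions live in one documented place.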

One suggestion on file header

I think it is a rather outdated practice to put information such as author, date and filename into the file header. These are already tracked by git (in a much better way), and they only add maintenance cost. This is just a suggestion for new files created later.

I bring this up because it makes me hesitate over whether I should modify your header when I change something minor in the code. It may also confuse potential contributors to this project (if you want more people to participate).

If you agree, I can help remove those fields from the current source code.

The MNIST example has surprising test accuracy.

Why does a network with only one dense layer and no activation reach over 80% test accuracy after the first epoch?

I simply commented out the other layers and trained the network:

        net = Net([
            # Dense(200),
            # ReLU(),
            Dense(10),
            # ReLU(),
            # Dense(70),
            # ReLU(),
            # Dense(30),
            # ReLU(),
            # Dense(10)
        ])

Is this the expected result?
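For context, with everything else commented out the model reduces to plain multinomial logistic regression, i.e. softmax(Wx + b) trained with cross-entropy, and linear classifiers are known to reach roughly 90% accuracy on MNIST, so 80%+ after one epoch is plausible. A minimal NumPy sketch of what that single Dense(10) layer computes (shapes are MNIST's, names are illustrative):

    import numpy as np

    batch_size, in_dim, num_classes = 32, 784, 10
    x = np.random.randn(batch_size, in_dim)           # flattened MNIST images
    W = np.random.randn(in_dim, num_classes) * 0.01   # the only trainable weights
    b = np.zeros(num_classes)

    logits = x @ W + b                                 # Dense(10) forward pass
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)          # softmax applied by the loss
    print(probs.shape)  # (32, 10): one class distribution per image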

Question about the activation functions

Hi, while reading the activation function code there is one thing I don't understand: why do forward and backward need the two extra functions func and derivative_func?
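For context, this is a common base-class pattern for activation layers. A minimal sketch of how I read it (the names follow the question; the exact signatures in tinynn may differ):

    import numpy as np

    class Activation:
        """Base class: forward/backward are shared; each activation
        only provides its function and its derivative."""
        def forward(self, inputs):
            self.inputs = inputs  # cache for the backward pass
            return self.func(inputs)

        def backward(self, grad):
            # chain rule: dL/dx = dL/dy * f'(x)
            return self.derivative_func(self.inputs) * grad

    class ReLU(Activation):
        def func(self, x):
            return np.maximum(x, 0.0)

        def derivative_func(self, x):
            return (x > 0.0).astype(x.dtype)

    relu = ReLU()
    y = relu.forward(np.array([-1.0, 2.0]))
    g = relu.backward(np.ones(2))
    print(y, g)  # [0. 2.] [0. 1.]

This way a new activation only has to define the two small functions, while the input caching and chain-rule bookkeeping live in one place.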
