
tinynn's People

Contributors

borgwang, dependabot[bot], goodbobobo, w32zhong


tinynn's Issues

Could transformer, seq2seq and attention be added?

I found this project on Zhihu and really like it. It is very friendly for beginners; compared with using a big deep learning framework directly, you can learn many more of the details here.

I have read some material on transformers but still don't quite understand how they are implemented. The transformer implementations I found online are all based on PyTorch or TF; none of them are implemented from scratch like this project.
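For what it's worth, the core of attention is short enough to sketch in plain NumPy. This is a generic scaled dot-product attention, not code from tinynn, and the shapes and names are illustrative only:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(q, k, v):
        # q, k: (seq_len, d_k), v: (seq_len, d_v)
        scores = q @ k.T / np.sqrt(k.shape[-1])  # similarity of every query with every key
        weights = softmax(scores, axis=-1)       # each row sums to 1
        return weights @ v                       # weighted sum of the values

    q = k = v = np.random.randn(5, 8)
    out = scaled_dot_product_attention(q, k, v)
    print(out.shape)  # (5, 8)

A transformer layer is essentially several of these attention blocks (with learned projections for q, k, v) plus feed-forward layers, residual connections and layer normalization.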

Step should average the gradients by batch size.

It seems to me that the optimizer methods (taking SGD as an example) multiply the learning rate directly by the sum of the gradients over a batch:

def _compute_step(self, grad):
    return - self.lr * grad

I suggest averaging the gradients by batch size; the benefits of doing this are listed in this post. Basically, you do not have to re-tune the learning rate when changing the batch size.

If you agree, I can create a pull request that adds an option to use mean gradients while keeping the plain sum of gradients available for compatibility (and for efficiency).
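A minimal sketch of what such an option could look like. The class and method names follow the snippet above, but the average_grads flag and the batch_size argument are hypothetical and not part of tinynn's current API:

    import numpy as np

    class SGD:
        def __init__(self, lr=0.01, average_grads=True):
            self.lr = lr
            self.average_grads = average_grads  # hypothetical flag

        def _compute_step(self, grad, batch_size):
            # grad is assumed to be the *sum* of per-sample gradients over the batch
            if self.average_grads:
                grad = grad / batch_size  # mean gradient: step size no longer scales with batch size
            return -self.lr * grad

    opt = SGD(lr=0.1, average_grads=True)
    print(opt._compute_step(np.ones(3) * 32, batch_size=32))  # [-0.1 -0.1 -0.1]

With the mean gradient, the effective step size stays roughly comparable when the batch size changes, which is the point made in the post.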

SOFTPLUS

Shouldn't softplus be Softplus(x) = log(1 + exp(x))? The code looks like Softplus(x) = log(1 + exp(-abs(x))) + max(x, 0.0). I computed both and they are equivalent, but why use the more complicated form?
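Assuming the code indeed uses the -abs(x) form (which is what makes the two expressions equivalent), the reason is numerical stability: the naive log(1 + exp(x)) overflows in floating point for large positive x, while the rewritten form never exponentiates a positive number. A small sketch:

    import numpy as np

    def softplus_naive(x):
        # overflows: exp(1000.0) is inf in float64, so the result becomes inf
        return np.log(1.0 + np.exp(x))

    def softplus_stable(x):
        # exp is only ever applied to a non-positive argument, so it stays in [0, 1]
        return np.log(1.0 + np.exp(-np.abs(x))) + np.maximum(x, 0.0)

    x = np.array([-1000.0, 0.0, 1000.0])
    print(softplus_naive(x))   # [0.      0.6931  inf]  (with an overflow warning)
    print(softplus_stable(x))  # [0.      0.6931  1000.]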

Avoid download dataset programmatically.

The MNIST example works like torchvision: it downloads the dataset programmatically when the data file is not in a specific directory. I personally had a very bad experience with torchvision because it does not use a CDN link and shows no progress bar; in China it is painfully slow. Moreover, it hard-codes the link in the code, which adds maintenance cost. I think tinynn is doing the same thing.

I imagine that if I want to contribute another example, then by this convention I will need to import urllib and write try/except and mkdir logic in my code just to help the user download the dataset. Why not keep it old-fashioned and simpler: provide a list of download links as text, which makes life easier for both users and contributors?

I love the cleanness of tinynn and think I can learn a lot from this code. But I suggest keeping the example code small and instead providing documentation or a text description that asks the user to download the dataset themselves from a list of links (official link, Baidu Yun, etc.).
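A minimal sketch of what the proposed convention could look like in an example script. The file path and the message are illustrative only, not tinynn's actual layout:

    import os
    import sys

    DATA_PATH = "./examples/data/mnist.npz"  # hypothetical location

    if not os.path.exists(DATA_PATH):
        sys.exit(
            "MNIST data not found at %s.\n"
            "Please download it manually (links are listed in the README) "
            "and place it there before running this example." % DATA_PATH
        )

The example then stays focused on the model code, and all download instructions live in one documented place.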

One suggestion on file header

I think it is a rather outdated practice to put information such as author, date and filename into the file header. These are already tracked by git (in a much better way), and they only add maintenance cost. This is just a suggestion for new files created later.

I bring this up because it makes me hesitate over whether I should modify your header when I change something minor in the code. It may also confuse potential contributors to this project (if you want more people to participate).

If you agree, I can help remove those fields from the current source code.

The MNIST example has surprising test accuracy.

Why does a network with only one dense layer and no activation reach over 80% test accuracy after the first epoch?

I simply commented out the other layers and trained the network:

        net = Net([
            # Dense(200),
            # ReLU(),
            Dense(10),
            # ReLU(),
            # Dense(70),
            # ReLU(),
            # Dense(30),
            # ReLU(),
            # Dense(10)
        ])

Is this the expected result?
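For context, with everything else commented out the model reduces to plain multinomial logistic regression, i.e. softmax(Wx + b) trained with cross-entropy, and linear classifiers are known to reach roughly 90% accuracy on MNIST, so 80%+ after one epoch is plausible. A minimal NumPy sketch of what that single Dense(10) layer computes (shapes are MNIST's, names are illustrative):

    import numpy as np

    batch_size, in_dim, num_classes = 32, 784, 10
    x = np.random.randn(batch_size, in_dim)           # flattened MNIST images
    W = np.random.randn(in_dim, num_classes) * 0.01   # the only trainable weights
    b = np.zeros(num_classes)

    logits = x @ W + b                                 # Dense(10) forward pass
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)          # softmax applied by the loss
    print(probs.shape)  # (32, 10): one class distribution per image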

Question about the activation functions

Hi, while reading the activation function code there is one thing I don't understand: why do forward and backward need the two extra functions func and derivative_func?
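For context, this is a common base-class pattern for activation layers. A minimal sketch of how I read it (the names follow the question; the exact signatures in tinynn may differ):

    import numpy as np

    class Activation:
        """Base class: forward/backward are shared; each activation
        only provides its function and its derivative."""
        def forward(self, inputs):
            self.inputs = inputs  # cache for the backward pass
            return self.func(inputs)

        def backward(self, grad):
            # chain rule: dL/dx = dL/dy * f'(x)
            return self.derivative_func(self.inputs) * grad

    class ReLU(Activation):
        def func(self, x):
            return np.maximum(x, 0.0)

        def derivative_func(self, x):
            return (x > 0.0).astype(x.dtype)

    relu = ReLU()
    y = relu.forward(np.array([-1.0, 2.0]))
    g = relu.backward(np.ones(2))
    print(y, g)  # [0. 2.] [0. 1.]

This way a new activation only has to define the two small functions, while the input caching and chain-rule bookkeeping live in one place.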
