


natural-gradients's Issues

possible bug in kfac

In the PyTorch implementation of K-FAC, G1_ is computed as:

G1_ = 1/m * a1.grad.t() @ a1.grad

However, a1.grad is not the same quantity as the a_1 in Eq. (1) of the K-FAC paper. Specifically, when you backpropagate through the network to obtain a1.grad, the mini-batch averaging of the loss introduces a coefficient 1/m, where m is the mini-batch size. In other words, a1.grad = 1/m * a_1 (in the K-FAC paper's notation). Consequently, G1_ is scaled incorrectly, and G2_ and G3_ are wrong for the same reason.

Please correct me if I misunderstand something. Thanks!
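
For reference, a minimal sketch of the scaling argument (the variable names a1, logits, and m below are hypothetical stand-ins, not the repository's actual script):

import torch
import torch.nn.functional as F

m = 32                                      # mini-batch size
a1 = torch.randn(m, 10, requires_grad=True)
logits = a1 @ torch.randn(10, 3)
targets = torch.randint(0, 3, (m,))

loss = F.cross_entropy(logits, targets)     # mean over the batch,
loss.backward()                             # so each row of a1.grad carries a 1/m factor

# As written in the repo: effectively (1/m**3) * sum_i g_i g_i^T
G_repo = 1/m * a1.grad.t() @ a1.grad

# Scaling implied by this issue: (1/m) * sum_i g_i g_i^T
G_scaled = m * a1.grad.t() @ a1.grad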

full fisher script stability

I looked at the full-Fisher NumPy script and tried to change some of its parameters, but it no longer works. For example, when X0 and X1 are changed as follows, the loss becomes NaN after two steps:

X0 = np.random.randn(100, 2) - 2
X1 = np.random.randn(100, 2) + 2

What is the problem here? Thanks for your time.
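
One common stabilizer for exact natural-gradient demos (offered only as a guess, not as the confirmed cause here) is to damp the Fisher before inverting it, since a near-singular Fisher makes the step blow up and drives the loss to NaN. A minimal NumPy sketch, where F, grad, and lam are hypothetical names:

import numpy as np

def damped_natural_gradient(F, grad, lam=1e-3):
    # Solve (F + lam*I) d = grad instead of inverting F directly;
    # damping keeps the step finite when F is ill-conditioned.
    return np.linalg.solve(F + lam * np.eye(F.shape[0]), grad)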

Natural gradient for neural network?

Hi @wiseodd, thanks for the great implementation! I wonder whether this can be generalized to a simple feed-forward network. In your code, the gradients with respect to the parameters are computed directly. For a network, can one use automatic differentiation for that, say in PyTorch? I find this difficult because one needs a gradient of shape (N, num_of_thetas), but as far as I know backpropagation requires the loss to be a scalar, so the batch dimension N is reduced away. What can one do to get the gradient for each instance in the batch?
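
One workaround, sketched below under assumptions (the two-layer net and the helper per_sample_grads are illustrative, not part of this repo), is to backpropagate each per-sample loss separately and stack the flattened gradients into an (N, num_params) matrix; the empirical Fisher is then an outer product of that matrix with itself:

import torch
import torch.nn as nn

def per_sample_grads(model, loss_fn, X, y):
    # Returns an (N, num_params) matrix of per-sample gradients.
    # O(N) backward passes: simple but slow; torch.func (vmap + grad)
    # can vectorize this in recent PyTorch versions.
    rows = []
    for i in range(X.shape[0]):
        loss_i = loss_fn(model(X[i:i+1]), y[i:i+1])           # scalar loss for sample i
        grads = torch.autograd.grad(loss_i, list(model.parameters()))
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    return torch.stack(rows)

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
X, y = torch.randn(8, 2), torch.randint(0, 2, (8,))
G = per_sample_grads(model, nn.CrossEntropyLoss(), X, y)      # (8, num_params)
F = G.t() @ G / X.shape[0]                                    # empirical Fisher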
