I'm super impressed with the progress you've been making! Autograd has the potential t

Is the objective function scalar-valued? That is, is the call to <code class="notransl

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Elementwise Hessian about autograd HOT 10 CLOSED

richardotis commented on July 4, 2024

Elementwise Hessian

from autograd.

Comments (10)

mattjj commented on July 4, 2024 1

I think it works. Here's a cleaner elementwise_hess function for you, as well as a better test. (Of course, don't trust me, please test it yourself!)

from autograd import elementwise_grad
import autograd.numpy as np

def energy(args):
    return 8.314*args[0]*(args[1]*np.log(args[1]) + args[2]*np.log(args[2]))

gradfun = elementwise_grad(energy)
N = 100
M = 50

inp = [1000*np.random.rand(N,1), np.random.rand(1,M), np.random.rand(1,M)]
inp = np.broadcast_arrays(*inp)


gres = gradfun(inp)
print([i.shape for i in gres])

### here's the new stuff!!

from autograd import jacobian

def elementwise_hess(fun, argnum=0):
    sum_latter_dims = lambda x: np.sum(x.reshape(x.shape[0], -1), 1)
    def sum_grad_output(*args, **kwargs):
        return sum_latter_dims(elementwise_grad(fun)(*args, **kwargs))
    return jacobian(sum_grad_output, argnum)

# make inp an array so that jacobian() can handle it
array_inp = np.concatenate([arr[None,...] for arr in inp])

# check shapes on the given energy function
hessfun = elementwise_hess(energy)
print(hessfun(array_inp).shape)

# test on some scalar elements
from autograd import grad
hessval = hessfun(array_inp)
print grad(lambda x: grad(energy)(x)[1])(array_inp[:,8,42])
print hessval[:,1,8,42]

It prints something like this (different every time due to the pseudorandom numbers generated):

[(100, 50), (100, 50), (100, 50)]
(3, 3, 100, 50)
[   7.10510658  920.2143044     0.        ]
[   7.10510658  920.2143044     0.        ]

Please reopen the issue if it's not right!

from autograd.

mattjj commented on July 4, 2024

Is the objective function scalar-valued? That is, is the call to energy returning a scalar (meaning you could use grad and not elementwise_grad) or is it returning an array of values? (I'm confused by 'the gradient of the objective function with respect to the variable at that argument index', since that sounds more like ∂f/∂xi than ∂fi/∂xi.)

EDIT: I was confused here because I never thought of using elementwise_grad for broadcasting support!

A runnable example (i.e. a definition for energy) along with a fixed input and desired output (maybe with fewer dimensions, or at least smaller ones) would also help!

from autograd.

mattjj commented on July 4, 2024

My guess is that energy is a six-argument function that broadcasts across arrays (in this case, effectively six 10x10 arrays). Is that right?

from autograd.

richardotis commented on July 4, 2024

Sorry, here's a minimal working example. energy is a scalar-valued function which accepts some number of arguments, and we would like to broadcast some arguments against each other. For example, imagine we have 100 selected temperatures and 50 selected compositions, and we would like to compute the energy at all 100 x 50 = 5000 combinations, as well as the gradient and Hessian at each point. (This generalizes: we may add a pressure or other axes as well.)

So if I pass L arguments which broadcast to arrays of size (N, M, ...), I want my gradient to be an array of size (L, N, M, ...), and I want my Hessian to be an array of size (L, L, N, M, ...).

from autograd import elementwise_grad
import autograd.numpy as anp
import numpy as np
def energy(args):
    return 8.314*args[0]*(args[1]*anp.log(args[1]) + args[2]*anp.log(args[2]))
grad = elementwise_grad(energy)
N = 100
M = 50
inp = [1000*np.random.rand(N,1), np.random.rand(1,M), np.random.rand(1,M)]
inp = np.broadcast_arrays(*inp)
%time gres = grad(inp)
print([i.shape for i in gres])

CPU times: user 1 ms, sys: 0 ns, total: 1 ms
Wall time: 1.55 ms
[(100, 50), (100, 50), (100, 50)]

from autograd.

mattjj commented on July 4, 2024

I'm kinda wondering if this works (based on your earlier example, not the most recent version):

from autograd import elementwise_grad, jacobian
import autograd.numpy as np

def wrapper(args):
    return sum(a**2 for a in args)

inp = [1000*np.random.rand(10,1), np.random.rand(1,10), np.random.rand(1,10), np.random.rand(1,10), np.random.rand(1,10), np.random.rand(1,10)]
inp = np.broadcast_arrays(*inp)
array_inp = np.concatenate(map(lambda x: np.expand_dims(x, 0), inp))

print(array_inp.shape)
gres = elementwise_grad(wrapper)(array_inp)
print(gres.shape)
hess = jacobian(lambda x: elementwise_grad(wrapper)(x).sum(2).sum(1))(array_inp)
print(hess.shape)

It attempts to use the same trick as in elementwise_grad to support broadcasting (i.e. summing across dimensions effectively broadcasts the gradient). It has the right shape at least!

I'll try to work out a test case to see if it's actually producing the right values. Let me know if you figure one out first.

from autograd.

richardotis commented on July 4, 2024

@mattjj Fantastic user support! :) It's my bedtime but I'm pleased to see a possible solution. I'll give it a shot tomorrow and update this issue with what I find.

from autograd.

mattjj commented on July 4, 2024

Something like "broadcasting_hess" would be a better name.

from autograd.

richardotis commented on July 4, 2024

So I haven't analytically verified anything yet, but the Hessians I'm computing with your method are symmetric and the correct matrix elements are zero. I'm going to wire it up to my existing minimization routines and see how it works.

I like "elementwise_hess" because it's makes it clear it's the counterpart to the "elementwise_grad" function -- if I was browsing your documentation I'd assume they had similar functions. (At the very least, if you rename one you should rename both.)

from autograd.

richardotis commented on July 4, 2024

The other thing that pleases me is that both elementwise_grad and elementwise_hess correctly handle the case when inputs are "disconnected", i.e., an input doesn't actually show up in the graph. For example, in principle energy is a function of pressure but usually I don't have pressure-dependent terms because the model is incomplete. For completeness the user still has to specify a pressure no matter what, and it's nice to not have to special-case anything for zeros to get correctly returned and broadcast. This was an annoying case to deal with in sympy/sympy#9725 so I'm pleased I can change less than 10 lines of code to get robust back-propagation working in my use case.

from autograd.

richardotis commented on July 4, 2024

Actually the above turned out not to be entirely true. There's still a broadcasting hangup but it's pretty easily solved. I'm working on a code writeup which I'll share after I've kicked the tires a bit.

from autograd.

Elementwise Hessian about autograd HOT 10 CLOSED

Comments (10)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent