
Comments (3)

jeanfeydy commented on July 17, 2024

Hi @Yura52 , @NightWinkle ,

Thanks a lot for your interest in the library and relevant questions!

As described by @NightWinkle and in Eqs. (3.226-3.227) of my PhD thesis, the current implementation allows you to backpropagate efficiently through the computation of the Sinkhorn loss: if you're interested in optimizing a (smooth) Wasserstein distance with respect to the weights alpha, beta and the sample locations x, y, you're good to go. This seems to be what you have in mind, which is good news.

Note, however, that you may run into problems if you plan to do something a bit more exotic. For instance, some authors use the Sinkhorn algorithm to compute an optimal transport plan (as in e.g. this tutorial), and then backprop through a loss that is not a regularized Wasserstein distance. In this situation, the simplifications that I hardcoded into GeomLoss no longer hold: to retrieve the correct gradients, you should indeed backprop through the iterations of the Sinkhorn loop. In other words, comment out the `torch.autograd.set_grad_enabled(False)` line, or add some extra differentiable iterations at the end of the loop, in the final "extrapolation" step.
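To make the structure concrete, here is a minimal NumPy sketch of such a Sinkhorn loop (illustrative only — the function name and details are mine, not GeomLoss's internals): the main loop corresponds to the part that GeomLoss runs with gradients disabled, and the final "extrapolation" step to the only part that needs to be differentiated for the standard Sinkhorn loss.

```python
import numpy as np

def sinkhorn_plan(a, b, C, eps=0.1, n_iters=200):
    """Illustrative Sinkhorn loop (NumPy stand-in for the PyTorch version).

    In an autograd framework, the main loop below would run under
    `torch.autograd.set_grad_enabled(False)`, and only the final
    "extrapolation" step would be tracked by autograd.  To backprop
    through a non-standard loss of the plan P, you would instead keep
    gradients enabled for the iterations themselves.
    """
    K = np.exp(-C / eps)               # Gibbs kernel
    u = np.ones_like(a)
    # --- main loop: run without gradients for the standard Sinkhorn loss ---
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    # --- final "extrapolation" step: the only part that autograd
    #     needs to see when differentiating the Sinkhorn loss itself ---
    v = b / (K.T @ u)
    u = a / (K @ v)
    P = u[:, None] * K * v[None, :]    # transport plan
    return P
```

At convergence, the returned plan has the prescribed marginals a and b, which is a quick sanity check for the loop.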

All these points are discussed in recent works by Pierre Ablin, such as this paper: going forward, I will certainly add a "switch" for this behaviour as an optional argument. Right now, I am mostly working on improving the low-level KeOps routines of GeomLoss and finalizing theoretical papers, but I will really push for a stable v1.0 release over the next few months.

I hope that this answers your question: feel free to re-open the issue if needed :-)
Best regards,
Jean


NightWinkle commented on July 17, 2024

It can differentiate through the solution of the Optimal Transport problem.

As you can read in the literature, for instance in Interpolating between Optimal Transport and MMD using Sinkhorn Divergences, the gradient of the Sinkhorn loss is actually equal to the gradient of a single Sinkhorn iteration, evaluated at the converged potentials.
For this reason, it is more efficient to compute the gradient by backpropagating through only that last Sinkhorn iteration.

The line you mention simply disables the construction of the autodifferentiation graph through the steps that are not needed to compute the gradient, and that would otherwise make the backward pass quite slow.

Note, though, that gradients are re-enabled before the last iteration, so that the gradient can still be computed.
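The envelope-theorem argument behind this can be checked numerically. The sketch below (plain NumPy, not GeomLoss code; it uses the formulation min_P ⟨C, P⟩ − ε H(P)) verifies that the gradient of the converged Sinkhorn value with respect to the cost matrix is simply the transport plan itself — so no backprop through the fixed-point iterations is needed:

```python
import numpy as np

def sinkhorn_value(a, b, C, eps=0.5, n_iters=2000):
    """Entropic OT value  min_P <C, P> - eps * H(P)  via Sinkhorn.
    Illustrative NumPy sketch, not GeomLoss code."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]
    value = (P * C).sum() + eps * (P * np.log(P)).sum()
    return value, P

rng = np.random.default_rng(0)
a = np.full(3, 1 / 3)
b = np.full(3, 1 / 3)
C = rng.random((3, 3))

_, P = sinkhorn_value(a, b, C)

# Envelope theorem: d(value)/dC_ij = P_ij, even though P itself
# depends on C -- differentiating through the iterations changes nothing.
h = 1e-5
E = np.zeros_like(C)
E[1, 2] = 1.0
fd = (sinkhorn_value(a, b, C + h * E)[0]
      - sinkhorn_value(a, b, C - h * E)[0]) / (2 * h)
# fd and P[1, 2] agree up to finite-difference error
```

The same reasoning carries over to gradients with respect to the weights and sample positions, which is why one differentiable iteration at the end of the loop suffices.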


Yura52 commented on July 17, 2024

Oh, I see, thank you for the answer!

