Comments (2)
One really annoying thing about torch.no_grad is that it is not traceable. JAX has a stop_gradient primitive that operates on Tensors, so it is traceable: https://jax.readthedocs.io/en/latest/_autosummary/jax.lax.stop_gradient.html
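For comparison, here is a small sketch of how `jax.lax.stop_gradient` composes with `jax.grad` (the function `f` and the values are illustrative, not from the original discussion):

```python
import jax
import jax.numpy as jnp

def f(x):
    # stop_gradient returns its input unchanged but blocks gradients
    # through it; because it operates on values, it traces cleanly.
    shift = jax.lax.stop_gradient(x.mean())
    return jnp.sum((x - shift) ** 2)

x = jnp.array([1.0, 2.0, 3.0])
# shift is treated as a constant during differentiation
print(jax.grad(f)(x))
```

Since `shift` is stopped, the gradient of each term is just 2 * (x_i - shift).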
However, I think it's useful to be able to use something like torch.no_grad inside of a transform. For example, one pattern I've seen is:
def f(x):
    with torch.no_grad():
        shift = x.mean()
    return x - shift
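To make the pattern concrete, here is a sketch using the `grad` transform (now exposed as `torch.func.grad`); the scalar-loss wrapper is added for illustration, and it assumes the proposed semantics where the no_grad block is invisible to the transform:

```python
import torch
from torch.func import grad

def loss(x):
    with torch.no_grad():
        shift = x.mean()  # invisible to grad: treated as a constant
    return ((x - shift) ** 2).sum()

x = torch.tensor([1.0, 2.0, 3.0])
# With shift constant, d/dx_i of sum((x - shift)**2) is 2 * (x_i - shift)
print(grad(loss)(x))
```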
Proposal:
- torch.no_grad does affect the grad/vjp transforms: any computation that happens inside torch.no_grad is invisible to vjp/grad.
- If a user calls grad/vjp inside of torch.no_grad, we raise a warning explaining that their gradients will be 0. (Or maybe this should be an error?)
- For tracing... either we introduce something like stop_gradient(Tensor) -> Tensor, or we figure out how to "trace" torch.no_grad. This sounds a bit like factory-function tracing and could potentially be done with a mode-based dispatch key.
Alternatives:
- functorch just straight up ignores torch.no_grad
- we introduce a functorch.stop_gradient or something
from functorch.
New proposal: here's what I think the semantics should be.
Case 1: `grad` gets called inside `torch.no_grad`.
`grad` should ignore `torch.no_grad` because it is "creating a new level of autograd above the current level". Another way to think about this is that `grad(f)` is a "function transform": its result should not be affected by context managers outside of the function `f`.
Case 2: `torch.no_grad` gets called inside `grad`.
`grad` should respect `torch.no_grad`.
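Case 1 can be exercised with a small sketch, assuming the `torch.func` API (the function and values here are illustrative):

```python
import torch
from torch.func import grad

def f(x):
    return (x ** 3).sum()

x = torch.tensor(2.0)

# Case 1: grad called inside torch.no_grad still differentiates,
# because the transform creates a new autograd level above the
# ambient one.
with torch.no_grad():
    g = grad(f)(x)
print(g)  # d/dx x**3 = 3 * x**2 = 12
```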
How does one actually implement this? We can probably do something with a mode stack here...
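One way to picture the mode-stack idea is the following toy sketch (purely illustrative, not functorch's actual internals): each grad transform pushes a fresh level whose grad-enabled flag starts out true, so an ambient no_grad cannot reach it, while a no_grad entered inside the transform flips only the innermost level.

```python
from contextlib import contextmanager

class LevelStack:
    """Toy model of the proposed semantics (an assumption, not
    functorch's real implementation)."""
    def __init__(self):
        self.levels = [True]  # base level: grad enabled

    @contextmanager
    def grad_transform(self):
        # Case 1: a new level starts with grad enabled, regardless
        # of any ambient no_grad below it.
        self.levels.append(True)
        try:
            yield
        finally:
            self.levels.pop()

    @contextmanager
    def no_grad(self):
        # Case 2: no_grad only disables the innermost level.
        prev = self.levels[-1]
        self.levels[-1] = False
        try:
            yield
        finally:
            self.levels[-1] = prev

    def grad_enabled(self):
        return self.levels[-1]

stack = LevelStack()
results = []
with stack.no_grad():                 # ambient no_grad
    with stack.grad_transform():
        results.append(stack.grad_enabled())   # Case 1: True
        with stack.no_grad():
            results.append(stack.grad_enabled())  # Case 2: False
print(results)  # [True, False]
```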
from functorch.