Decompose CompositeImplicitAutograd ops at the FuncTorchBatched key

pytorch commented on September 22, 2024

Decompose CompositeImplicitAutograd ops at the FuncTorchBatched key

Comments (7)

ezyang commented on September 22, 2024

We had a meeting on Thursday to discuss this. The main points:

- I don't want to remove in-place ops from autograd formulas (that would deoptimize them).
- I am OK with bunching autograd operators into bigger units (e.g. treating contiguous as its own unit, not as a call to copy_).
- Functionalization should be a tool in the toolkit, in general.
- It should be possible to have multiple composites and pick the one that makes sense in context.

Chillee commented on September 22, 2024

We decided to revert this for now.

Essentially, the "failure mode" for decomposing an op is much worse than for not decomposing it. If we decompose an op and the result is slow or throws an error, the user sees a warning/error like "aten::op_user_doesnt_use can't be vmapped" and has no idea where it came from.

So, for now, we think it's better to err on the side of not decomposing an op unless we explicitly do so.

ezyang commented on September 22, 2024

> The problem is that some CompositeImplicitAutograd ops decompose to in-place operations that are not compatible with vmap.

Yes, this is trouble. I have two parallel thoughts here:

- We should ensure implicit autograd composites don't ever use mutation (but at the cost of efficiency?)
- Maybe we can provide both the non-mutating and mutating versions? Perhaps using @ailzhang's functionalization pass?

> Can we solve these problems by just registering an override for the vmap key for those operations?

@zou3519 Well, the VMap key has higher precedence than CompositeImplicitAutograd, so yes, that will just work.
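The precedence argument can be illustrated with a toy model. This is a minimal sketch in plain Python, not the PyTorch dispatcher: `ToyDispatcher`, the `PRECEDENCE` list, and the string-returning kernels are all hypothetical, and only the key names are borrowed from the discussion above.

```python
# Toy model of key-precedence dispatch. NOT the PyTorch dispatcher:
# the class, the key order, and the kernels are assumptions for illustration.

# Keys listed from highest to lowest precedence (assumed order for this sketch).
PRECEDENCE = ["Batched", "CompositeImplicitAutograd"]


class ToyDispatcher:
    def __init__(self):
        self.kernels = {}  # (op_name, key) -> callable

    def register(self, op, key, fn):
        self.kernels[(op, key)] = fn

    def call(self, op, *args):
        # Use the kernel registered at the highest-precedence key.
        for key in PRECEDENCE:
            fn = self.kernels.get((op, key))
            if fn is not None:
                return fn(*args)
        raise RuntimeError(f"no kernel registered for {op}")


d = ToyDispatcher()

# Only the composite kernel exists: the op decomposes.
d.register("contiguous", "CompositeImplicitAutograd", lambda x: f"decompose({x})")
before = d.call("contiguous", "t")

# An override at the higher-precedence key now shadows the decomposition.
d.register("contiguous", "Batched", lambda x: f"batched_contiguous({x})")
after = d.call("contiguous", "t")

print(before)
print(after)
```

In this toy, the override wins simply because its key is consulted first; nothing about the composite kernel has to change.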

zou3519 commented on September 22, 2024

If functionalization could take care of this then that would be great. @ailzhang does functionalization handle something like the following?

```python
x = torch.empty_like(y)
x.copy_(y)
```
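To make the question concrete, here is a toy sketch of what a functionalization pass would have to do with this pattern. It operates on a hypothetical three-field instruction format, not on PyTorch's actual IR, and it says nothing about whether the real pass handles this case; the trailing `relu` instruction is added only to show that later reads are redirected.

```python
# Toy functionalization pass over a tiny trace. Hypothetical IR, not PyTorch's.
# Each instruction is (output_name, op_name, input_names). By assumption, an op
# whose name ends in "_" mutates its first input in place and keeps its name.

def functionalize(trace):
    current = {}  # original name -> latest functional version of that name
    counter = {}  # original name -> number of versions created so far
    out = []
    for dst, op, args in trace:
        # Every read goes to the most recent version of its input.
        args = tuple(current.get(a, a) for a in args)
        if op.endswith("_"):
            # Replace the mutation with a pure op that yields a fresh value.
            counter[dst] = counter.get(dst, 0) + 1
            new = f"{dst}_{counter[dst]}"
            out.append((new, op[:-1], args))
            current[dst] = new
        else:
            # A plain redefinition makes any older renaming stale.
            current.pop(dst, None)
            out.append((dst, op, args))
    return out


trace = [
    ("x", "empty_like", ("y",)),
    ("x", "copy_", ("x", "y")),   # in-place write into x
    ("z", "relu", ("x",)),        # later read of x
]
functional = functionalize(trace)
for inst in functional:
    print(inst)
```

The rewritten trace contains no op ending in `_`: the `copy_` becomes a pure `copy` producing `x_1`, and the later read of `x` is redirected to `x_1`.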

@ezyang one alternative along the lines of "providing both the non-mutating and mutating versions" could be if we have the ability to define our own set of primitives with respect to autograd.
For example, .contiguous() eventually calls .copy_() -- .copy_ is the primitive with respect to autograd.

Registering an override for the vmap key for contiguous doesn't actually work. When someone does vmap(grad(blah)), the dispatch for the grad transform breaks .contiguous() up into its constituents; vmap then sees the .copy_ and is sad (that's what is going on in #55).
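The failure can be sketched with a toy stack of transforms. Everything here is hypothetical: the two layer functions, the decomposition of `contiguous` into `empty_like` plus `copy_`, and the set of ops with batching rules are assumptions chosen to mirror the description, not functorch's real behavior.

```python
# Toy model of nested transforms. All names and tables are assumptions.

# Assumed decomposition: contiguous is not a primitive with respect to autograd.
DECOMPOSITIONS = {"contiguous": ["empty_like", "copy_"]}

# Ops the toy vmap layer can handle. It has an override for contiguous,
# but no rule for the in-place copy_.
BATCH_RULES = {"contiguous", "empty_like"}


def grad_layer(op):
    # The grad transform only understands primitives, so it breaks composites up.
    return DECOMPOSITIONS.get(op, [op])


def vmap_layer(op):
    if op not in BATCH_RULES:
        raise RuntimeError(f"{op} can't be vmapped")
    return f"batched_{op}"


def run(op, layers):
    # `layers` lists transforms innermost first: the innermost sees the op first.
    ops = [op]
    for layer in layers:
        if layer == "grad":
            ops = [p for o in ops for p in grad_layer(o)]
        elif layer == "vmap":
            ops = [vmap_layer(o) for o in ops]
    return ops


# vmap alone: the override for contiguous is hit directly.
plain = run("contiguous", ["vmap"])

# vmap(grad(...)): grad decomposes first, so vmap never sees contiguous.
try:
    run("contiguous", ["grad", "vmap"])
    nested_error = None
except RuntimeError as e:
    nested_error = str(e)

print(plain)
print(nested_error)
```

In the nested case the error names `copy_`, an op the user never called, which is the confusing failure mode described earlier in the thread.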

I'm not sure it's possible to "define a new primitive with respect to autograd" out of tree, though: autograd functions exist, but I'm not sure they're sufficient.

zou3519 commented on September 22, 2024

After some experimenting... it looks like if I want to make a new primitive called functorch::to, then setting up an autograd::Function for it and registering overrides for the Autograd, CPU, and CUDA keys seems to make this work:

```cpp
TORCH_LIBRARY_IMPL(functorch, Autograd, m) {
  // to_autograd invokes an autograd::Function
  m.impl("to", to_autograd);
}
TORCH_LIBRARY_IMPL(functorch, CPU, m) {
  // to_kernel just calls at::to
  m.impl("to", to_kernel);
}
TORCH_LIBRARY_IMPL(functorch, CUDA, m) {
  m.impl("to", to_kernel);
}
```

Unfortunately, there's a lot of boilerplate here (e.g. setting up the autograd::Function and registering all of those overrides).

ezyang commented on September 22, 2024

My conception of functionalization is that it is a functional transformation, much like grad and vmap, that takes traces with mutations and transforms them into traces without mutation. So in the vmap(grad( case, what you would actually do is vmap(functionalize(grad(. (Don't ask me about UX. I don't think you want users to have to insert the functionalize pass explicitly, so we'd have to figure out how to insert it automatically when necessary.)

> one alternative along the lines of "providing both the non-mutating and mutating versions" could be if we have the ability to define our own set of primitives with respect to autograd.

Yes, this is possible. Today we have CPU and we have AutogradCPU; it is possible that given Batched, we should have AutogradBatched (this is a little weird, because Batched isn't a backend, but I'm guessing we probably could make it work). Then you would override the definition of contiguous directly in AutogradBatched to get the better behavior. I'm not sure why you'd want to implement a functorch::to though...

zou3519 commented on September 22, 2024

Done
