Comments (7)

ezyang commented on September 22, 2024

We had a meeting on Thursday to discuss this. The main points:

  1. I don't want to remove in-place ops from autograd formulas (that would deoptimize them)
  2. I am OK with grouping autograd operators into bigger units (e.g., treating contiguous as a unit of its own rather than as a call to copy_)
  3. In general, functionalization should be one tool in the toolkit
  4. It should be possible to have multiple composites for an op and pick the one that makes sense in context
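
Points 2 and 4 can be pictured with a toy registry that keeps several composite decompositions per op and picks one based on the active transforms. This is a minimal Python sketch under stated assumptions: DECOMPOSITIONS, pick_decomposition, the "default"/"vmap" context labels, and the string-based traces are all hypothetical and are not functorch or PyTorch API.

```python
# Toy model, not a PyTorch/functorch API: one op with several composite
# decompositions, picked according to which transforms are active.
from typing import Callable, Dict, List

Trace = List[str]


def contiguous_via_copy(x: str) -> Trace:
    # Mutating composite: allocate, then copy in place.
    return [f"out = empty_like({x})", f"out.copy_({x})"]


def contiguous_functional(x: str) -> Trace:
    # Non-mutating composite: one out-of-place op, friendlier to vmap.
    return [f"out = clone({x}, memory_format=contiguous_format)"]


# Hypothetical registry: op name -> {context label -> decomposition}.
DECOMPOSITIONS: Dict[str, Dict[str, Callable[[str], Trace]]] = {
    "contiguous": {
        "default": contiguous_via_copy,
        "vmap": contiguous_functional,
    },
}


def pick_decomposition(op: str, active_transforms: List[str]) -> Callable[[str], Trace]:
    variants = DECOMPOSITIONS[op]
    # Use the first variant registered for an active transform,
    # otherwise fall back to the default composite.
    for transform in active_transforms:
        if transform in variants:
            return variants[transform]
    return variants["default"]


print(pick_decomposition("contiguous", [])("x"))  # ['out = empty_like(x)', 'out.copy_(x)']
print(pick_decomposition("contiguous", ["grad", "vmap"])("x"))  # ['out = clone(x, memory_format=contiguous_format)']
```

Picking the first match in active_transforms is an arbitrary choice for the sketch; a real design would need a rule for which context wins when several apply.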

from functorch.

Chillee commented on September 22, 2024

We decided to revert this for now.

Essentially, the failure mode when we decompose an op is much worse than when we don't. If we decompose an op and the result is slow or throws an error, the user sees a warning or error like aten::op_user_doesnt_use can't be vmapped, and has no idea where it came from.

So, for now, we think it's better to err on the side of not decomposing an op unless we explicitly do so.
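
The failure mode described above can be reproduced with a toy model of vmap that either decomposes an op automatically or only decomposes ops on an explicit opt-in list. This is a hypothetical Python sketch: vmap_op, the DECOMPOSITIONS and BATCHING_RULES tables, and the error strings (modeled on the message quoted above) are illustrative assumptions, not functorch's code.

```python
# Toy model, not functorch's implementation: compare automatic decomposition
# with opt-in decomposition and look at the error the user ends up seeing.
from typing import Dict, List, Set

# Hypothetical tables: what contiguous decomposes into, and which ops
# have batching rules (copy_ deliberately has none).
DECOMPOSITIONS: Dict[str, List[str]] = {"contiguous": ["empty_like", "copy_"]}
BATCHING_RULES: Set[str] = {"empty_like", "mul"}


def vmap_op(op: str, opt_in: Set[str]) -> List[str]:
    """Return the ops vmap would run for op, or raise like an unsupported op."""
    if op in BATCHING_RULES:
        return [op]
    if op in opt_in and op in DECOMPOSITIONS:
        # Decompose and batch each piece; a failure names the piece, not op.
        for piece in DECOMPOSITIONS[op]:
            if piece not in BATCHING_RULES:
                raise NotImplementedError(f"aten::{piece} can't be vmapped")
        return DECOMPOSITIONS[op]
    raise NotImplementedError(f"aten::{op} can't be vmapped")


for opt_in in (set(), {"contiguous"}):
    try:
        vmap_op("contiguous", opt_in)
    except NotImplementedError as err:
        print(sorted(opt_in), "->", err)
```

Putting every decomposable op in opt_in models decompose-by-default: the error then names aten::copy_, which the user never called. With an empty opt_in, the error names the op the user actually wrote.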

ezyang commented on September 22, 2024

The problem is that some CompositeImplicitAutograd ops decompose into in-place operations that are not compatible with vmap (note here).

Yes, this is trouble. I have two parallel thoughts here:

  • We should ensure implicit-autograd composites never use mutation (but at the cost of efficiency?)
  • Maybe we can provide both the non-mutating and mutating versions? Perhaps using @ailzhang's functionalization pass?

Can we solve these problems by just registering an override for the vmap key for those operations?

@zou3519 Well, the VMap key has higher precedence than CompositeImplicitAutograd, so yes, that will just work.
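
The precedence argument can be sketched with a toy dispatcher in Python. It is a simplified, hypothetical model: the PRIORITY list, the dispatch function, and treating CompositeImplicitAutograd as a fallback used only when no key-specific kernel exists are assumptions for illustration, not how the PyTorch dispatcher is implemented.

```python
# Toy dispatcher, not PyTorch's: a kernel registered for a higher-priority
# key wins over the CompositeImplicitAutograd fallback.
from typing import Callable, Dict, Set, Tuple

# Simplified priority order (highest first), assumed for illustration.
PRIORITY = ["Batched", "Autograd", "CPU"]

Kernels = Dict[Tuple[str, str], Callable[[], str]]


def dispatch(op: str, active_keys: Set[str], kernels: Kernels) -> str:
    # Try key-specific kernels from highest to lowest priority.
    for key in PRIORITY:
        if key in active_keys and (op, key) in kernels:
            return kernels[(op, key)]()
    # No key-specific kernel: fall back to the composite, which decomposes the op.
    composite = kernels.get((op, "CompositeImplicitAutograd"))
    if composite is None:
        raise NotImplementedError(f"no kernel for {op}")
    return composite()


kernels: Kernels = {
    ("contiguous", "CompositeImplicitAutograd"): lambda: "empty_like + copy_",
}
batched_tensor_keys = {"Batched", "CPU"}
print(dispatch("contiguous", batched_tensor_keys, kernels))  # empty_like + copy_

# Register an override for the vmap (Batched) key: it now takes precedence.
kernels[("contiguous", "Batched")] = lambda: "batched contiguous"
print(dispatch("contiguous", batched_tensor_keys, kernels))  # batched contiguous
```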

zou3519 commented on September 22, 2024

If functionalization could take care of this, that would be great. @ailzhang, does functionalization handle something like the following?

import torch
y = torch.randn(3)
x = torch.empty_like(y)
x.copy_(y)  # in-place copy into a freshly allocated tensor
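
To make the question concrete, here is a toy functionalization pass over a made-up trace format: each in-place op is replaced by an out-of-place variant that writes to a fresh name, and later uses of the mutated tensor are redirected to that name. The trace format, the functionalize function, and the out-of-place op name "copy" are hypothetical; the sketch ignores views and aliasing, which a real functionalization pass has to handle.

```python
# Toy functionalization pass over a hypothetical trace format, not PyTorch's
# implementation. Each instruction is (output_name, op, args); an op whose
# name ends in "_" mutates args[0] in place and has no output name.
from typing import Dict, List, Optional, Tuple

Instr = Tuple[Optional[str], str, Tuple[str, ...]]


def functionalize(trace: List[Instr]) -> List[Instr]:
    latest: Dict[str, str] = {}  # tensor name -> name of its newest value
    count: Dict[str, int] = {}   # tensor name -> number of mutations so far
    result: List[Instr] = []
    for dest, op, args in trace:
        renamed = tuple(latest.get(a, a) for a in args)
        if op.endswith("_"):
            # In-place op: emit the out-of-place variant into a fresh name,
            # then route every later use of the mutated tensor to that name.
            target = args[0]
            count[target] = count.get(target, 0) + 1
            fresh = f"{target}_{count[target]}"
            result.append((fresh, op[:-1], renamed))
            latest[target] = fresh
        else:
            result.append((dest, op, renamed))
            if dest is not None:
                latest.pop(dest, None)  # a rebinding starts a new value
    return result


trace: List[Instr] = [
    ("x", "empty_like", ("y",)),
    (None, "copy_", ("x", "y")),  # x.copy_(y)
    ("z", "mul", ("x", "x")),
]
for instr in functionalize(trace):
    print(instr)
```

In the output, the copy_ becomes ('x_1', 'copy', ('x', 'y')) and the later mul reads x_1, so the trace no longer contains a mutation.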

@ezyang one alternative along the lines of "providing both the non-mutating and mutating versions" could be if we have the ability to define our own set of primitives with respect to autograd.
For example, .contiguous() eventually calls .copy_(), and .copy_ is the primitive with respect to autograd.

Registering an override for the vmap key for contiguous doesn't actually work: when someone calls vmap(grad(blah)), the grad transform's dispatch breaks .contiguous() into its constituents, so vmap sees the .copy_ and fails (that's what is going on in #55).

I'm not sure it's possible to "define a new primitive with respect to autograd" out of tree, though: autograd functions exist, but I'm not sure they're sufficient.
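
The ordering problem can be modeled with two toy layers in which grad handles each op before vmap does, as described above. The tables and function names are hypothetical; the sketch only illustrates why a vmap override for contiguous is never reached under grad.

```python
# Toy model of vmap(grad(f)), not functorch's implementation: each op is
# handled by the innermost transform (grad) first, then passed to vmap.
from typing import Dict, List, Set

# Hypothetical tables for illustration.
COMPOSITES: Dict[str, List[str]] = {"contiguous": ["empty_like", "copy_"]}
AUTOGRAD_PRIMITIVES: Set[str] = {"empty_like", "copy_", "mul"}  # have derivative formulas
BATCHING_RULES: Set[str] = {"empty_like", "mul", "contiguous"}  # override for contiguous, none for copy_


def grad_layer(ops: List[str]) -> List[str]:
    # Ops without their own derivative formula are broken into their composites.
    out: List[str] = []
    for op in ops:
        out.extend([op] if op in AUTOGRAD_PRIMITIVES else COMPOSITES[op])
    return out


def vmap_layer(ops: List[str]) -> List[str]:
    for op in ops:
        if op not in BATCHING_RULES:
            raise NotImplementedError(f"vmap: no batching rule for {op}")
    return ops


print(vmap_layer(["contiguous"]))  # vmap alone uses the override: ['contiguous']
try:
    vmap_layer(grad_layer(["contiguous"]))  # vmap(grad(...)): vmap sees copy_
except NotImplementedError as err:
    print(err)  # vmap: no batching rule for copy_
```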

zou3519 commented on September 22, 2024

After some experimenting, it looks like the following works if I want to make a new primitive called functorch::to: set up an autograd::Function for it and register overrides for the Autograd, CPU, and CUDA keys.

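#include <torch/library.h>

// Assumes the functorch::to schema is declared elsewhere (via TORCH_LIBRARY or
// TORCH_LIBRARY_FRAGMENT), and that to_autograd and to_kernel match that schema.
TORCH_LIBRARY_IMPL(functorch, Autograd, m) {
  // to_autograd invokes an autograd::Function, so autograd treats "to" as one primitive
  m.impl("to", to_autograd);
}
TORCH_LIBRARY_IMPL(functorch, CPU, m) {
  // to_kernel just calls at::to
  m.impl("to", to_kernel);
}
TORCH_LIBRARY_IMPL(functorch, CUDA, m) {
  // Same backend kernel as CPU
  m.impl("to", to_kernel);
}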

Unfortunately, there's a lot of boilerplate here (e.g., setting up the autograd::Function and registering all of those overrides).
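
A toy model of why this works: if an op has its own Autograd kernel, the grad layer records it as a single node instead of decomposing it, so the op reaches vmap intact. The decomposition of "to" into empty_like and copy_, the tables, and the function names are assumptions for illustration, not functorch's actual code.

```python
# Toy model, not functorch's implementation: an op that has its own Autograd
# kernel (as functorch::to does above) stays whole under grad, so vmap gets
# to use that op's batching rule instead of seeing its decomposition.
from typing import Dict, List, Set

# Hypothetical tables for illustration.
COMPOSITES: Dict[str, List[str]] = {"to": ["empty_like", "copy_"]}
BATCHING_RULES: Set[str] = {"empty_like", "to"}  # deliberately no rule for copy_


def grad_layer(ops: List[str], autograd_kernels: Set[str]) -> List[str]:
    out: List[str] = []
    for op in ops:
        if op in autograd_kernels:
            out.append(op)  # kept as one node with its own backward
        else:
            out.extend(COMPOSITES.get(op, [op]))  # decomposed if a composite exists
    return out


def vmap_layer(ops: List[str]) -> List[str]:
    for op in ops:
        if op not in BATCHING_RULES:
            raise NotImplementedError(f"no batching rule for {op}")
    return ops


# With an Autograd kernel registered for "to", vmap sees "to" itself.
print(vmap_layer(grad_layer(["to"], autograd_kernels={"to"})))  # ['to']

# Without it, grad decomposes "to" and vmap trips over copy_.
try:
    vmap_layer(grad_layer(["to"], autograd_kernels=set()))
except NotImplementedError as err:
    print(err)  # no batching rule for copy_
```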

ezyang commented on September 22, 2024

My conception of functionalization is that it is a functional transformation, much like grad and vmap, that takes traces containing mutations and transforms them into traces without mutation. So in the vmap(grad(...)) case, what you would actually do is vmap(functionalize(grad(...))). (Don't ask me about UX: I don't think you want users to have to insert the functionalize pass explicitly, so we'd have to figure out how to insert it automatically when necessary.)
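
The composition described above, vmap(functionalize(grad(...))) with functionalize inserted only when needed, can be sketched with transforms as plain trace-to-trace functions. Everything here is hypothetical: the string traces, the stand-in grad_transform, the simplified functionalize (which only renames in-place ops), and the auto_functionalize flag are illustrative, not a proposed functorch API.

```python
# Toy pipeline, not functorch's API: transforms are trace-to-trace functions,
# and functionalize is inserted automatically only when the trace mutates.
from typing import List

Trace = List[str]


def grad_transform(trace: Trace) -> Trace:
    # Stand-in for grad: decomposes contiguous into mutating pieces.
    out: Trace = []
    for op in trace:
        out.extend(["empty_like", "copy_"] if op == "contiguous" else [op])
    return out


def functionalize(trace: Trace) -> Trace:
    # Swap each in-place op for its out-of-place twin. A real pass must also
    # reroute later uses of the mutated tensor and handle views.
    return [op[:-1] if op.endswith("_") else op for op in trace]


def vmap_transform(trace: Trace) -> Trace:
    for op in trace:
        if op.endswith("_"):
            raise NotImplementedError(f"vmap: in-place {op} is not supported")
    return [f"batched_{op}" for op in trace]


def vmap_of_grad(trace: Trace, auto_functionalize: bool = True) -> Trace:
    traced = grad_transform(trace)
    if auto_functionalize and any(op.endswith("_") for op in traced):
        traced = functionalize(traced)  # i.e. vmap(functionalize(grad(f)))
    return vmap_transform(traced)


print(vmap_of_grad(["contiguous", "mul"]))  # ['batched_empty_like', 'batched_copy', 'batched_mul']
```

Deciding when to insert the pass by scanning for mutating ops is one possible policy for the sketch; the thread leaves the UX question open.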

one alternative along the lines of "providing both the non-mutating and mutating versions" could be if we have the ability to define our own set of primitives with respect to autograd.

Yes, this is possible. Today we have CPU and AutogradCPU; by analogy, given Batched, perhaps we should have AutogradBatched (this is a little weird, because Batched isn't a backend, but I'm guessing we could make it work). Then you would override the definition of contiguous directly in AutogradBatched to get the better behavior. I'm not sure why you'd want to implement a functorch::to, though...

zou3519 commented on September 22, 2024

Done
