Comments (4)
triage review:
- we think we can change the backward trace to use constant numbers instead of number proxies for now
- but we do want number proxies in the backward trace soon, so we should carefully consider their integration with nvfuser and grad, and imposing constraints like "this must be a constant value"
- maybe nvFuser's cache could detect the value of the NumberProxy?
from lightning-thunder.
> so we should carefully consider their integration with nvfuser and grad, and imposing constraints like "this must be a constant value"
Imposing those constraints universally in thunder is what I'm planning to do.
When you explicitly mentioned integration, I got the impression that you want to allow backends to propose constraints. I'm open to that discussion and already have an issue open for it. That might be a bit trickier to plumb back through to the prologue trace.
Meanwhile, I think an easy way to unblock us for now is to have the integration reject a NumberProxy wherever a constant value is required, e.g. for reduction axes.
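As a rough illustration of the "reject a NumberProxy where a constant is required" idea, here is a minimal, self-contained sketch. `ToyNumberProxy` and `check_reduction_dims` are hypothetical names invented for this example; they are not thunder or nvFuser APIs, and the real check would live wherever the integration validates its arguments.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class ToyNumberProxy:
    """Hypothetical stand-in for a NumberProxy: a named number that may be dynamic."""

    name: str
    value: Optional[Union[int, float]] = None  # None models "not known at trace time"


def check_reduction_dims(dims):
    """Return dims as a tuple, rejecting anything that is not a plain int.

    Reduction axes are used here as the example of arguments that must be
    constant values, so a proxy in that position is refused up front.
    """
    for d in dims:
        if isinstance(d, ToyNumberProxy):
            raise TypeError(
                f"reduction axis {d.name!r} is a number proxy; a constant int is required"
            )
        if not isinstance(d, int):
            raise TypeError(f"reduction axis must be an int, got {type(d).__name__}")
    return tuple(dims)
```

With this sketch, `check_reduction_dims([0, 1])` passes the axes through, while `check_reduction_dims([0, ToyNumberProxy("i0", 1)])` raises a `TypeError` instead of silently baking the value in.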
> NumberProxy shouldn't show up in trace.
Do you think NumberProxies shouldn't be used in traces, in general, today and in the future, or only today because of how it's not dynamic and only a wrapper around a constant today?
It's possible to modify the trace to unwrap all NumberProxies into actual values by modifying `backward_trace` after this line: https://github.com/Lightning-AI/lightning-thunder/blob/4663b87e67955edec371c6f37a763a7ec358d835/thunder/core/transforms.py#L3961C5-L3961C49.
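The unwrapping step could look roughly like the sketch below. It operates on a toy trace representation, not on thunder's real trace objects: `ToyNumberProxy`, `ToyBoundSymbol`, and `unwrap_number_proxies` are hypothetical names, and the sketch assumes each proxy carries a known constant value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyNumberProxy:
    """Hypothetical proxy wrapping a constant value known at trace time."""

    name: str
    value: object


@dataclass(frozen=True)
class ToyBoundSymbol:
    """Hypothetical trace entry: an operation name plus its arguments."""

    op: str
    args: tuple


def unwrap(x):
    """Replace proxies with their values, recursing into tuples and lists."""
    if isinstance(x, ToyNumberProxy):
        return x.value
    if isinstance(x, (tuple, list)):
        return type(x)(unwrap(e) for e in x)
    return x


def unwrap_number_proxies(bound_symbols):
    """Return a new list of trace entries whose arguments contain no proxies."""
    return [ToyBoundSymbol(b.op, unwrap(b.args)) for b in bound_symbols]
```

For example, a toy entry `ToyBoundSymbol("sum", ("t0", (ToyNumberProxy("i0", 1),)))` becomes `ToyBoundSymbol("sum", ("t0", (1,)))`, so a downstream executor only ever sees plain numbers.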
> Do you think NumberProxies shouldn't be used in traces, in general, today and in the future, or only today because of how it's not dynamic and only a wrapper around a constant today?
I should be more careful with my wording 😆
Yes, NumberProxies should be used in traces in general. There are some rough edges (e.g. nvfuser baking in static numbers from proxies), but we should just patch those.
But I'm surprised to see a NumberProxy show up with `thunder.jit` today. IIUC, that's what `TestExecutor.make_callable` uses for compilation. The current default caching mode is CONSTANT_VALUES, which I thought means everything gets baked in as static, so I'm not sure where the NumberProxy is coming from.
I'm not pointing fingers at the grad transform, but I think in general a NumberProxy should be treated differently from a plain Number: a NumberProxy means a dynamic value, and transforms should respect that.
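To make the static-versus-dynamic distinction concrete, here is a toy sketch of two cache-key strategies. It is not thunder's caching code: both function names are hypothetical, and the "dynamic" variant is included only as a contrast to the constant-values behavior described above.

```python
def constant_values_key(args):
    """Toy key in which every number contributes its value.

    A new number value yields a new key, which models values being
    baked into the compiled trace as static constants.
    """
    return tuple(
        (type(a).__name__, a) if isinstance(a, (int, float)) else (type(a).__name__,)
        for a in args
    )


def dynamic_values_key(args):
    """Toy key in which numbers contribute only their type.

    Different values of the same type share a key, which models numbers
    staying dynamic (proxied) inside the trace.
    """
    return tuple((type(a).__name__,) for a in args)
```

Under these assumptions, `constant_values_key((2, 3.0))` and `constant_values_key((3, 3.0))` differ, so each value would trigger a fresh compilation, whereas `dynamic_values_key` returns the same key for both calls.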