Comments (2)
`gt` itself seems to be available, as we have it in lightning-thunder/thunder/torch/__init__.py (line 1543 in 7d6e540).
from lightning-thunder.
With a more recent NeMo version (r2.0.0rc0), we appear to hit issues with mixing different modes of indexing:
NotImplementedError: key=(t4, slice(None, None, None)) mixes basic and advanced indexing that is not currently supported
Could this be related to #187?
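For readers unfamiliar with the error text: a key of the form (t4, slice(None, None, None)) is what Python hands to `__getitem__` for an expression like `x[t4, :]`, i.e. a tensor index combined with a plain slice. The sketch below only illustrates that in pure Python. `KeyRecorder` and `classify` are hypothetical helpers, a list stands in for the tensor `t4`, and the basic/advanced split is a simplification rather than Thunder's actual check.

```python
class KeyRecorder:
    """Hypothetical helper: returns whatever key Python passes to __getitem__."""

    def __getitem__(self, key):
        return key


def classify(key):
    """Label each element of an indexing key as 'basic' or 'advanced'.

    Assumption for this sketch: ints, slices, Ellipsis and None count as
    basic; anything else (a list here, a tensor in the real model) counts
    as advanced.
    """
    if not isinstance(key, tuple):
        key = (key,)
    basic_types = (int, slice, type(Ellipsis), type(None))
    return ["basic" if isinstance(k, basic_types) else "advanced" for k in key]


t4 = [0, 2]  # stand-in for the tensor index named t4 in the error message
key = KeyRecorder()[t4, :]  # same shape of key as x[t4, :]
kinds = classify(key)
mixes = len(set(kinds)) > 1  # True when both kinds appear in one key

print(key)    # ([0, 2], slice(None, None, None))
print(kinds)  # ['advanced', 'basic']
print(mixes)  # True
```

Since a trailing full slice selects everything along that dimension, `x[t4, :]` and `x[t4]` return the same values in eager PyTorch. Whether rewriting the NeMo code that way avoids the error under Thunder is untested here.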
Full traceback:
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/tfogal/dev/nemo/examples/multimodal/text_to_image/stable_diffusion/sd_train.py", line 117, in <module>
[rank0]: main()
[rank0]: File "/home/tfogal/dev/nemo/nemo/core/config/hydra_runner.py", line 129, in wrapper
[rank0]: _run_hydra(
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
[rank0]: _run_app(
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
[rank0]: run_and_report(
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
[rank0]: raise ex
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
[rank0]: return func()
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
[rank0]: lambda: hydra.run(
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
[rank0]: _ = ret.return_value
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
[rank0]: raise self._return_value
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
[rank0]: ret.return_value = task_function(task_cfg)
[rank0]: File "/home/tfogal/dev/nemo/examples/multimodal/text_to_image/stable_diffusion/sd_train.py", line 112, in main
[rank0]: trainer.fit(model)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
[rank0]: call._call_and_handle_interrupt(
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
[rank0]: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 102, in launch
[rank0]: return function(*args, **kwargs)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
[rank0]: self._run(model, ckpt_path=ckpt_path)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
[rank0]: results = self._run_stage()
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1035, in _run_stage
[rank0]: self.fit_loop.run()
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 202, in run
[rank0]: self.advance()
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 359, in advance
[rank0]: self.epoch_loop.run(self._data_fetcher)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 136, in run
[rank0]: self.advance(data_fetcher)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 240, in advance
[rank0]: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 187, in run
[rank0]: self._optimizer_step(batch_idx, closure)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 265, in _optimizer_step
[rank0]: call._call_lightning_module_hook(
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 157, in _call_lightning_module_hook
[rank0]: output = fn(*args, **kwargs)
[rank0]: File "/home/tfogal/dev/nemo/nemo/collections/nlp/models/language_modeling/megatron_base_model.py", line 1263, in optimizer_step
[rank0]: super().optimizer_step(*args, **kwargs)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1291, in optimizer_step
[rank0]: optimizer.step(closure=optimizer_closure)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 151, in step
[rank0]: step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 265, in optimizer_step
[rank0]: optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 230, in optimizer_step
[rank0]: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/amp.py", line 77, in optimizer_step
[rank0]: closure_result = closure()
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 140, in __call__
[rank0]: self._result = self.closure(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 126, in closure
[rank0]: step_output = self._step_fn()
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 315, in _training_step
[rank0]: training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
[rank0]: output = fn(*args, **kwargs)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 381, in training_step
[rank0]: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 633, in __call__
[rank0]: wrapper_output = wrapper_module(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1620, in forward
[rank0]: else self._run_ddp_forward(*inputs, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1438, in _run_ddp_forward
[rank0]: return self.module(*inputs, **kwargs) # type: ignore[index]
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/tfogal/env/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 626, in wrapped_forward
[rank0]: out = method(*_args, **_kwargs)
[rank0]: File "/home/tfogal/dev/nemo/nemo/utils/model_utils.py", line 434, in wrap_training_step
[rank0]: output_dict = wrapped(*args, **kwargs)
[rank0]: File "/home/tfogal/dev/nemo/nemo/collections/multimodal/models/text_to_image/stable_diffusion/ldm/ddpm.py", line 1812, in training_step
[rank0]: loss_mean, loss_dict = self.fwd_bwd_step(dataloader_iter, False)
[rank0]: File "/home/tfogal/dev/nemo/nemo/collections/multimodal/models/text_to_image/stable_diffusion/ldm/ddpm.py", line 1745, in fwd_bwd_step
[rank0]: losses_reduced_per_micro_batch = fwd_bwd_function(
[rank0]: File "/home/tfogal/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 399, in forward_backward_no_pipelining
[rank0]: output_tensor, num_tokens = forward_step(
[rank0]: File "/home/tfogal/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 206, in forward_step
[rank0]: output_tensor, loss_func = forward_step_func(data_iterator, model)
[rank0]: File "/home/tfogal/dev/nemo/nemo/collections/multimodal/models/text_to_image/stable_diffusion/ldm/ddpm.py", line 1939, in fwd_output_and_loss_func
[rank0]: loss, loss_dict = model(x, c)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/tfogal/dev/nemo/nemo/collections/multimodal/models/text_to_image/stable_diffusion/ldm/ddpm.py", line 1015, in forward
[rank0]: return self.p_losses(x, c, t, *args, **kwargs)
[rank0]: File "/home/tfogal/dev/nemo/nemo/collections/multimodal/models/text_to_image/stable_diffusion/ldm/ddpm.py", line 1165, in p_losses
[rank0]: model_output = self.apply_model(x_noisy, t, cond)
[rank0]: File "/home/tfogal/dev/nemo/nemo/collections/multimodal/models/text_to_image/stable_diffusion/ldm/ddpm.py", line 1136, in apply_model
[rank0]: x_recon = self.model(x_noisy, t, **cond)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/tfogal/dev/nemo/nemo/collections/multimodal/models/text_to_image/stable_diffusion/ldm/ddpm.py", line 2339, in forward
[rank0]: out = self.diffusion_model(x, t, context=cc)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/tfogal/dev/thunder/thunder/core/module.py", line 62, in forward
[rank0]: res = self._forward_fn(*args, **kwargs)
[rank0]: File "/home/tfogal/dev/thunder/thunder/__init__.py", line 634, in fn_
[rank0]: cache_entry, inps, pro_to_epi = get_computation_and_inputs(*args, **kwargs)
[rank0]: File "/home/tfogal/dev/thunder/thunder/__init__.py", line 210, in cache_info_wrapper
[rank0]: res = fn(*args, **kwargs)
[rank0]: File "/home/tfogal/dev/thunder/thunder/__init__.py", line 484, in get_computation_and_inputs
[rank0]: jit_results: TraceResults = interpreter(
[rank0]: File "/home/tfogal/dev/thunder/thunder/__init__.py", line 198, in _general_frontend
[rank0]: return thunder_general_jit(fn, args, kwargs, sharp_edges=sharp_edges, record_history=record_history)
[rank0]: File "/home/tfogal/dev/thunder/thunder/core/jit_ext.py", line 1555, in thunder_general_jit
[rank0]: result = jfn(*args, **kwargs)
[rank0]: File "/home/tfogal/dev/thunder/thunder/core/interpreter.py", line 6701, in fn_
[rank0]: raise e
[rank0]: File "/home/tfogal/dev/thunder/thunder/core/interpreter.py", line 6669, in fn_2
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/home/tfogal/dev/thunder/thunder/core/interpreter.py", line 6066, in _impl
[rank0]: return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/tfogal/dev/thunder/thunder/core/interpreter.py", line 6066, in _impl
[rank0]: return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/tfogal/dev/thunder/thunder/core/interpreter.py", line 6066, in _impl
[rank0]: return fn.__func__(fn.__self__, *args, **kwargs)
[rank0]: File "/home/tfogal/dev/nemo/nemo/collections/multimodal/modules/stable_diffusion/diffusionmodules/openaimodel.py", line 1319, in forward
[rank0]: with transformer_engine.pytorch.fp8_autocast(
[rank0]: NotImplementedError: key=(t4, slice(None, None, None)) mixes basic and advanced indexing that is not currently supported
Line 1319 of openaimodel.py is basically just an autocast block; do we have a second issue where the line of an error is misattributed when it is raised inside a `with` context?
Marking triage review for people smarter than me to guide us.
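As a baseline for the line-attribution question, one could check which line plain CPython reports for an exception raised inside a `with` body, and compare that with what Thunder's interpreter reports. This is a minimal sketch under stated assumptions: `DummyAutocast` is a hypothetical stand-in for a context manager such as fp8_autocast, and `forward` is a toy function, not the NeMo code.

```python
import traceback


class DummyAutocast:
    """Hypothetical stand-in for a context manager such as fp8_autocast."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not suppress exceptions raised in the body


def forward():
    with DummyAutocast():
        raise NotImplementedError("raised inside the with body")


try:
    forward()
except NotImplementedError as exc:
    # Take the innermost traceback entry and record where Python attributes
    # the error: function name, line number, and source text if available.
    last = traceback.extract_tb(exc.__traceback__)[-1]
    reported = (last.name, last.lineno, last.line)
    print(reported)
```

If the printed source line is the `raise` statement rather than the `with` statement, then an error attributed to the `with` line under Thunder would point at its source-location tracking rather than at standard Python behavior.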