
lightning-thunder's Introduction


Make PyTorch models Lightning fast.


Lightning.ai • Performance • Get started • Install • Examples • Inside Thunder • Get involved! • Documentation


Welcome to ⚡ Lightning Thunder

Thunder makes PyTorch models Lightning fast.

Thunder is a source-to-source compiler for PyTorch. It makes PyTorch programs faster by combining and using different hardware executors at once (for instance, nvFuser, torch.compile, cuDNN, and TransformerEngine FP8).

It supports both single and multi-GPU configurations. Thunder aims to be usable, understandable, and extensible.

 

Note

Lightning Thunder is in alpha. Feel free to get involved, but expect a few bumps along the way.

 

Single-GPU performance

Thunder can achieve significant speedups over standard non-compiled PyTorch code ("PyTorch eager"), through the compounding effects of optimizations and the use of best-in-class executors. The figure below shows the pretraining throughput for Llama 2 7B as implemented in LitGPT.

[Figure: Llama 2 7B pretraining throughput, Thunder vs. PyTorch eager on H100]

As shown in the plot above, Thunder achieves a 40% speedup in training throughput compared to eager code on H100 using a combination of executors including nvFuser, torch.compile, cuDNN, and TransformerEngine FP8.

 

Multi-GPU performance

Thunder also supports distributed strategies such as DDP and FSDP for training models on multiple GPUs. The following plot displays the normalized throughput measured for Llama 2 7B without FP8 mixed precision; support for FSDP is in progress.

[Figure: normalized multi-GPU training throughput for Llama 2 7B with DDP and FSDP]

 

Get started

The easiest way to get started with Thunder, requiring no extra installations or setups, is by using our Zero to Thunder Tutorial Studio.

 

Install Thunder

To use Thunder on your local machine, first install nvFuser nightly and PyTorch nightly together as follows:

# install nvFuser which installs the matching nightly PyTorch
pip install --pre 'nvfuser-cu121[torch]' --extra-index-url https://pypi.nvidia.com

Then, install Thunder as follows:

# install thunder
pip install lightning-thunder
Advanced install options

 

Install from main

Alternatively, you can install the latest version of Thunder directly from this GitHub repository as follows:

# 1) Install nvFuser and PyTorch nightly dependencies:
pip install --pre 'nvfuser-cu121[torch]' --extra-index-url https://pypi.nvidia.com
# 2) Install Thunder itself
pip install git+https://github.com/Lightning-AI/lightning-thunder.git

 

Install to tinker and contribute

If you are interested in tinkering with and contributing to Thunder, we recommend cloning the Thunder repository and installing it in pip's editable mode:

git clone https://github.com/Lightning-AI/lightning-thunder.git
cd lightning-thunder
pip install -e .

 

Develop and run tests

After cloning the lightning-thunder repository and installing it as an editable package as explained above, you can set up your environment for developing Thunder by installing the development requirements:

pip install -r requirements/devel.txt

Now you can run the tests:

pytest thunder/tests

Thunder is very thoroughly tested, so expect this to take a while.
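Since the full suite is large, it can be handy to run only a subset while iterating. Standard pytest selection flags work as usual (the -k expression below is just an illustration):

# stop at the first failure
pytest thunder/tests -x

# run only tests whose names match a keyword expression
pytest thunder/tests -k "jit"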

 

Hello World

Below is a simple example of how Thunder allows you to compile and run PyTorch code:

import torch
import thunder


def foo(a, b):
    return a + b


jfoo = thunder.jit(foo)

a = torch.full((2, 2), 1)
b = torch.full((2, 2), 3)

result = jfoo(a, b)

print(result)

# prints
# tensor(
#  [[4, 4]
#   [4, 4]])

The compiled function jfoo takes and returns PyTorch tensors, just like the original function, so modules and functions compiled by Thunder can be used as part of larger PyTorch programs.
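Thunder can also compile nn.Module instances directly, as several of the issue reproductions below do. A minimal sketch (the Linear module and shapes here are arbitrary examples):

import torch
import thunder

model = torch.nn.Linear(4, 4)
jmodel = thunder.jit(model)  # wraps the original module in a ThunderModule

x = torch.randn(2, 4)
print(jmodel(x).shape)  # torch.Size([2, 4])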

 

Train models

Thunder is in its early stages and should not be used for production runs yet.

However, it can already deliver outstanding performance for pretraining and finetuning LLMs supported by LitGPT, such as Mistral, Llama 2, Gemma, Falcon, and others.

Check out the LitGPT integration to learn about running LitGPT and Thunder together.

 

Inside Thunder: A brief look at the core features

Given a Python callable or PyTorch module, Thunder can generate an optimized program that:

  • Computes its forward and backward passes
  • Coalesces operations into efficient fusion regions
  • Dispatches computations to optimized kernels
  • Distributes computations optimally across machines

To do so, Thunder ships with:

  • A JIT for acquiring Python programs targeting PyTorch and custom operations
  • A multi-level intermediate representation (IR) to represent operations as a trace of a reduced operation set
  • An extensible set of transformations on the trace of a computational graph, such as grad, fusions, distributed (like ddp, fsdp), functional (like vmap, vjp, jvp)
  • A way to dispatch operations to an extensible collection of executors

Thunder is written entirely in Python. Even its trace is represented as valid Python at all stages of transformation. This allows unprecedented levels of introspection and extensibility.
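Because traces are valid Python, they can simply be printed after a jitted function has run. A minimal sketch, reusing the Hello World example above and the thunder.last_traces helper that also appears in the issue reports below:

import torch
import thunder


def foo(a, b):
    return a + b


jfoo = thunder.jit(foo)
jfoo(torch.randn(2, 2), torch.randn(2, 2))

# the last trace is the fully transformed Python program that actually executes
print(thunder.last_traces(jfoo)[-1])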

Thunder doesn't generate code for accelerators, such as GPUs, directly. It acquires and transforms user programs so that it's possible to optimally select or generate device code using fast executors such as nvFuser, torch.compile, cuDNN, and TransformerEngine FP8.

Modules and functions compiled with Thunder fully interoperate with vanilla PyTorch and support PyTorch's autograd. Also, Thunder works alongside torch.compile to leverage its state-of-the-art optimizations.

 

Documentation

Online documentation is available. To build documentation locally you can use

make docs

and point your browser to the generated docs at docs/build/index.html.

 

Get involved!

We appreciate your feedback and contributions. If you have feature requests, questions, or want to contribute code or config files, please don't hesitate to use the GitHub Issue tracker.

We welcome all individual contributors, regardless of their level of experience or hardware. Your contributions are valuable, and we are excited to see what you can accomplish in this collaborative and supportive environment.

 

License

Lightning Thunder is released under the Apache 2.0 license. See the LICENSE file for details.

lightning-thunder's Issues

Handle returning NamedTuples from the JIT

NamedTuples have (some) support in the interpreter, but to return them from the JIT, we would need some more things:

  • ensure value tracking covers creation (populating the attribute_wrappers) - this is in interpreter,
  • when seeing NamedTuples being returned, add epilogue code to create the NamedTuple from its contents (which are available),

This is nontrivial as it requires return values to be routed through the epilogue, but I don't think that is grave.

This is probably an intermediate to advanced issue for people fond of the JIT / frontend bits.
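For concreteness, a minimal sketch of the behavior this issue asks for; the NamedTuple and function below are hypothetical, and returning the NamedTuple from the jitted function is exactly what does not work yet:

from typing import NamedTuple

import torch
import thunder


class Stats(NamedTuple):
    mean: torch.Tensor
    std: torch.Tensor


def foo(x):
    return Stats(x.mean(), x.std())


jfoo = thunder.jit(foo)
# desired: the epilogue reconstructs and returns a Stats NamedTuple
out = jfoo(torch.randn(4, 4))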

How do I access the `ThunderModule` if I'm compiling a function?

🚀 Feature

Motivation

Sometimes the code requires that a ThunderModule is passed. However, if the user is compiling a function that takes the module as an argument, they don't have a way to get a reference to it.

For example, #96 implements a workaround for this issue with the no_sync context manager.

Pitch

Provide an API to get this reference. Maybe it's something like thunder.compile_data(jitted_function).module.

Additional context

The design might need to consider the presence of multiple ThunderModules.
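A sketch of the scenario, with the proposed accessor left as a comment since it does not exist yet (the function and module below are arbitrary examples):

import torch
import thunder


def training_step(model, x):
    # the jitted function receives the module as a plain argument
    return model(x).sum()


model = torch.nn.Linear(4, 4)
jstep = thunder.jit(training_step)
loss = jstep(model, torch.randn(2, 4))

# proposed (hypothetical) API from the pitch above:
# tmodule = thunder.compile_data(jstep).module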

Partial function is not supported in `grad_transform`

🚀 Feature

Hitting the error below:

root@847841b8737c:/opt/pytorch/lightning-thunder# python /volume/pooling.py
Traceback (most recent call last):
  File "/volume/pooling.py", line 36, in <module>
    o = jit_model(image)
  File "/opt/pytorch/lightning-thunder/thunder/__init__.py", line 632, in fn_
    cache_entry, inps, pro_to_epi = get_computation_and_inputs(*args, **kwargs)
  File "/opt/pytorch/lightning-thunder/thunder/__init__.py", line 265, in cache_info_wrapper
    res = fn(*args, **kwargs)
  File "/opt/pytorch/lightning-thunder/thunder/__init__.py", line 574, in get_computation_and_inputs
    computation_trc, backward_trc = split_forward_backward(computation_trc, cd, cs, *inps)
  File "/opt/pytorch/lightning-thunder/thunder/executors/torch_autograd.py", line 216, in split_forward_backward
    fw_trace, bw_trace = forward_and_backward_from_trace(primal_trace, torch_autograd=True)
  File "/opt/pytorch/lightning-thunder/thunder/core/transforms.py", line 3879, in forward_and_backward_from_trace
    forward_trace = construct_trace()(augmented_forward_fn, *trace.args, **trace.kwargs)
  File "/opt/pytorch/lightning-thunder/thunder/core/interpreter.py", line 1292, in fn_
    return fn(*args, **kwargs)
  File "/opt/pytorch/lightning-thunder/thunder/common.py", line 528, in _trace
    result = fn(*proxyargs, **proxykwargs)
  File "/opt/pytorch/lightning-thunder/thunder/core/transforms.py", line 3850, in augmented_forward_fn
    result, env = augmented_forward_pass(*args, trace=trace, **kwargs)
  File "/opt/pytorch/lightning-thunder/thunder/core/transforms.py", line 3461, in augmented_forward_pass
    result, env = eval_trace(
  File "/opt/pytorch/lightning-thunder/thunder/core/transforms.py", line 1698, in eval_trace
    prim_func = symbol_mapper(symbol)
  File "/opt/pytorch/lightning-thunder/thunder/core/transforms.py", line 3385, in vjp_symbol_mapper
    vjp_impl, backward_fn = make_aug_forward_and_backward(symbol)
  File "/opt/pytorch/lightning-thunder/thunder/core/vjp_utils.py", line 63, in make_aug_forward_and_backward
    joint_trace = thunder.trace(inline_trace=False, use_dce=False)(joint_forward_backward, *bsym.args, **bsym.kwargs)
  File "/opt/pytorch/lightning-thunder/thunder/core/interpreter.py", line 1292, in fn_
    return fn(*args, **kwargs)
  File "/opt/pytorch/lightning-thunder/thunder/common.py", line 506, in _trace
    proxyargs, proxykwargs = _unpack_inputs(fn, trace, args, kwargs, rename_proxies=rename_proxies)
  File "/opt/pytorch/lightning-thunder/thunder/common.py", line 273, in _unpack_inputs
    si = get_siginfo(fn, args, kwargs)
  File "/opt/pytorch/lightning-thunder/thunder/core/codeutils.py", line 313, in get_siginfo
    check(
  File "/opt/pytorch/lightning-thunder/thunder/core/baseutils.py", line 103, in check
    raise exception_type(s())
NotImplementedError: Support for partials with positional args (like ('test',)) is not implemented yet

I was trying to use something like

from functools import partial

foo = partial(bar, pos_arg0)
OperatorExecutor.register_operator(..., grad_transform=foo)

This isn't a high-priority issue, since we can easily work around it for now. Filing the issue just to keep track of the missing feature.

Add `torch.nn.Dropout` recomputation support during the backward pass to Thunder

🚀 Feature

I would like to have Thunder save the seed and offset from random number generation to allow for the recomputation of Dropout in the backward pass.

There are two pieces needed to make it work:

  • Support stateless (deterministic) PRNG. This is done with thunder.prims.uniform_philox.
  • Trace transform to query PyTorch's PRNG state before each uniform call, replacing uniform with uniform_philox, and incrementing the PRNG state properly. This is not implemented.

Motivation

Multihead Attention modules in LLMs often use dropout where the memory used is the square of the sequence length.
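For intuition, a minimal PyTorch sketch of the idea using a plain torch.Generator (thunder.prims.uniform_philox would play this role inside Thunder): if the PRNG state is saved before the forward dropout, the mask can be regenerated in the backward pass instead of being kept in memory.

import torch

g = torch.Generator().manual_seed(0)
state = g.get_state()  # save the PRNG state before sampling the dropout mask

x = torch.randn(4, 4)
p = 0.5
mask = torch.rand(x.shape, generator=g) > p  # forward: sample the mask
y = x * mask / (1 - p)

# backward: restore the saved state and regenerate the identical mask
g.set_state(state)
mask_again = torch.rand(x.shape, generator=g) > p
assert torch.equal(mask, mask_again)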

cc @apaz-cli

Handling inplace through SSA

This issue is to facilitate discussion of inplace handling, namely the "big" solution of having a static single assignment (SSA) representation.

For any handling of inplace, we want to make certain that two things are achieved:

  • we don't want to take shortcuts that complicate passes by introducing the need to detect obstacles to optimizations, because it would harm usability and extensibility of Thunder.
  • we don't want to create ad-hoc band-aids to get things working that we would need to regress on later to introduce more proper handling because developing in the open more or less means no regressions.

Some thoughts from video/chat discussions:

About the problem:

  • The key difficulty in SSA is that we would need to keep track of which tensors get modified by an inplace update (i.e. which have memory that is to be updated), so we would need to know about views (the fancy term is alias analysis),
  • this is difficult for some things in PyTorch (i.e. reshape),
  • "assuming the worst" works to some extent.

Solution considerations:

  • Likely we would want inplace updates to have all affected tensors as outputs.
  • on inputs we would need to check for aliases as part of the prologue (maybe with a separate "assume aliasing is OK" cache mode of sorts later),
  • operations need to know if their output is a view of their inputs (difficult for reshape, easy for most others),
  • initially, we would only check if tensors share storage,
  • likely the translation could be done in the interpretation phase,
  • we would need to have versioning / disambiguation of versions for tensor proxies during this, but not when we have the SSA.

Later versions could refine the alias analysis as needed.
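A small PyTorch example of why alias analysis matters here: an in-place update through a view also changes the base tensor, so a representation that only tracks the view's output would miss the effect on the base.

import torch

a = torch.zeros(4)
b = a.view(2, 2)  # b is a view sharing storage with a
b.add_(1)         # in-place update through the view

print(a)                             # tensor([1., 1., 1., 1.]) -- a changed too
print(a.data_ptr() == b.data_ptr())  # True: same underlying storage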

@tfogal @mruberry @IvanYashchuk

CI fails to build `cuda 12.1 | torch 2.3 /test | cudnn FE v1.2`

🐛 Bug

The CI job build_push cuda 12.1 | torch 2.3 /test | cudnn FE v1.2 fails to build, apparently because PyTorch bumped the Triton dependency to 2.3.0.
https://github.com/Lightning-AI/lightning-thunder/runs/23676066094

To Reproduce

82.88 The conflict is caused by:
82.88     The user requested triton==2.2.0
82.88     torch 2.3.0+cu121 depends on triton==2.3.0; platform_system == "Linux" and platform_machine == "x86_64" and python_version < "3.12"

@Borda

Better name for elements of list in `prologue_trace` and `computation_trace`

import thunder
import torch

def foo(xs):
    result = []
    for x in xs:
        result.append(x + x)
    return result

jfoo = thunder.jit(foo)

o = jfoo([torch.randn(3,),] * 6)
print(thunder.last_prologue_traces(jfoo)[-1])
print(thunder.last_traces(jfoo)[-1])

Names for the arguments to the computation trace are: res, x, a, b, t_0_4, t_0_5. It would be nice if there were a consistent pattern.

Traces

Prologue Trace

# Constructed by Transform for execution (took 0 milliseconds)
import torch
from thunder.executors.torchex import no_autocast

@torch.no_grad()
@no_autocast()
def prologue(*args, **kwargs):
  # args: "Any"
  check_len(args, 1)
    # prims.check_len(args, 1)
  # kwargs: "Any"
  check_len(kwargs, 0)
    # prims.check_len(kwargs, 0)
  subscr: "Any" = args[0]
  res: "cpu f32[3]" = subscr[0]
  x: "cpu f32[3]" = subscr[1]
  a: "cpu f32[3]" = subscr[2]
  b: "cpu f32[3]" = subscr[3]
  t_0_4: "cpu f32[3]" = subscr[4]
  t_0_5: "cpu f32[3]" = subscr[5]
  ...
  return (res, x, a, b, t_0_4, t_0_5)

Computation Trace

# Constructed by Delete Last Used (took 0 milliseconds)
import torch
from thunder.executors.torchex import no_autocast

@torch.no_grad()
@no_autocast()
def computation(res, x, a, b, t_0_4, t_0_5):
  # res: "cpu f32[3]"
  # x: "cpu f32[3]"
  # a: "cpu f32[3]"
  # b: "cpu f32[3]"
  # t_0_4: "cpu f32[3]"
  # t_0_5: "cpu f32[3]"
  result = torch.add(res, res)  # result: "cpu f32[3]"
    # result = ltorch.add(res, res, alpha=None)  # result: "cpu f32[3]"
      # result = prims.add(res, res)  # result: "cpu f32[3]"
  del res
  ...
  return [result, t1, t2, t3, t4, t5]

Distributed Tests failing but CI is green

On the latest main (94c9494), the CI flow for distributed tests shows success: https://github.com/Lightning-AI/lightning-thunder/runs/23172744261

But looking at the log, there are a few tests that have failed.

Sample

=================================== FAILURES ===================================
_ CompileDDPTest.test_fsdp_grad_parity_with_without_bucketing_executor_nvfuser_bucketing_block_zero2 _
/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:533: in wrapper
    self._join_processes(fn)
/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_distributed.py:752: in _join_processes
    self._check_return_codes(elapsed_time)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

Link to log: https://dev.azure.com/Lightning-AI/lightning/_build/results?buildId=196660&view=logs&j=47e66f3c-897a-5428-da11-bf5c7745762e&t=97be8351-284a-5dba-49eb-f9fe7c3ed1a2&l=811

cc @Borda

`TensorBase.cuda`

🚀 Feature

Implement Tensor.cuda that returns a cuda-backed copy of the given tensor.

Motivation

NeMo text-to-image model. It's plausible that the source tensor used in the model is a GPU tensor already, so we might be able to get by with just returning a tensor copy without worrying about movement between devices.
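A minimal sketch of the kind of program this feature would enable (illustrative only; it is expected to fail until Tensor.cuda is supported, and it requires a CUDA device):

import torch
import thunder


def foo(x):
    # desired: .cuda() returns a CUDA-backed copy of the tensor
    return x.cuda() * 2


jfoo = thunder.jit(foo)
out = jfoo(torch.randn(4))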

Label tracking meta-issue (edit me to get automatically CC'ed on issues!)

This issue is used by lightning-probot to manage subscriptions to labels. To subscribe yourself to a label, add a line * label @yourusername, or add your username to an existing line (space separated) in the body of this issue. Do not try to subscribe in comments, the bot only parses the initial post.

This is a copy of pytorch/pytorch#24422.

As a courtesy to others, please do not edit the subscriptions of users who are not you.


The current list of labels can be retrieved with $ gh label list --limit 1000 --json name --jq '.[] | "* " + .name' | sort -n

adding DDP/FSDP transform after JITting does not work

🐛 Bug

The snippet below looks hacky, but it's how I'm approaching support for having the user control the thunder.jit call outside of Fabric: Lightning-AI/litgpt#1204

The objective is that fsdp|ddp can be applied after the thunder.jit call.

It works with FSDP, but not with DDP where it fails with:

[rank1]: Traceback (most recent call last):
[rank1]:   File "/home/carlos/lightning-thunder/kk.py", line 21, in <module>
[rank1]:     out = tmodel(x)
[rank1]:   File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/home/carlos/lightning-thunder/thunder/__init__.py", line 194, in forward
[rank1]:     res = self._forward_fn(*args, **kwargs)
[rank1]:   File "/home/carlos/lightning-thunder/thunder/__init__.py", line 629, in fn_
[rank1]:     cache_entry, inps, pro_to_epi = get_computation_and_inputs(*args, **kwargs)
[rank1]:   File "/home/carlos/lightning-thunder/thunder/__init__.py", line 262, in cache_info_wrapper
[rank1]:     res = fn(*args, **kwargs)
[rank1]:   File "/home/carlos/lightning-thunder/thunder/__init__.py", line 571, in get_computation_and_inputs
[rank1]:     computation_trc, backward_trc = split_forward_backward(computation_trc, cd, cs, *inps)
[rank1]:   File "/home/carlos/lightning-thunder/thunder/executors/torch_autograd.py", line 283, in split_forward_backward
[rank1]:     bw_trace = optimize_allreduce_in_ddp_backward(bw_trace, compile_data)
[rank1]:   File "/home/carlos/lightning-thunder/thunder/distributed/transforms/ddp.py", line 198, in optimize_allreduce_in_ddp_backward
[rank1]:     updated_bwd_trace = visitor_transform(
[rank1]:   File "/home/carlos/lightning-thunder/thunder/core/transforms.py", line 368, in visitor_transform
[rank1]:     visit_type = visit(bsym)
[rank1]:   File "/home/carlos/lightning-thunder/thunder/distributed/transforms/ddp.py", line 133, in __call__
[rank1]:     self.gradient_buckets.tell(grads_of_bsym[0], self.process_group)
[rank1]:   File "/home/carlos/lightning-thunder/thunder/distributed/bucketing.py", line 150, in tell
[rank1]:     self._maybe_allreduce(bucket, group)
[rank1]:   File "/home/carlos/lightning-thunder/thunder/distributed/bucketing.py", line 138, in _maybe_allreduce
[rank1]:     self.bucket_to_future[bucket] = dist_prims.all_reduce(
[rank1]:   File "/home/carlos/lightning-thunder/thunder/core/symbol.py", line 246, in __call__
[rank1]:     result = self.meta(*args, **kwargs)
[rank1]:   File "/home/carlos/lightning-thunder/thunder/core/langctxs.py", line 124, in _fn
[rank1]:     result = fn(*args, **kwargs)
[rank1]:   File "/home/carlos/lightning-thunder/thunder/distributed/prims.py", line 87, in all_reduce_meta
[rank1]:     utils.check_type(group, torch.distributed.ProcessGroup)
[rank1]:   File "/home/carlos/lightning-thunder/thunder/core/baseutils.py", line 107, in check_type
[rank1]:     check(
[rank1]:   File "/home/carlos/lightning-thunder/thunder/core/baseutils.py", line 103, in check
[rank1]:     raise exception_type(s())
[rank1]: ValueError: None had an unexpected type <class 'NoneType'>. Supported types are <class 'torch.distributed.distributed_c10d.ProcessGroup'>

To Reproduce

import os
import thunder
import torch
import torch.distributed as torch_dist

world_size = int(os.environ.get("WORLD_SIZE", 1))
local_rank = int(os.environ.get("LOCAL_RANK", 0))
global_rank = int(os.environ.get("RANK", 0))
if world_size > 1:
    torch_dist.init_process_group(backend="nccl")
    pg = torch_dist.distributed_c10d._get_default_group()
device = torch.device("cuda", local_rank)
torch.cuda.set_device(device)

model = torch.nn.Linear(5, 10, bias=False, device=device)
x = torch.randn(2, 5, device=device)

tmodel = thunder.jit(model)
tmodel._lc_cd.fn = thunder.distributed.ddp(tmodel._lc_cd.fn)

out = tmodel(x)

if local_rank == 0:
    print(thunder.last_backward_traces(tmodel)[-1].python())

torchrun --nproc-per-node 2 bug.py

cc @carmocca @awaelchli @crcrpar @kshitij12345 since you fixed a similar issue in #23

Operator support for `F.one_hot`

🐛 Bug

Thunder fails when attempting to compile a graph containing torch.nn.functional.one_hot within the forward pass.
The error message indicates that the input to the method must be a Tensor, but a TensorProxy is received instead.

To Reproduce

Steps to reproduce the behavior:

  • Define a PyTorch model class with a forward pass involving F.one_hot to convert the input tensor to a one-hot encoded representation.
  • Create an instance of the model and evaluate it on a random input tensor.
  • Compile the model using thunder.jit.
  • Call the compiled model with the same input tensor.

Example

import torch
import torch.nn as nn
import torch.nn.functional as F

import thunder


class MLP(nn.Module):
    def __init__(self, hidden_size=1024):
        super(MLP, self).__init__()
        self.hidden = nn.Linear(6 * 256, hidden_size, bias=False)
        self.head = nn.Linear(hidden_size, 32000, bias=False)

    def forward(self, inputs):
        x = F.one_hot(inputs, 6).reshape(-1, 6 * 256).float()
        x = self.hidden(x)
        logits = self.head(x)
        return logits


x = torch.randint(0, 6, (1, 256))

model = MLP(1024).eval()
print(model(x))

model = thunder.jit(model)
print(model(x))
Output
tensor([[-0.1134, -0.0827, -0.0205,  ...,  0.0757,  0.0066,  0.0974]],
       grad_fn=<MmBackward0>)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[<ipython-input-6-6425e5faad6e>](https://localhost:8080/#) in <cell line: 23>()
     21 
     22 model = thunder.jit(model)
---> 23 print(model(x))

16 frames
[/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _wrapped_call_impl(self, *args, **kwargs)
   1509             # type ignore was added because at this point one knows that
   1510             # torch.jit._trace._trace_module_map is not Optional and has type Dict[Any, Any]
-> 1511             name = torch.jit._trace._trace_module_map[self] if self in torch.jit._trace._trace_module_map else None  # type: ignore[index, operator] # noqa: B950
   1512             if name:
   1513                 tracing_state.push_scope(name)

[/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *args, **kwargs)
   1518         finally:
   1519             if recording_scopes:
-> 1520                 tracing_state.pop_scope()
   1521         return result
   1522 

[/usr/local/lib/python3.10/dist-packages/thunder/__init__.py](https://localhost:8080/#) in forward(self, *args, **kwargs)
    192 
    193     def forward(self, *args, **kwargs):
--> 194         res = self._forward_fn(*args, **kwargs)
    195         return res
    196 

[/usr/local/lib/python3.10/dist-packages/thunder/__init__.py](https://localhost:8080/#) in fn_(*args, **kwargs)
    609         cs.calls += 1
    610 
--> 611         cache_entry, inps, pro_to_epi = get_computation_and_inputs(*args, **kwargs)
    612         cs.last_trace_host_execution_start = time.time_ns()
    613 

[/usr/local/lib/python3.10/dist-packages/thunder/__init__.py](https://localhost:8080/#) in cache_info_wrapper(*args, **kwargs)
    260         tok = _cache_info_ctx.set({})
    261         try:
--> 262             res = fn(*args, **kwargs)
    263         finally:
    264             _cache_info_ctx.reset(tok)

[/usr/local/lib/python3.10/dist-packages/thunder/__init__.py](https://localhost:8080/#) in get_computation_and_inputs(*args, **kwargs)
    496                 prologue_trc: TraceCtx
    497                 computation_trc: TraceCtx
--> 498                 prologue_trc, computation_trc, *maybe_epilogue = interpreter(
    499                     fn, args, kwargs, sharp_edges=cd.sharp_edges
    500                 )

[/usr/local/lib/python3.10/dist-packages/thunder/__init__.py](https://localhost:8080/#) in _general_frontend(fn, args, kwargs, sharp_edges)
    173 # Translates the Python function to a thunder program using the thunder interpreter
    174 def _general_frontend(fn: Callable, args, kwargs, /, *, sharp_edges: SHARP_EDGES_OPTIONS) -> tuple[TraceCtx, TraceCtx]:
--> 175     return thunder_general_jit(fn, args, kwargs, sharp_edges=sharp_edges)
    176 
    177 

[/usr/local/lib/python3.10/dist-packages/thunder/core/jit_ext.py](https://localhost:8080/#) in thunder_general_jit(fn, args, kwargs, sharp_edges)
   1384     with general_jit_ctx(ctx):
   1385         with tracectx(computation_trace):
-> 1386             result = jfn(*args, **kwargs)
   1387             prims.python_return(result)
   1388             process_recorded_modifications(ctx, epilogue_trace)

[/usr/local/lib/python3.10/dist-packages/thunder/core/interpreter.py](https://localhost:8080/#) in fn_(*args, **kwargs)
   6578                 assert isinstance(e, BaseException), e
   6579                 runtimectx.curexc = None
-> 6580                 raise e
   6581 
   6582             return interpretation_result

[/usr/local/lib/python3.10/dist-packages/thunder/core/interpreter.py](https://localhost:8080/#) in fn_2()
   6541                 def getfn():
   6542                     def fn_2(args, kwargs):
-> 6543                         return fn(*args, **kwargs)
   6544 
   6545                     return fn_2

[/usr/local/lib/python3.10/dist-packages/thunder/core/interpreter.py](https://localhost:8080/#) in _impl()
   5940 
   5941         def _impl(fn, *args, **kwargs):
-> 5942             return fn.__func__(fn.__self__, *args, **kwargs)
   5943 
   5944         return _interpret_call(_impl, wrapped_fn, *args, **kwargs)  # type: ignore

[/usr/local/lib/python3.10/dist-packages/thunder/core/interpreter.py](https://localhost:8080/#) in _wrapped_call_impl()
   1509             # type ignore was added because at this point one knows that
   1510             # torch.jit._trace._trace_module_map is not Optional and has type Dict[Any, Any]
-> 1511             name = torch.jit._trace._trace_module_map[self] if self in torch.jit._trace._trace_module_map else None  # type: ignore[index, operator] # noqa: B950
   1512             if name:
   1513                 tracing_state.push_scope(name)

[/usr/local/lib/python3.10/dist-packages/thunder/core/interpreter.py](https://localhost:8080/#) in _impl()
   5940 
   5941         def _impl(fn, *args, **kwargs):
-> 5942             return fn.__func__(fn.__self__, *args, **kwargs)
   5943 
   5944         return _interpret_call(_impl, wrapped_fn, *args, **kwargs)  # type: ignore

[/usr/local/lib/python3.10/dist-packages/thunder/core/interpreter.py](https://localhost:8080/#) in _call_impl()
   1518         finally:
   1519             if recording_scopes:
-> 1520                 tracing_state.pop_scope()
   1521         return result
   1522 

[/usr/local/lib/python3.10/dist-packages/thunder/core/interpreter.py](https://localhost:8080/#) in _impl()
   5940 
   5941         def _impl(fn, *args, **kwargs):
-> 5942             return fn.__func__(fn.__self__, *args, **kwargs)
   5943 
   5944         return _interpret_call(_impl, wrapped_fn, *args, **kwargs)  # type: ignore

[/usr/local/lib/python3.10/dist-packages/thunder/core/interpreter.py](https://localhost:8080/#) in forward()
      9 
     10     def forward(self, inputs):
---> 11         x = F.one_hot(inputs, 6).reshape(-1, 6 * 256).float()
     12         x = self.hidden(x)
     13         logits = self.head(x)

[/usr/local/lib/python3.10/dist-packages/thunder/core/interpreter.py](https://localhost:8080/#) in _call_dispatch(compilectx, runtimectx, fn, *args, **kwargs)
   6067         kwargs_ = {unwrap(k): unwrap(v) for k, v in kwargs.items()}
   6068         try:
-> 6069             opaque_result: Any = fn(*args_, **kwargs_)
   6070         except Exception as e:
   6071             runtimectx.curexc = e

TypeError: one_hot(): argument 'input' (position 1) must be Tensor, not TensorProxy

Environment

  • OS: Ubuntu/Google Colab
  • Python Version: 3.10
  • PyTorch Version: 2.3.0.dev20240314+cu121
  • Thunder Version: 0.1.0
  • Installation:
pip install --pre 'nvfuser-cu121[torch]' --extra-index-url https://pypi.nvidia.com
pip install lightning-thunder

Additional context

  • Other functional methods like F.relu don't seem to raise this issue.

Non-supported diffusion transformer operators

TensorBase.bfloat16
_set_grad_enabled of torch._C
_VariableFunctionsClass.empty of torch
TensorBase.long
TensorBase.type
TensorBase.__setitem__
_VariableFunctionsClass.lerp of torch
device of torch
TensorBase.clone
TensorBase.masked_fill_
TensorBase.get_device
TensorBase.grad_fn
_VariableFunctionsClass.linspace of torch

cc @apaz-cli

Operator support for `F.hardswish`

🚀 Feature

Implement HardSwish activation function.

Motivation

A relatively easy activation function implementation and a good first issue, as nikitaved suggested under #64.

Pitch

Add HardSwish (x * ReLU6(x + 3) / 6) leveraging existing ReLU6 support.
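For reference, a plain PyTorch sketch of the proposed formula expressed through relu6 (the Thunder-side operator registration itself is not shown here):

import torch
import torch.nn.functional as F


def hardswish(x: torch.Tensor) -> torch.Tensor:
    # HardSwish(x) = x * ReLU6(x + 3) / 6
    return x * F.relu6(x + 3.0) / 6.0


x = torch.randn(8)
# matches PyTorch's built-in implementation
assert torch.allclose(hardswish(x), F.hardswish(x))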

cc @apaz-cli

[lit-GPT] Thunder with torch.compile executor performs consistently worse than Thunder on all model sizes/batch sizes on Pythia models

🐛 Bug

The performance of using the hybridized torch.compile executor with Thunder is worse than plain Thunder on Pythia models. This set of models differs from the Llama architecture in a few main ways:

  1. Use LayerNorm instead of RMSNorm
  2. Use GeLU instead of `SiLU(x) * x`
  3. Uses parallel residual (i.e. the MLP block is computed with an input computed before the Attention block, not after)

Example performance on H100 Single Node FP16 for Pythia6.9B, MBS=1, GBS=8, FSDP ZeRO2 w/o bucketing
Thunder iteration time (ms) = 232.74 ms
Thunder + torch.compile iteration time (ms) = 239.23 ms

cc @crcrpar @apaz-cli

Support `memory_format` on `to()`

🚀 Feature

to(memory_format=something) is part of the MegatronImagen model in NeMo.

Ideally, this would work:

$ git diff .
diff --git a/nemo/collections/multimodal/models/text_to_image/imagen/imagen.py b/nemo/collections/multimodal/models/text_to_image/imagen/imagen.py
index 4fa6cd230..2cf7a8ffa 100644
--- a/nemo/collections/multimodal/models/text_to_image/imagen/imagen.py
+++ b/nemo/collections/multimodal/models/text_to_image/imagen/imagen.py
@@ -31,6 +31,7 @@ from nemo.collections.nlp.modules.common.megatron.module import Float16Module
 from nemo.collections.nlp.parts.utils_funcs import get_last_rank
 from nemo.core.classes.common import Serialization
 from nemo.utils import logging
+import thunder
 
 try:
     from apex import amp
@@ -190,6 +191,7 @@ class MegatronImagen(MegatronBaseModel):
         self.megatron_amp_O2 = cfg.get('megatron_amp_O2', False)
 
         self.model = self.model_provider_func()
+        self.model = thunder.jit(self.model)
 
         if self.trainer.precision in ['bf16', 'bf16-mixed']:
             self.autocast_dtype = torch.bfloat16
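A minimal standalone sketch of the pattern in question, independent of NeMo (illustrative; the jitted call is expected to hit the missing memory_format support):

import torch
import thunder


def foo(x):
    return x.to(memory_format=torch.channels_last) + 1


jfoo = thunder.jit(foo)
out = jfoo(torch.randn(1, 3, 8, 8))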

Motivation

Trying to evaluate NeMo models in thunder and expand our model support there. Megatron-based models appear to be widely used.

Alternatives

I wonder if we could temporarily just accept the keyword without actually doing anything about it. I imagine that would be very slow, but it might allow us to get models like this one into thunder more easily.

I'll start trying to convert smaller parts of the model next.

Additional context

Model in question:

https://github.com/NVIDIA/NeMo/blob/23baa48e441ecb6cc6b49c23bf8cfc076db38bdc/nemo/collections/multimodal/models/text_to_image/imagen/imagen.py#L175

I think the `to` that is failing for me is actually this line:
https://github.com/NVIDIA/NeMo/blob/23baa48e441ecb6cc6b49c23bf8cfc076db38bdc/nemo/collections/multimodal/models/text_to_image/imagen/imagen.py#L135

Model test:
log.txt

Support for torchvision models, e.g., a simple ViT

🐛 Bug

I was trying to run a simple torchvision ViT and am getting the following error:

File "/teamspace/studios/this_studio/minimal-vit/01_pytorch-vit.py", line 136, in <module>
    train(
  File "/teamspace/studios/this_studio/minimal-vit/01_pytorch-vit.py", line 31, in train
    logits = model(features)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/__init__.py", line 194, in forward
    res = self._forward_fn(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/__init__.py", line 611, in fn_
    cache_entry, inps, pro_to_epi = get_computation_and_inputs(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/__init__.py", line 262, in cache_info_wrapper
    res = fn(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/__init__.py", line 498, in get_computation_and_inputs
    prologue_trc, computation_trc, *maybe_epilogue = interpreter(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/__init__.py", line 175, in _general_frontend
    return thunder_general_jit(fn, args, kwargs, sharp_edges=sharp_edges)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/jit_ext.py", line 1386, in thunder_general_jit
    result = jfn(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 6580, in fn_
    raise e
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 6543, in fn_2
    return fn(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchvision/models/vision_transformer.py", line 298, in forward
    x = self.encoder(x)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchvision/models/vision_transformer.py", line 157, in forward
    return self.ln(self.layers(self.dropout(input)))
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchvision/models/vision_transformer.py", line 113, in forward
    x, _ = self.self_attention(x, x, x, need_weights=False)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1236, in forward
    any_nested = query.is_nested or key.is_nested or value.is_nested
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 5942, in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/interpreter.py", line 1253, in wrapping_wrapper
    res = ufn(*uargs, **ukwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/proxies.py", line 1234, in __getattr__
    method: None | Callable = resolve_method(attr, self)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/core/langctxs.py", line 68, in resolve_method
    method: Callable = ctx.get_method(id, *args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/thunder/torch/langctx.py", line 40, in get_method
    raise AttributeError(f"The {self.name} language context has no method {id}")
AttributeError: The torch language context has no method is_nested

Not sure how to go about debugging this. I thought that sharing this may help improve Thunder's support for more models and edge cases.

To Reproduce

Steps to reproduce the behavior:

I attached self-contained code in the zip.

# Runs PyTorch eager, works ok!
python 01_pytorch-vit.py

# Runs torch.compile, works ok!
python 01_pytorch-vit.py --compilation_option "torch.compile"

# Runs thunder.jit(), fails! (See error above)
python 01_pytorch-vit.py --compilation_option "thunder_default"

Code sample

See zip attached

Expected behavior

Either a clearer error message or ideally it should work :)

Environment

Same as Zero to Thunder studio.

Archive.zip

cc @apaz-cli

caching in make_aug_forward_and_backward breaks TE executor.

As discussed offline, caching in make_aug_forward_and_backward leads to reusing the symbols created by transformer_engine_ex, which are stateful, resulting in an incorrect program.
Ref:

key = (bsym.sym, subkey := _make_cache_key(bsym.args, bsym.kwargs))
cached_result = _cache.get(key, None) if subkey is not None else None
if cached_result is not None:
    return cached_result

Sample Program

import torch
import thunder
from thunder.executors.transformer_engineex import transformer_engine_ex
from transformer_engine.pytorch import fp8_autocast
dim = 256

class ThunderModel(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc1 = torch.nn.Linear(dim, dim, bias=False)
        self.fc2 = torch.nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        return self.fc2(torch.nn.functional.relu(self.fc1(x)))

x = torch.arange(dim * dim, dtype=torch.float).view(dim, dim).cuda()

thunder_model = ThunderModel().cuda()

jit_model = thunder.jit(thunder_model, executors=(transformer_engine_ex,),)

with fp8_autocast():
    o = jit_model(x).sum()

print(thunder.last_traces(jit_model)[-1])

Generated Trace (te_linear_0 is called twice):

# Constructed by Delete Last Used (took 0 milliseconds)
import torch
from thunder.executors.torchex import no_autocast

@torch.no_grad()
@no_autocast()
def augmented_forward_fn(*args):
  # args: "Collection"
  t0, t1, t2, = args
  del args
  (t6, ctx_te_1) = te_linear_0(t0, t1, None)
  t7 = torch.gt(t6, 0.0)  # t7: "cuda:0 b8[256, 256]"
    # t7 = ltorch.gt(t6, 0.0)  # t7: "cuda:0 b8[256, 256]"
      # t7 = prims.gt(t6, 0.0)  # t7: "cuda:0 b8[256, 256]"
  t8 = torch.where(t7, t6, 0.0)  # t8: "cuda:0 f32[256, 256]"
    # t8 = ltorch.where(t7, t6, 0.0)  # t8: "cuda:0 f32[256, 256]"
      # t8 = prims.where(t7, t6, 0.0)  # t8: "cuda:0 f32[256, 256]"
  del t6
  (t13, C12) = te_linear_0(t8, t2, None)
  del t8
  return {'output': t13, 'flat_args': [t0, t1, t2], 'flat_output': (t13,)}, ((t7,), (C12, ctx_te_1))

Sunset `thunder/benchmarks/distributed.py` and Improve `thunder/benchmarks/benchmark_litgpt.py`

  • [cosmetic] improve the format of JSON output of benchmark_litgpt.py

https://github.com/Lightning-AI/lightning-thunder/blob/cdd43a7fc1110eec10f1854250299b84d1c3b2a8/thunder/benchmarks/distributed.py has been useful, but I find it not easy to extend, e.g. to support gradient accumulation.

https://github.com/Lightning-AI/lightning-thunder/blob/cdd43a7fc1110eec10f1854250299b84d1c3b2a8/thunder/benchmarks/benchmark_litgpt.py would be easier to work with, as in #45, which adds gradient accumulation with no_sync.

cc @crcrpar @carmocca @awaelchli

The `_FabricModule` cannot be jitted after #78

🐛 Bug

extensions/thunder/pretrain.py:146: in setup
    main(
extensions/thunder/pretrain.py:233: in main
    fit(fabric, devices, state, train_dataloader, val_dataloader, out_dir, tokenizer_dir, train, eval)
extensions/thunder/pretrain.py:253: in fit
    validate(fabric, model, val_dataloader, max_iters=2)  # sanity check
../nightly-env/lib/python3.10/site-packages/torch/utils/_contextlib.py:115: in decorate_context
    return func(*args, **kwargs)
extensions/thunder/pretrain.py:389: in validate
    loss = forward_and_loss(model, input_ids, targets)
../lightning-thunder/thunder/__init__.py:629: in fn_
    cache_entry, inps, pro_to_epi = get_computation_and_inputs(*args, **kwargs)
../lightning-thunder/thunder/__init__.py:262: in cache_info_wrapper
    res = fn(*args, **kwargs)
../lightning-thunder/thunder/__init__.py:504: in get_computation_and_inputs
    prologue_trc, computation_trc, *maybe_epilogue = interpreter(
../lightning-thunder/thunder/__init__.py:175: in _general_frontend
    return thunder_general_jit(fn, args, kwargs, sharp_edges=sharp_edges)
../lightning-thunder/thunder/core/jit_ext.py:1430: in thunder_general_jit
    result = jfn(*args, **kwargs)
../lightning-thunder/thunder/core/interpreter.py:6669: in fn_
    raise e
../lightning-thunder/thunder/core/interpreter.py:6632: in fn_2
    return fn(*args, **kwargs)
extensions/thunder/pretrain.py:371: in forward_and_loss
    logits = model(input_ids)
../lightning-thunder/thunder/core/interpreter.py:6031: in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
../nightly-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1527: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../lightning-thunder/thunder/core/interpreter.py:6031: in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
../nightly-env/lib/python3.10/site-packages/torch/nn/modules/module.py:1536: in _call_impl
    return forward_call(*args, **kwargs)
../lightning-thunder/thunder/core/interpreter.py:6031: in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
../lightning/src/lightning/fabric/wrappers.py:142: in forward
    with precision.forward_context():
../lightning/src/lightning/fabric/plugins/precision/half.py:54: in forward_context
    return self.tensor_init_context()
../lightning/src/lightning/fabric/plugins/precision/half.py:46: in tensor_init_context
    return _DtypeContextManager(self._desired_input_dtype)
../lightning-thunder/thunder/core/interpreter.py:6031: in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def __init__(self, dtype: torch.dtype) -> None:
>       self._previous_dtype: torch.dtype = torch.get_default_dtype()
E       NotImplementedError: Trying to call function torch.get_default_dtype, but it is not yet supported. Please file an issue requesting support. To find out which operations are not yet recongnized by `thunder.jit`, please run `examine` as per:
E       
E       from thunder.examine import examine
E       examine(<your thunder.jit callable argument>, ...)

../lightning/src/lightning/fabric/plugins/precision/utils.py:33: NotImplementedError

Jitting the _FabricModule is currently necessary to compile the joint forward and loss computation.

To Reproduce

from lightning import Fabric
import torch
import thunder

fabric = Fabric(devices=1, precision="16-true")
model = torch.nn.Linear(1, 1, bias=False, device=fabric.device)
x = torch.randn(1, 1)
x = fabric.to_device(x)

fmodel = fabric.setup(model)
tmodel = thunder.jit(fmodel)

print(tmodel(x))

cc @nikitaved

Does `jit` understand monkeypatched methods?

🐛 Bug

Tensor.register_hook is currently not supported by Thunder.

In Lightning Fabric, we use this once for error checking that the user properly called backward. https://github.com/Lightning-AI/pytorch-lightning/blob/096b063d6eeb41567409f4a6b9bac6f5af28ed93/src/lightning/fabric/wrappers.py#L232-L233

Since this hook is not critical, as it's only meant to avoid user errors, I would like to be able to monkeypatch it externally.

However, it doesn't seem like it has an effect with Thunder:

To Reproduce

import torch
from lightning import Fabric
from lightning.fabric.wrappers import _FabricModule
import thunder

model = torch.nn.Linear(1, 1, bias=False, device="cuda")
x = torch.randn(1, 1, device="cuda", requires_grad=True)

fabric = Fabric(accelerator="cuda", devices=1)
model = fabric.setup(model)

# monkeypatch what's causing trouble
assert isinstance(model, _FabricModule)
assert model._register_backward_hook is not None
model._register_backward_hook = lambda *_: None

model = thunder.jit(model)

y = model(x)
y.backward()
print(y)
print(x.grad)

This fails because Thunder doesn't support register_hook:

AttributeError: The torch language context has no method register_hook

Interestingly, a non-Fabric snippet doesn't fail, so there is something funny going on:

import thunder
import torch

class Wrapper(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(1, 1, bias=False)

    def forward(self, x):
        y = self.model(x)
        self.register_hook(y)
        return y

    def register_hook(self, tensor):
        tensor.register_hook(self.hook)

    def hook(self, _):
        print("hi")

model = Wrapper()
x = torch.randn(1, 1)

model.register_hook = lambda *_: None

model = thunder.jit(model)

y = model(x)
y.backward()

Updating an nn.Module attribute in forward raises an exception in the prologue trace.

import torch
import thunder

import thunder.examine

class MyModule(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.bar = 1

    def forward(self, x):
        self.bar = self.bar + 1
        # self.bar = 2  # This works
        return x

m = MyModule()

x = torch.randn(16, 16, device='cuda')

jit_linear = thunder.jit(m)

o = jit_linear(x)

Error:

File "/home/kkalambarkar/lightning-thunder/thunder/__init__.py", line 537, in get_computation_and_inputs
    inps = pro(*args, **kwargs)
  File "/home/kkalambarkar/git/pytorch/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/kkalambarkar/miniconda3/envs/pytorch-dev/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "thunder.prologue_0", line 16, in prologue
  File "/home/kkalambarkar/lightning-thunder/thunder/executors/pythonex.py", line 100, in _check_number_type_and_value_impl
    utils.check(
  File "/home/kkalambarkar/lightning-thunder/thunder/core/baseutils.py", line 103, in check
    raise exception_type(s())
RuntimeError: Expected 2 to be equal to and have the type of 1

cc @apaz-cli

Benchmark targets for test_nanogpt_cross_entropy_grad have an import issue

🐛 Bug

Benchmark targets for test_nanogpt_cross_entropy_grad have an import issue.

To Reproduce

Steps to reproduce the behavior:

root@8d345ed01185:/opt/pytorch/lightning-thunder# pytest -vvvs thunder/benchmarks/targets.py::test_nanogpt_cross_entropy_grad[thunder+apex-grad]
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.10.12, pytest-8.1.1, pluggy-1.4.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/opt/pytorch/lightning-thunder/.hypothesis/examples'))
Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket=<bucket_type>
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /opt/pytorch/lightning-thunder
configfile: pyproject.toml
plugins: cov-4.1.0, hypothesis-6.100.0, random-order-1.1.1, timestamper-0.0.10, timeout-2.2.0, xdist-3.5.0, shard-0.1.2, benchmark-4.0.0
timeout: 900.0s
timeout method: signal
timeout func_only: False
collected 1 item
Running 1 items in this shard: thunder/benchmarks/targets.py::test_nanogpt_cross_entropy_grad[thunder+apex-grad]

[2024-04-10 21:48:06] thunder/benchmarks/targets.py::test_nanogpt_cross_entropy_grad[thunder+apex-grad] FAILED

=================================================================================================== FAILURES ====================================================================================================
______________________________________________________________________________ test_nanogpt_cross_entropy_grad[thunder+apex-grad] _______________________________________________________________________________

benchmark = <pytest_benchmark.fixture.BenchmarkFixture object at 0x7fe26c8f0cd0>
executor = functools.partial(<function thunder_grad_transform at 0x7fe26ca5ecb0>, compile_fn=<function thunder_apex_executor at 0x7fe26cad8b80>)

    @pytest.mark.parametrize(
        "executor,",
        (grad_executors + apex_grad_executors),
        ids=(grad_executors_ids + apex_grad_executors_ids),
    )
    def test_nanogpt_cross_entropy_grad(benchmark, executor: None | Callable):
        if executor is None:
            pytest.skip("Executor is unavailable")

        bench: Benchmark = NanoGPTCrossEntropyBenchmark(
            config="gpt2-xl", device="cuda:0", dtype=thunder.bfloat16, requires_grad=True
        )

        setup = make_setup(bench)
        fn = executor(bench)
        fn = wrap_for_benchmark(fn)

>       benchmark.pedantic(fn, setup=setup, rounds=20, warmup_rounds=1)

thunder/benchmarks/targets.py:479:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.10/dist-packages/pytest_benchmark/fixture.py:137: in pedantic
    return self._raw_pedantic(target, args=args, kwargs=kwargs, setup=setup, rounds=rounds,
/usr/local/lib/python3.10/dist-packages/pytest_benchmark/fixture.py:211: in _raw_pedantic
    runner(loops_range)
/usr/local/lib/python3.10/dist-packages/pytest_benchmark/fixture.py:95: in runner
    result = function_to_benchmark(*args, **kwargs)
thunder/benchmarks/targets.py:60: in fn_
    result = fn(*args, **kwargs)
thunder/benchmarks/targets.py:235: in wrapper
    populate_grads(grads, cfn, args=args, kwargs=kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

grads = [tensor([[0.0000e+00, 2.2768e-18, 0.0000e+00,  ..., 0.0000e+00, 0.0000e+00,
         0.0000e+00],
        [1.0090e-12,...4, 8.5986e-29, 0.0000e+00,  ..., 0.0000e+00, 5.7932e-31,
         0.0000e+00]], device='cuda:0', dtype=torch.bfloat16)]
tom = <function NanoGPTCrossEntropyBenchmark.fn.<locals>.foo at 0x7fe26c93c8b0>
args = (tensor([[ 79.0000, 227.0000,   8.5625,  ..., 166.0000, 152.0000, 154.0000],
        [240.0000, 224.0000,   2.3125,  ....cuda:0', dtype=torch.bfloat16, grad_fn=<ViewBackward0>), tensor([223, 144, 141,  ..., 219, 169, 186], device='cuda:0'))
kwargs = {}

    def populate_grads(grads: list[TensorProxy], tom: None | torch.nn.Module = None, args=None, kwargs=None) -> None:
        idx: int = 0
        from thunder import ThunderModule, compile_data

>       if isinstance(tom, ThunderModule) or thunder.compile_data(tom).using_jit:
E       NameError: name 'thunder' is not defined

thunder/core/transforms.py:555: NameError
============================================================================================ short test summary info ============================================================================================
FAILED thunder/benchmarks/targets.py::test_nanogpt_cross_entropy_grad[thunder+apex-grad] - NameError: name 'thunder' is not defined
======================================================================================== 1 failed, 5 warnings in 11.31s =========================================================================================

Code sample

see above

Expected behavior

benchmark should be able to run

Environment

  • internal image: pjnl-20240410
  • thunder: dba8ce7

Additional context

same issues on those two:

FAILED thunder/benchmarks/targets.py::test_nanogpt_cross_entropy_grad[thunder+apex-grad] - NameError: name 'thunder' is not defined
FAILED thunder/benchmarks/targets.py::test_nanogpt_cross_entropy_grad[thunder+apex+nvfuser-grad] - NameError: name 'thunder' is not defined
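
The failing check in populate_grads already imports compile_data locally but then calls it through the thunder module name, which is not imported in transforms.py. A minimal sketch of a possible fix (an assumption based on the traceback, not a confirmed patch) is to call the local import directly:

# Sketch of a possible fix in thunder/core/transforms.py (assumption based on
# the traceback): call the locally imported compile_data instead of the
# unimported `thunder` module name.
def populate_grads(grads: list[TensorProxy], tom: None | torch.nn.Module = None, args=None, kwargs=None) -> None:
    idx: int = 0
    from thunder import ThunderModule, compile_data

    if isinstance(tom, ThunderModule) or compile_data(tom).using_jit:
        ...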

cc @tfogal @IvanYashchuk

`test_vjp_correctness` fails with ops that return tensors that do not require grads.

🐛 Bug

As per the title. To reproduce, one could uncomment these tests in #118 to get:

thunder/tests/test_grad.py:423: in test_vjp_correctness                                                                                                                                                                                       
    result = run_snippet(                                                                                                                                                                                                                     
thunder/tests/framework.py:483: in run_snippet                                                                                                                                                                                                
    raise ex                                                                                                                                                                                                                                  
thunder/tests/framework.py:475: in run_snippet                                                                                                                                                                                                
    snippet(*args, **kwargs)                                                                                                                                                                                                                  
thunder/tests/test_grad.py:394: in snippet_vjp_correctness                                                                                                                                                                                    
    check_vjp(func, *args, executor=executor)                                                                                                                                                                                                 
thunder/tests/test_grad.py:304: in check_vjp                                                                                                                                                                                                  
    _, J_star_v = executor.make_callable_legacy(vjp(f), disable_torch_autograd_support=True)(primals, v)                                                                                                                                      
thunder/common.py:783: in _fn                                                                                                                                                                                                                 
    trc_or_result = trace(compile_data=cd)(processed_function, *args, **kwargs)                                                                                                                                                               
thunder/core/interpreter.py:1298: in fn_                                                                                                                                                                                                      
    return fn(*args, **kwargs)                                                                                                                                                                                                                
thunder/common.py:534: in _trace                                                                                                                                                                                                              
    result = fn(*proxyargs, **proxykwargs)                                                                                                                                                                                                    
thunder/core/transforms.py:3629: in _vjp                                                                                                                                                                                                      
    result, vjp_result = vjp_call(flat_args, cotangents, trace=trace)                                                                                                                                                                         
thunder/core/transforms.py:3603: in vjp_call_metafunc                                                                                                                                                                                         
    result, env = augmented_forward_pass(*primals, trace=trace, **kwargs)                                                                                                                                                                     
thunder/core/transforms.py:3414: in augmented_forward_pass                                                                                                                                                                                    
    result, env = eval_trace(                                                                                                                                                                                                                 
thunder/core/transforms.py:1693: in eval_trace                                                                                                                                                                                                
    prim_func = symbol_mapper(symbol)                                                                                                                                                                                                         
thunder/core/transforms.py:3338: in vjp_symbol_mapper                                                                                                                                                                                         
    vjp_impl, backward_fn = make_aug_forward_and_backward(symbol)                                                                                                                                                                             
thunder/core/vjp_utils.py:99: in make_aug_forward_and_backward                                                                                                                                                                                
    backward_bsyms = utils.find_producer_symbols(joint_trace, flat_bw_outputs, tree_flatten(bw_inputs)[0])                                                                                                                                    
thunder/core/utils.py:1062: in find_producer_symbols                                                                                                                                                                                          
    if arg_name not in map(lambda x: x.name, stop_proxies) and arg_name not in seen:                                                                                                                                                          
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
                                                                                                                                                                                                                                              
x = None                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                              
>   if arg_name not in map(lambda x: x.name, stop_proxies) and arg_name not in seen:                                                                                                                                                          
E   AttributeError: 'NoneType' object has no attribute 'name'                                                                                                                                                                                 
                                                                                                                                                                                                                                              
thunder/core/utils.py:1062: AttributeError     

Cuda only?

Hi

Thanks for sharing this with the community. Much appreciated.

I am wondering if this works only with CUDA hardware. For example, does it work with AMD GPUs through ROCm?

torchex running pooling without decomposition

🚀 Feature

Running max_poolXd through its decomposition is expensive in Thunder. The torch executor should be able to run these as a single aten call in the forward as well as the backward pass via a custom grad_transform.

Motivation

Currently, if we run the example below:

import torch
import torch.nn as nn

import thunder

dtype = torch.float16
batch_size = 32
test_grad = True

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        layers = list()
        layers.append(nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
        self.layer = nn.Sequential(*layers)

    def forward(self, inp):
        return self.layer(inp)

model = Model()

model = model.cuda()
model = model.to(dtype)

image = torch.randn(batch_size, 3, 224, 224, dtype=dtype).cuda()
if test_grad:
    image.requires_grad_()

def fn(arg):
    return model(arg)

jit_model = thunder.jit(fn)

# warm up
for i in range(20):
    o = jit_model(image)
    if test_grad:
        o.sum().backward()
    o = fn(image)
    if test_grad:
        o.sum().backward()

import time
fwd_traces = thunder.last_traces(jit_model)
print("fwd_traces:\n", fwd_traces[-1])
if test_grad:
    bwd_traces = thunder.last_backward_traces(jit_model)
    print("bwd_traces:\n", bwd_traces[-1])

torch.cuda.synchronize()

t0 = time.time()
for i in range(10):
    o = jit_model(image)
    if test_grad:
        o.sum().backward()
        image.grad = None
torch.cuda.synchronize()
print("jit_model elapsed time: ", time.time() - t0)

torch.cuda.synchronize()
t0 = time.time()
for i in range(10):
    o = fn(image)
    if test_grad:
        o.sum().backward()
        image.grad = None
torch.cuda.synchronize()
print("torch eager elapsed time: ", time.time() - t0)

jit_model elapsed time: 0.02024698257446289
torch eager elapsed time: 0.002202272415161133

fwd graph looks like

from thunder.executors.torchex import no_autocast

@torch.no_grad()
@no_autocast()
def augmented_forward_fn(arg):
  # arg: "cuda:0 f16[32, 3, 224, 224]"
  t0 = prims.pad(arg, -float('inf'), [(0, 0, 0), (0, 0, 0), (1, 1, 0), (1, 1, 0)])  # t0: "cuda:0 f16[32, 3, 226, 226]"
  t1 = ltorch.arange(9, None, 1, device=devices.Device("cuda:0"), dtype=None)  # t1: "cuda:0 i64[9]"
    # t1 = prims.iota(9, start=0, step=1, device=devices.Device("cuda:0"), dtype=dtypes.int64)  # t1: "cuda:0 i64[9]"
  t2 = prims.broadcast_in_dim(t1, [9, 1], [0])  # t2: "cuda:0 i64[9, 1]"
  t3 = prims.broadcast_in_dim(t1, [1, 9], [1])  # t3: "cuda:0 i64[1, 9]"
  t4 = prims.broadcast_in_dim(t2, (9, 9), (0, 1))  # t4: "cuda:0 i64[9, 9]"
  t5 = prims.broadcast_in_dim(t3, (9, 9), (0, 1))  # t5: "cuda:0 i64[9, 9]"
  t6 = prims.eq(t4, t5)  # t6: "cuda:0 b8[9, 9]"
  t7 = prims.convert_element_type(t6, dtypes.float16)  # t7: "cuda:0 f16[9, 9]"
  t8 = prims.reshape(t7, (1, 9, 1, 3, 3))  # t8: "cuda:0 f16[1, 9, 1, 3, 3]"
  t9 = prims.broadcast_in_dim(t8, (3, 9, 1, 3, 3), (0, 1, 2, 3, 4))  # t9: "cuda:0 f16[3, 9, 1, 3, 3]"
  t10 = prims.reshape(t9, (27, 1, 3, 3))  # t10: "cuda:0 f16[27, 1, 3, 3]"
  t11 = prims.convolution(t0, t10, None, (2,), (0,), (1,), False, (0, 0), 3)  # t11: "cuda:0 f16[32, 27, 112, 112]"
  t12 = prims.reshape(t11, (32, 3, 9, 112, 112))  # t12: "cuda:0 f16[32, 3, 9, 112, 112]"
  t13 = prims.convert_element_type(t12, dtypes.float32)  # t13: "cuda:0 f32[32, 3, 9, 112, 112]"
  t14 = prims.amax(t13, (2,))  # t14: "cuda:0 f32[32, 3, 112, 112]"
  t15 = prims.convert_element_type(t14, dtypes.float16)  # t15: "cuda:0 f16[32, 3, 112, 112]"
  return {'output': t15, 'flat_args': [arg], 'flat_output': (t15,)}, ((t10, t14, t13), (0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 3, 32, 27, 112, 112, 32, 3, 9, 112, 112, 2))

bwd graph:

 i239 = operator.neg(i86)  # i239
    # i239 = prims.neg(i86)  # i239
  del i86
  i257 = operator.neg(i27)  # i257
    # i257 = prims.neg(i27)  # i257
  del i27
  i258 = operator.neg(i28)  # i258
    # i258 = prims.neg(i28)  # i258
  del i28
  i259 = operator.neg(i30)  # i259
    # i259 = prims.neg(i30)  # i259
  del i30
  i260 = operator.neg(i31)  # i260
    # i260 = prims.neg(i31)  # i260
  del i31
  i261 = operator.neg(i33)  # i261
    # i261 = prims.neg(i33)  # i261
  del i33
  i262 = operator.neg(i34)  # i262
    # i262 = prims.neg(i34)  # i262
  del i34
  i263 = operator.neg(i36)  # i263
    # i263 = prims.neg(i36)  # i263
  del i36
  i264 = operator.neg(i37)  # i264
    # i264 = prims.neg(i37)  # i264
  del i37
  t303 = torch.unsqueeze(t19, 2)  # t303
    # t303 = ltorch.unsqueeze(t19, 2)  # t303
      # t303 = prims.broadcast_in_dim(t19, [32, 3, 1, 2, 2], [0, 1, 3, 4])  # t303
  del t19
  t220 = Tensor.expand(t303, [32, 3, 1, 2, 2])  # t220
    # t220 = ltorch.expand(t303, [32, 3, 1, 2, 2])  # t220
      # t220 = prims.broadcast_in_dim(t303, (32, 3, 1, 2, 2), (0, 1, 2, 3, 4))  # t220
  del t303
  t221 = Tensor.expand(t220, (i104, i105, i106, i107, i108))  # t221
    # t221 = ltorch.expand(t220, (i104, i105, i106, i107, i108))  # t221
      # t221 = prims.broadcast_in_dim(t220, (i104, i105, i106, i107, i108), (0, 1, 2, 3, 4))  # t221
  del t220
  t233 = torch.permute(t15, (1, 0, 2, 3))  # t233
    # t233 = ltorch.permute(t15, (1, 0, 2, 3))  # t233
      # t233 = prims.transpose(t15, (1, 0, 2, 3))  # t233
  del t15
  t234 = torch.reshape(t233, [1, i91, 9, 3, 3])  # t234
    # t234 = ltorch.reshape(t233, [1, i91, 9, 3, 3])  # t234
      # t234 = prims.reshape(t233, (1, i91, 9, 3, 3))  # t234
  del t233
  t235 = torch.permute(t234, (1, 0, 2, 3, 4))  # t235
    # t235 = ltorch.permute(t234, (1, 0, 2, 3, 4))  # t235
      # t235 = prims.transpose(t234, (1, 0, 2, 3, 4))  # t235
  del t234
  t236 = torch.reshape(t235, [3, 9, 3, 3])  # t236
    # t236 = ltorch.reshape(t235, [3, 9, 3, 3])  # t236
      # t236 = prims.reshape(t235, (3, 9, 3, 3))  # t236
  del t235
  [t230, t282] = nvFusion0(i10, i104, i105, i106, i107, i108, i109, i9, t0, t17, t21, t221)
    # t18 = prims.convert_element_type(t17, dtypes.float32)  # t18
    # t282 = prims.pad(t0, 0.0, [(0, 0, 0), (0, 0, 0), (i9, 3, 0), (i10, 3, 0)])  # t282
    # t217 = prims.convert_element_type(t21, dtypes.float32)  # t217
    # t218 = prims.broadcast_in_dim(t217, [32, 3, 1, 2, 2], [0, 1, 3, 4])  # t218
    # t219 = prims.broadcast_in_dim(t218, (i104, i105, i106, i107, i108), (0, 1, 2, 3, 4))  # t219
    # t222 = prims.eq(t18, t221)  # t222
    # t223 = prims.sum(t222, (i109,))  # t223
    # t224 = prims.broadcast_in_dim(t223, [32, 3, 1, 2, 2], [0, 1, 3, 4])  # t224
    # t225 = prims.convert_element_type(t222, dtypes.float32)  # t225
    # t226 = prims.mul(t219, t225)  # t226
    # t227 = prims.broadcast_in_dim(t224, (32, 3, 9, 2, 2), (0, 1, 2, 3, 4))  # t227
    # t228 = prims.convert_element_type(t227, dtypes.float32)  # t228
    # t229 = prims.div(t226, t228)  # t229
    # t230 = prims.convert_element_type(t229, dtypes.float16)  # t230
  del i10, i104, i105, i106, i107, i108, i109, i9, t0, t17, t21, t221
  t283 = torch.permute(t282, (1, 0, 2, 3))  # t283
    # t283 = ltorch.permute(t282, (1, 0, 2, 3))  # t283
      # t283 = prims.transpose(t282, (1, 0, 2, 3))  # t283
  del t282
  t284 = torch.reshape(t283, [i16, 3, 32, 11, 11])  # t284
    # t284 = ltorch.reshape(t283, [i16, 3, 32, 11, 11])  # t284
      # t284 = prims.reshape(t283, (i16, 3, 32, 11, 11))  # t284
  del t283
  t285 = torch.permute(t284, (1, 0, 2, 3, 4))  # t285
    # t285 = ltorch.permute(t284, (1, 0, 2, 3, 4))  # t285
      # t285 = prims.transpose(t284, (1, 0, 2, 3, 4))  # t285
  del t284
  t286 = torch.reshape(t285, [3, 32, 11, 11])  # t286
    # t286 = ltorch.reshape(t285, [3, 32, 11, 11])  # t286
      # t286 = prims.reshape(t285, (3, 32, 11, 11))  # t286
  del t285
  t231 = torch.reshape(t230, (i97, i98, i99, i100))  # t231
    # t231 = ltorch.reshape(t230, (i97, i98, i99, i100))  # t231
      # t231 = prims.reshape(t230, (i97, i98, i99, i100))  # t231
  del t230, i97, i98, i99, i100
  t232 = torch_pad_prim_impl(t231, 0.0, [(0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 1)])  # t232
  del t231
  t237 = torch.flip(t236, (2, 3))  # t237
    # t237 = ltorch.flip(t236, (2, 3))  # t237
      # t237 = prims.flip(t236, (2, 3))  # t237
  del t236
  t238 = torch.convolution(t232, t237, None, (1,), [2, 2], (i87, i87), False, (i89, i90), i91)  # t238
    # t238 = ltorch.convolution(t232, t237, None, (1,), [2, 2], (i87, i87), False, (i89, i90), i91)  # t238
      # t238 = prims.convolution(t232, t237, None, (1,), [2, 2], (i87, i87), False, (i89, i90), i91)  # t238
  del t232, t237, i87, i89, i90, i91
  [t270] = nvFusion1(i239, i257, i258, i259, i260, i261, i262, i263, i264, t2, t238)
    # t241 = prims.pad(t238, 0.0, [(0, 0, 0), (0, 0, 0), (i239, 0, 0), (i239, 0, 0)])  # t241
    # t265 = prims.pad(t241, 0.0, [(i257, i258, 0), (i259, i260, 0), (i261, i262, 0), (i263, i264, 0)])  # t265
    # t266 = prims.slice(t265, [0, 0, 0, 0], [32, 3, 3, 3], [1, 1, 1, 1])  # t266
    # t267 = prims.slice(t266, [0, 0, 0, 0], [32, 3, 3, 3], [1, 1, 1, 1])  # t267
    # t268 = prims.slice(t267, [0, 0, 0, 0], [32, 3, 3, 3], [1, 1, 1, 1])  # t268
    # t269 = prims.slice(t268, [0, 0, 0, 0], [32, 3, 3, 3], [1, 1, 1, 1])  # t269
    # t270 = prims.where(t2, t269, 0.0)  # t270
  del i239, i257, i258, i259, i260, i261, i262, i263, i264, t2, t238
  t287 = torch.permute(t270, (1, 0, 2, 3))  # t287
    # t287 = ltorch.permute(t270, (1, 0, 2, 3))  # t287
      # t287 = prims.transpose(t270, (1, 0, 2, 3))  # t287
  del t270
  t288 = torch.convolution(t286, t287, None, (i11, i12), (0,), (i7, i8), False, (i14, i15), i16)  # t288
    # t288 = ltorch.convolution(t286, t287, None, (i11, i12), (0,), (i7, i8), False, (i14, i15), i16)  # t288
      # t288 = prims.convolution(t286, t287, None, (i11, i12), (0,), (i7, i8), False, (i14, i15), i16)  # t288
  del t286, t287, i11, i12, i7, i8, i14, i15, i16
  t289 = torch.permute(t288, (1, 0, 2, 3))  # t289
    # t289 = ltorch.permute(t288, (1, 0, 2, 3))  # t289
      # t289 = prims.transpose(t288, (1, 0, 2, 3))  # t289
  del t288
  return (None, t289)

Pitch

I'm prototyping this in a draft PR (not functional yet!)

Alternatives

We could also have pooling layers as a prim, but I don't think that's a necessity at this point.
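
For comparison with the decomposed Thunder trace above, a quick eager-side check (a sketch; profiler output formatting varies by PyTorch version) of which kernels PyTorch dispatches for the same pooling op in forward and backward:

import torch
import torch.nn.functional as F

# Profile eager max_pool2d forward + backward to see which aten kernels run,
# for comparison with the pad/convolution decomposition in the trace above.
x = torch.randn(32, 3, 224, 224, device="cuda", dtype=torch.float16, requires_grad=True)

with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA]
) as prof:
    out = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)
    out.sum().backward()

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))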

Skipped distributed tests show up as passed (return 0)

          Note that this can be very misleading because a skipped test also returns 0, so it can make it seem like a test passed when it didn't run

Originally posted by @carmocca in #130 (comment)

python -um pytest -sv "$test" --pythonwarnings ignore --junitxml="$test-results.xml" 2>&1 > "$test-output.txt"
pytest_status=$?
printf "$test status >>> $pytest_status\n"
if [ $pytest_status -ne 0 ]; then
    status=$pytest_status
    cat "$test-output.txt"
fi
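
Since the run already emits a junitxml file, one option (a minimal sketch, assuming pytest's standard junitxml attributes) is to treat "everything skipped" differently from "passed" by reading the suite counters:

# Minimal sketch: distinguish "all tests skipped" from "tests passed", since
# pytest exits with 0 in both cases. Assumes pytest's standard junitxml layout.
import xml.etree.ElementTree as ET

def ran_any_tests(junitxml_path: str) -> bool:
    root = ET.parse(junitxml_path).getroot()
    suite = root[0] if root.tag == "testsuites" else root  # newer pytest wraps suites
    tests = int(suite.get("tests", 0))
    skipped = int(suite.get("skipped", 0))
    return tests - skipped > 0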

cc @Borda

Represent slices natively in traces

🚀 Feature

Motivation

Tensor slices are represented in traces as:

  t107 = torch_slice_prim_impl(t53, [0, 0, 0, 0], [4, 32, 2048, 0], [1, 1, 1, 1])  # t107: "cuda:0 bf16[4, 32, 2048, 0]"

But there is no import of torch_slice_prim_impl in the trace, and we could represent this with plain Python instead.

This reference comes from:

slice_prim_impl = ex.register_operator("torch_slice_prim_impl", meta=prims.slice_prim.meta, fn=_slice_prim_impl)
_register_implementation(prims.slice_prim, slice_prim_impl, checker=_always_executable)

# TODO When getitem is fully supported this can be changed to be an execution transform instead of a direct impl
def _slice_prim_impl(
    a: torch.Tensor, start_indices: Sequence[int], end_indices: Sequence[int], strides: None | Sequence[int] = None
) -> torch.Tensor:
    _strides = strides if strides is not None else [1] * len(start_indices)
    slices: list = []
    for start, stop, step in zip(start_indices, end_indices, _strides):
        slices.append(slice(start, stop, step))
    return operator.getitem(a, slices)

Pitch

Instead represent it with __getitem__ and slice():

t123 = t321.__getitem__([slice(0, 3), slice(0, 5)])  # t123: "cuda:..."
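
For reference, a quick eager-PyTorch check (not tied to Thunder's trace format, and using a tuple of slices here) that the __getitem__/slice() spelling produces the same result as native slicing, so the trace would be valid Python without extra imports:

import torch

t = torch.arange(16.0).reshape(4, 4)

# Native slicing and the explicit __getitem__/slice() spelling are equivalent.
a = t[0:3, 0:2]
b = t.__getitem__((slice(0, 3), slice(0, 2)))
assert torch.equal(a, b)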

Alternatives

Add the torch_slice_prim_impl import from torchex to the trace so that it's a valid program

cc @apaz-cli @nikitaved

Support `is_cuda`

Something like the following should work

import thunder
import torch

def foo(x):
    if not x.is_cuda:
        x = x.to('cuda')
    return x * x

x = torch.randn(3, device='cpu')
jit_foo = thunder.jit(foo)
o = jit_foo(x)

print(thunder.last_traces(jit_foo)[-1])

Above fails with

  File "/home/kkalambarkar/lightning-thunder/thunder/core/proxies.py", line 1234, in __getattr__
    method: None | Callable = resolve_method(attr, self)
  File "/home/kkalambarkar/lightning-thunder/thunder/core/langctxs.py", line 68, in resolve_method
    method: Callable = ctx.get_method(id, *args, **kwargs)
  File "/home/kkalambarkar/lightning-thunder/thunder/torch/langctx.py", line 40, in get_method
    raise AttributeError(f"The {self.name} language context has no method {id}")
AttributeError: The torch language context has no method is_cuda

Functional JIT loading closures sharp edge

Strategy required

This issue resumes from PR 2410; we need to decide on a strategy for the closures sharp edge. Let's start simple: I think we can all agree that this is a sharp edge if we jit foo:

x = 5
def foo():
    return x

And that's because we are using a variable from outside the jitted scope. However, here is where things get interesting: should we consider the following a sharp edge?

def foo(x):
    def bar():
        return x
    return bar()

I assume that, since we captured x when jitting foo, this should not be a sharp edge for bar, because the variable was declared in the scope (or in this case captured). To fix such a case we can remember which variables we captured and then look them up when we see a freevar. However, @mruberry has an interesting point: what happens if the variable gets deleted? How can we deal with something like:

def foo():
  a = 5

  def bar():
    nonlocal a
    del a

  bar()

  return a
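
For reference, here is what plain CPython itself does with that last example (a minimal check; the exact error message varies across Python versions):

# The read of `a` after the nonlocal delete fails in plain CPython
# (UnboundLocalError, a NameError subclass, on current versions).
def foo():
    a = 5

    def bar():
        nonlocal a
        del a

    bar()
    return a

try:
    foo()
except NameError as e:
    print(type(e).__name__, e)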

In conclusion, what do you think should be the definition of sharp edge in this context?

cc @apaz-cli @t-vi @mruberry

Comparison with `torch.compile` instead of Eager

📚 Documentation

Hey! I saw your tool and the plots showing "acceleration", but you compare against un-optimised eager PyTorch, which is obviously slower. Could you provide a graph comparing against a basic native PyTorch loop where you torch.compile the model? It would be useful for people who already have some optimisations in their pipelines but would like to try your framework instead.

Thanks

Build op provenance tracking into compile trace output

🚀 Feature

The request is to be able to clearly connect the practitioner's model code to the graph trace produced by Thunder. Ideally, each traced node should map back to the model code from which it was generated.

Motivation

This would tremendously help with debugging issues around graph capture and graph optimization (such as rematerialization, DCE, etc.). It also improves user understanding of what Thunder is doing, and it could be very helpful for developers building tools that operate on top of Thunder graphs.
Examples from TorchInductor are in the Pitch.

Pitch/Additional Context

Example of FX graph debug output from TorchInductor, mapping decomposed nodes in the traced graph back to the practitioner's model code.

        # File: /scratch/mojitos/Pytorch/resnet/test_conv_bn_relu.py:35, code: out = self.conv(x)
        convolution: f32[16, 64, 56, 56] = torch.ops.aten.convolution.default(primals_7, primals_1, None, [1, 1], [0, 0], [1, 1], False, [0, 0], 1)
        
        # File: /scratch/mojitos/Pytorch/resnet/test_conv_bn_relu.py:36, code: out = self.bn(out)
        add: i64[] = torch.ops.aten.add.Tensor(primals_6, 1);  primals_6 = None
        var_mean = torch.ops.aten.var_mean.correction(convolution, [0, 2, 3], correction = 0, keepdim = True)
        getitem: f32[1, 64, 1, 1] = var_mean[0]
        getitem_1: f32[1, 64, 1, 1] = var_mean[1];  var_mean = None
        add_1: f32[1, 64, 1, 1] = torch.ops.aten.add.Tensor(getitem, 1e-05)
        rsqrt: f32[1, 64, 1, 1] = torch.ops.aten.rsqrt.default(add_1);  add_1 = None
        sub: f32[16, 64, 56, 56] = torch.ops.aten.sub.Tensor(convolution, getitem_1)
        mul: f32[16, 64, 56, 56] = torch.ops.aten.mul.Tensor(sub, rsqrt);  sub = None
        squeeze: f32[64] = torch.ops.aten.squeeze.dims(getitem_1, [0, 2, 3]);  getitem_1 = None
        squeeze_1: f32[64] = torch.ops.aten.squeeze.dims(rsqrt, [0, 2, 3]);  rsqrt = None
        mul_1: f32[64] = torch.ops.aten.mul.Tensor(squeeze, 0.1)
        mul_2: f32[64] = torch.ops.aten.mul.Tensor(primals_4, 0.9);  primals_4 = None
        add_2: f32[64] = torch.ops.aten.add.Tensor(mul_1, mul_2);  mul_1 = mul_2 = None
        squeeze_2: f32[64] = torch.ops.aten.squeeze.dims(getitem, [0, 2, 3]);  getitem = None
        mul_3: f32[64] = torch.ops.aten.mul.Tensor(squeeze_2, 1.0000199302441455);  squeeze_2 = None
        mul_4: f32[64] = torch.ops.aten.mul.Tensor(mul_3, 0.1);  mul_3 = None
        mul_5: f32[64] = torch.ops.aten.mul.Tensor(primals_5, 0.9);  primals_5 = None
        add_3: f32[64] = torch.ops.aten.add.Tensor(mul_4, mul_5);  mul_4 = mul_5 = None
        unsqueeze: f32[64, 1] = torch.ops.aten.unsqueeze.default(primals_2, -1)
        unsqueeze_1: f32[64, 1, 1] = torch.ops.aten.unsqueeze.default(unsqueeze, -1);  unsqueeze = None
        unsqueeze_2: f32[64, 1] = torch.ops.aten.unsqueeze.default(primals_3, -1);  primals_3 = None
        unsqueeze_3: f32[64, 1, 1] = torch.ops.aten.unsqueeze.default(unsqueeze_2, -1);  unsqueeze_2 = None
        mul_6: f32[16, 64, 56, 56] = torch.ops.aten.mul.Tensor(mul, unsqueeze_1);  mul = unsqueeze_1 = None
        add_4: f32[16, 64, 56, 56] = torch.ops.aten.add.Tensor(mul_6, unsqueeze_3);  mul_6 = unsqueeze_3 = None
        
        # File: /scratch/mojitos/Pytorch/resnet/test_conv_bn_relu.py:37, code: out = self.relu(out)
        relu: f32[16, 64, 56, 56] = torch.ops.aten.relu.default(add_4);  add_4 = None
        le: b8[16, 64, 56, 56] = torch.ops.aten.le.Scalar(relu, 0)

Similarly, in the final codegen output one can see which decomposed nodes belong to each generated kernel and which aten-level op each decomposed node came from. Inside each kernel there are also comments describing the practitioner code stack.
Thunder already covers this to some extent, since the trace output lists all the decomposed nodes that are part of an nvFusion block, but a mapping back to the original code would be fantastic for better understanding.

# aten._native_batch_norm_legit_functional => add_1, add_4, mul, mul_6, rsqrt, sub, var_mean
# aten.relu => relu
# aten.threshold_backward => le
triton_poi_fused__native_batch_norm_legit_functional_relu_threshold_backward_4 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_heuristics import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*i1', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': [], 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]})
@triton.jit
def triton_poi_fused__native_batch_norm_legit_functional_relu_threshold_backward_4(in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 3211264
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x3 = xindex
    x1 = (xindex // 3136) % 64

    # ORIGIN:
    # call_function aten.relu.default
    #   File "/scratch/mojitos/Pytorch/resnet/test_conv_bn_relu.py", line 37, in forward\    out = self.relu(out)\
    # END ORIGIN


    # ORIGIN:
    # call_function aten.add.Tensor
    #   File "/scratch/mojitos/Pytorch/resnet/test_conv_bn_relu.py", line 36, in forward\    out = self.bn(out)\
    # END ORIGIN


    # ORIGIN:
    # call_function aten.rsqrt.default
    #   File "/scratch/mojitos/Pytorch/resnet/test_conv_bn_relu.py", line 36, in forward\    out = self.bn(out)\
    # END ORIGIN


    # ORIGIN:
    # call_function aten.mul.Tensor
    #   File "/scratch/mojitos/Pytorch/resnet/test_conv_bn_relu.py", line 36, in forward\    out = self.bn(out)\
    # END ORIGIN


    # ORIGIN:
    # call_function aten.var_mean.correction
    #   File "/scratch/mojitos/Pytorch/resnet/test_conv_bn_relu.py", line 36, in forward\    out = self.bn(out)\
    # END ORIGIN


    # ORIGIN:
    # call_function aten.mul.Tensor
    #   File "/scratch/mojitos/Pytorch/resnet/test_conv_bn_relu.py", line 36, in forward\    out = self.bn(out)\
    # END ORIGIN


    # ORIGIN:
    # call_function aten.add.Tensor
    #   File "/scratch/mojitos/Pytorch/resnet/test_conv_bn_relu.py", line 36, in forward\    out = self.bn(out)\
    # END ORIGIN


    # ORIGIN:
    # call_function aten.sub.Tensor
    #   File "/scratch/mojitos/Pytorch/resnet/test_conv_bn_relu.py", line 36, in forward\    out = self.bn(out)\
    # END ORIGIN

    tmp0 = tl.load(in_ptr0 + (x3), None)
    tmp1 = tl.load(in_ptr1 + (x1), None)
    tmp3 = tl.load(in_ptr2 + (x1), None)
    tmp10 = tl.load(in_ptr3 + (x1), None)
    tmp12 = tl.load(in_ptr4 + (x1), None)
    tmp2 = tmp0 - tmp1
    tmp4 = 50176.0
    tmp5 = tmp3 / tmp4
    tmp6 = 1e-05
    tmp7 = tmp5 + tmp6
    tmp8 = tl.math.rsqrt(tmp7)
    tmp9 = tmp2 * tmp8
    tmp11 = tmp9 * tmp10
    tmp13 = tmp11 + tmp12
    tmp14 = tl.where(0 != 0, 0, tl.where(0 > tmp13, 0, tmp13))

    # ORIGIN:
    # call_function aten.le.Scalar
    #   File "/scratch/mojitos/Pytorch/resnet/test_conv_bn_relu.py", line 37, in forward\    out = self.relu(out)\
    # END ORIGIN

    tmp15 = 0.0
    tmp16 = tmp14 <= tmp15
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp14, None)
    tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, None)
''')

cc @carmocca

Add stride operation primitive

🚀 Feature

I would like to have Thunder manage stride information to allow for tensor manipulations.

In particular I think the following points need to be discussed:

  • Where does the stride information go? Should it be part of TensorProxy?
  • What can a stride manipulation primitive look like? Are there any particular things we need to be careful about?

Motivation

This will enable us to add new operators that reshape the tensor using the stride like torch.as_strided or torch.Tensor.unfold.
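
To illustrate what such a primitive would need to express, here is a small eager-PyTorch example of the two operators mentioned above (a sketch; shapes chosen just for illustration):

import torch

x = torch.arange(6.0)

# as_strided builds a view purely from shape and stride metadata ...
windows = x.as_strided((4, 3), (1, 1))

# ... which here matches the sliding windows produced by Tensor.unfold.
assert torch.equal(windows, x.unfold(0, 3, 1))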

Support for CUDA kernels

🚀 Feature

Hi there 👋

From the main readme file I noticed that Thunder accepts custom kernels, but only ones written in Triton.
Is there a plan to support CUDA kernels?

Motivation

I'm only at the beginning of my custom-kernels journey, so I might misunderstand something.

From what I've seen online, there are many highly optimized CUDA kernels already available (since CUDA has been around for quite a while). Plus, there is a good chance that someone with a lot of experience writing CUDA kernels (but not Triton) wants to use Thunder (or even integrate it into an existing project).

I personally would like to write custom CUDA kernels for the LitGPT repo after I finish reading the PMPP book.

[feature request] Indexing with boolean masks

🚀 Feature

Indexing with a boolean mask is not supported, for example:

import torch
import thunder

x = torch.randn(10)  # any 1-D tensor; `x` was not defined in the original snippet
m = x <= 0.5

def f(x, m):
    return x[m]

jf = thunder.jit(f)
jf(x, m)

fails with

RuntimeError: Advanced indexing currently only supports zero or one-dimensional integer tensors, but found a tensor with dtype bool8 and 1 dimensions
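
For context, the same indexing works in eager PyTorch and is equivalent to advanced indexing with the mask's nonzero positions (a small sketch; the data-dependent output length is presumably part of what makes this harder to trace):

import torch

x = torch.randn(10)
m = x <= 0.5

# Boolean-mask indexing selects the masked elements; the result's length
# depends on the mask's contents, and it matches integer advanced indexing
# with the nonzero positions of the mask.
assert torch.equal(x[m], x[m.nonzero().squeeze(1)])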

cc @apaz-cli

Make DDP/FSDP a regular transform

🚀 Feature

Make DDP/FSDP a regular transform (to a large part including making transforms flexible enough to support this).

Motivation

Currently DDP/FSDP is not a regular transform, leading to things like #94 and limiting composability and sequencing.
One of the key pieces is that the adjustments we currently make to the prologue during tracing with DDP/FSDP would need to be done by the transform itself, so we need to allow mutation of prologues through transforms. This is also in line with the needs of other transforms that change prologue signatures (LoRA, quantization, but also value-and-grad things), so this generalization should happen.

cc @carmocca @awaelchli @crcrpar

Non-`topk` related issue in `mixtral`-like model tests.

🐛 Bug

Now that we have topk supported, it is time to unlock some tests. However, the following diff:

diff --git a/thunder/tests/test_jit_general.py b/thunder/tests/test_jit_general.py
index d1d55073..ad69a721 100644
--- a/thunder/tests/test_jit_general.py
+++ b/thunder/tests/test_jit_general.py
@@ -613,7 +613,7 @@ def test_nanogpt():
         "falcon-7b-like",
         "falcon-40b-like",
         "codellama2-like",
-        pytest.param("mixtral-like", marks=pytest.mark.xfail(raises=TypeError, reason="topk", strict=True)),
+        "mixtral-like",
     ),
 )
 @pytest.mark.parametrize(

Breaks pytest -sv thunder/tests/test_jit_general.py -k test_litgpt_variants[cpu-mixtral-like] with

___________________________________________________________________________________________________ test_litgpt_variants[cpu-mixtral-like] ___________________________________________________________________________________________________

name = 'mixtral-like', device = device(type='cpu')

    @skipif_not_pytorch_2_1
    @pytest.mark.parametrize(
        "name",
        (
            "gpt-neox-like",
            "llama1-like",
            "long-context-like",
            "llama2-like",
            "falcon-7b-like",
            "falcon-40b-like",
            "codellama2-like",
            "mixtral-like",
        ),
    )
    @pytest.mark.parametrize(
        "device",
        ("cpu", "cuda"),
    )
    def test_litgpt_variants(name, device):
        if device == "cuda" and not torch.cuda.is_available():
            pytest.skip("CUDA not available")
    
        device = torch.device(device)
    
        x = torch.randint(0, 200, (5, 5), device=device)
        config = litgpt_model.Config.from_name(name)
    
        with device:
            reference = litgpt_model.GPT(config)
        expected_logits = reference(x)
    
        expected_logits.sum().backward()
    
        with device:
            model = litgpt_model.GPT(config)
        model.load_state_dict(reference.state_dict())
        tom = thunder.jit(model, executors=nvfuserex if device.type == "cuda" else torchex)
>       actual_logits = tom(x)

thunder/tests/test_jit_general.py:642: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../miniconda3/envs/thunder_dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1527: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../miniconda3/envs/thunder_dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1536: in _call_impl
    return forward_call(*args, **kwargs)
thunder/__init__.py:194: in forward
    res = self._forward_fn(*args, **kwargs)
thunder/__init__.py:629: in fn_
    cache_entry, inps, pro_to_epi = get_computation_and_inputs(*args, **kwargs)
thunder/__init__.py:262: in cache_info_wrapper
    res = fn(*args, **kwargs)
thunder/__init__.py:504: in get_computation_and_inputs
    prologue_trc, computation_trc, *maybe_epilogue = interpreter(
thunder/__init__.py:175: in _general_frontend
    return thunder_general_jit(fn, args, kwargs, sharp_edges=sharp_edges)
thunder/core/jit_ext.py:1430: in thunder_general_jit
    result = jfn(*args, **kwargs)
thunder/core/interpreter.py:6684: in fn_
    raise e
thunder/core/interpreter.py:6647: in fn_2
    return fn(*args, **kwargs)
thunder/core/interpreter.py:6046: in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
../../miniconda3/envs/thunder_dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1527: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
thunder/core/interpreter.py:6046: in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
../../miniconda3/envs/thunder_dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1536: in _call_impl
    return forward_call(*args, **kwargs)
thunder/core/interpreter.py:6046: in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
../../miniconda3/envs/thunder_dev/lib/python3.10/site-packages/litgpt/model.py:94: in forward
    x = block(x, cos, sin, mask, input_pos)
thunder/core/interpreter.py:6046: in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
../../miniconda3/envs/thunder_dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1527: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
thunder/core/interpreter.py:6046: in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
../../miniconda3/envs/thunder_dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1536: in _call_impl
    return forward_call(*args, **kwargs)
thunder/core/interpreter.py:6046: in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
../../miniconda3/envs/thunder_dev/lib/python3.10/site-packages/litgpt/model.py:187: in forward
    x = self.mlp(self.norm_2(x)) + x
thunder/core/interpreter.py:6046: in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
../../miniconda3/envs/thunder_dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1527: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
thunder/core/interpreter.py:6046: in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
../../miniconda3/envs/thunder_dev/lib/python3.10/site-packages/torch/nn/modules/module.py:1536: in _call_impl
    return forward_call(*args, **kwargs)
thunder/core/interpreter.py:6046: in _impl
    return fn.__func__(fn.__self__, *args, **kwargs)
../../miniconda3/envs/thunder_dev/lib/python3.10/site-packages/litgpt/model.py:347: in forward
    token_idx, expert_idx = torch.where(mask)
thunder/core/interpreter.py:1258: in wrapping_wrapper
    res = ufn(*uargs, **ukwargs)
thunder/core/symbol.py:250: in __call__
    result = self.meta(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (t157,), kwargs = {}, tok = <Token used var=<ContextVar name='langctx' at 0x7fa2ad45a340> at 0x7f9bf1b6bdc0>

    @wraps(fn)
    def _fn(*args, **kwargs):
        try:
            tok = set_langctx(self.langctx)
>           result = fn(*args, **kwargs)
E           TypeError: where() missing 2 required positional arguments: 'a' and 'b'

thunder/core/langctxs.py:124: TypeError
========================================================================================================== short test summary info ===========================================================================================================
FAILED thunder/tests/test_jit_general.py::test_litgpt_variants[cpu-mixtral-like] - TypeError: where() missing 2 required positional arguments: 'a' and 'b'
=============================================================================================== 1 failed, 54 deselected, 10 warnings in 8.04s ================================================================================================
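
For context, the litgpt line that trips Thunder uses the single-argument form of torch.where, which in eager PyTorch is documented as an alias for nonzero(as_tuple=True); so the missing piece appears to be support for (or a mapping of) that one-argument form. A small eager check:

import torch

# torch.where(condition) with a single argument is equivalent to
# torch.nonzero(condition, as_tuple=True) in eager PyTorch.
mask = torch.tensor([[True, False], [False, True]])

a = torch.where(mask)
b = torch.nonzero(mask, as_tuple=True)
assert all(torch.equal(x, y) for x, y in zip(a, b))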

Weight tying + FSDP = nvfuser internal error

🐛 Bug

To Reproduce

Code:

import os
import torch
import torch.distributed as tdist
import thunder
from thunder.tests.lit_gpt_model import GPT, Config

if __name__ == "__main__":
    tdist.init_process_group(backend="nccl")
    LOCAL_RANK = int(os.environ["LOCAL_RANK"])
    device = torch.device("cuda", LOCAL_RANK)
    torch.set_default_device(device)

    config = Config(block_size=256, padded_vocab_size=32000, n_layer=6, n_head=6, head_size=48, n_embd=288, rotary_percentage=1.0, parallel_residual=False, bias=False, _norm_class='RMSNorm', _mlp_class='LLaMAMLP', intermediate_size=768)
    with device:
        model = GPT(config)

    model.transformer.wte.weight = model.lm_head.weight

    model = thunder.distributed.fsdp(model)
    model = thunder.jit(model)

    input_ids = torch.randint(1, 30010, (128, 256), dtype=torch.long, device=device)
    logits = model(input_ids)
    print(logits.shape)

Run with:

torchrun --nproc-per-node 2 --local-ranks-filter 0 repro.py

Nvfuser repro:

import torch
from nvfuser import FusionDefinition, DataType

def nvfuser_fusion_id0(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(shape=[-1, -1, -1], contiguity=[None, None, True], dtype=DataType.Float, is_cpu=False, stride_order=[2, 1, 0])
    T1 = fd.define_tensor(shape=[-1, -1, -1], contiguity=[True, True, True], dtype=DataType.Float, is_cpu=False, stride_order=[2, 1, 0])
    T2 = fd.ops.mul(T1, T1)
    T3 = fd.ops.sum(T2, dims=[2], keepdim=False, dtype=DataType.Null)
    S4 = fd.define_scalar(128, dtype=DataType.Int)
    S5 = fd.define_scalar(256, dtype=DataType.Int)
    S6 = fd.define_scalar(1, dtype=DataType.Int)
    V7 = fd.define_vector([S4, S5, S6], dtype=DataType.Int)
    T8 = fd.ops.broadcast_in_dim(T3, shape=V7, broadcast_dims=[0, 1])
    S9 = fd.define_scalar(288.000, dtype=DataType.Double)
    S10 = fd.ops.reciprocal(S9)
    T11 = fd.ops.mul(T8, S10)
    S12 = fd.define_scalar(1.00000e-05, dtype=DataType.Double)
    T13 = fd.ops.add(T11, S12)
    T14 = fd.ops.rsqrt(T13)
    S15 = fd.define_scalar(128, dtype=DataType.Int)
    S16 = fd.define_scalar(256, dtype=DataType.Int)
    S17 = fd.define_scalar(288, dtype=DataType.Int)
    V18 = fd.define_vector([S15, S16, S17], dtype=DataType.Int)
    T19 = fd.ops.broadcast_in_dim(T14, shape=V18, broadcast_dims=[0, 1, 2])
    T20 = fd.ops.mul(T1, T19)
    T21 = fd.ops.mul(T20, T0)
    fd.add_output(T14)
    fd.add_output(T21)

with FusionDefinition() as fd:
    nvfuser_fusion_id0(fd)

inputs = [
    torch.randn((288,), dtype=torch.float32, device='cuda:0').as_strided((128, 256, 288), (0, 0, 1)),
    torch.randn((9437184,), dtype=torch.float32, device='cuda:0').as_strided((128, 256, 288), (73728, 288, 1)),
]
fd.execute(inputs)

Short error:

RuntimeError: _result == CUDA_SUCCESS INTERNAL ASSERT FAILED at "/workspace/Fuser/csrc/executor_utils.cpp":907, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. CUDA error: CUDA_ERROR_ASSERT failed with error device-side assert triggered

Full error:

error.txt

Removing one of:

  • FSDP
  • 30010 as the highest input value
  • weight tying

makes the problem go away.

cc @tfogal
