⚡ Lit-LLaMA

Independent implementation of LLaMA that is fully open source under the Apache 2.0 license.

This implementation builds on nanoGPT. Weights are distributed by Meta under a research-only license.

Why?

We believe that AI should be fully open source and part of the collective knowledge.

The original LLaMA code is GPL-licensed, which means any project using it must also be released under the GPL.

This "taints" any other code and prevents meaningful academic and commercial use.

Lit-LLaMA solves that for good.

Design principles

Lit-LLaMA is:

Simple: Single-file implementation without boilerplate.
Correct: Numerically equivalent to the original model.
Optimized: Runs on consumer hardware or at scale.
Open-source: No strings attached.

Get involved!

Join our Discord to build high-performance, truly open-source models for the common benefit of the community.

Setup

Clone the repo

git clone https://github.com/Lightning-AI/lit-llama
cd lit-llama

Install dependencies

pip install -r requirements.txt

You are all set! 🎉

Use the model

To generate text predictions, download the model weights following the instructions on the official LLaMA repository.

Once downloaded, you should have a folder like this:

checkpoints/llama
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── 13B
│   ...
├── tokenizer_checklist.chk
└── tokenizer.model
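Before converting, it can help to confirm that the download matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoint_files` is hypothetical, and the file list is taken only from the 7B entry in the tree, so it may not apply to the other model sizes, whose contents are elided above.

```python
from pathlib import Path

# Files the tree above shows for the 7B folder and for the checkpoint root.
MODEL_FILES = ("checklist.chk", "consolidated.00.pth", "params.json")
ROOT_FILES = ("tokenizer_checklist.chk", "tokenizer.model")


def missing_checkpoint_files(root, model_size="7B"):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    expected = [root / name for name in ROOT_FILES]
    expected += [root / model_size / name for name in MODEL_FILES]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files("checkpoints/llama", "7B")
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("Checkpoint layout looks complete.")
```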

Convert the weights to the Lit-LLaMA format:

python scripts/convert_checkpoint.py \
    --output_dir checkpoints/lit-llama \
    --ckpt_dir checkpoints/llama \
    --tokenizer_path checkpoints/llama/tokenizer.model \
    --model_size 7B

Run inference:

python generate.py --prompt "Hello, my name is"

This will run the 7B model and require ~26 GB of GPU memory (A100 GPU).
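The memory figure can be sanity-checked with a back-of-envelope estimate of weight storage. This sketch assumes a nominal 7e9 parameters and that the default run keeps weights as 4-byte float32 values. Both are assumptions, and the estimate ignores activations and other runtime overhead, so it is only a rough guide.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Assumption: a nominal 7e9 parameters stored as 4-byte float32 values.
full_precision = weight_memory_gib(7e9, 4)
print(f"7B weights in float32: ~{full_precision:.0f} GiB")
```

Under these assumptions the weights alone come to roughly 26 GiB, in line with the figure quoted above.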

Run Lit-LLaMA on consumer devices

For GPUs with less memory, enable quantization (--quantize true) or use bfloat16 (--dtype bfloat16). Quantization takes longer to load but requires only ~8 GB of memory. bfloat16 is closer to the "full deal" and runs on ~10 GB of GPU memory. Both options fit on consumer GPUs with that much memory.

python generate.py --quantize true --prompt "Hello, my name is"

See python generate.py --help for more options.
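To give an intuition for why quantization saves memory, here is a minimal sketch of absmax 8-bit quantization, assuming --quantize stores weights as 8-bit integers (the ~8 GB figure suggests about one byte per parameter). This is not the bitsandbytes implementation, which is more sophisticated. The function names and sample values are made up for illustration.

```python
def quantize_absmax(values):
    """Map floats to int8 codes in [-127, 127] plus one float scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 0.0, 0.91]
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)

# Rounding moves each value by at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"max error {max_error:.4f} (step {scale:.4f})")
```

Each weight then occupies one byte plus a shared scale, at the cost of a small rounding error per value.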

Finetune the model

We provide simple training scripts, finetune_lora.py and finetune_adapter.py, that instruction-tune a pretrained model on the Alpaca dataset using LoRA and Adapter, respectively.

Download the data and generate an instruction-tuning dataset:
```
python scripts/prepare_alpaca.py
```
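For orientation, each Alpaca record has an instruction, an optional input, and an output, and is turned into a single prompt before training. The sketch below assumes the standard prompt template published with the Alpaca dataset. Whether scripts/prepare_alpaca.py uses exactly this wording is an assumption, and `format_alpaca_prompt` is a hypothetical name.

```python
def format_alpaca_prompt(example):
    """Build the prompt for one record with 'instruction' and optional 'input'."""
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            "### Response:"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        "### Response:"
    )


# Hypothetical record, for illustration only.
record = {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}
print(format_alpaca_prompt(record))
```

During finetuning, the model is trained to continue such a prompt with the record's output field.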

Run one of the finetuning scripts:

python finetune_lora.py

or

python finetune_adapter.py

This assumes you have downloaded the pretrained weights as described above. Finetuning requires a GPU with 40 GB of memory (A100). Coming soon: LoRA + quantization for training on a consumer-grade GPU!
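As background on LoRA: rather than updating a full weight matrix W, it trains two small matrices A and B and uses W + (alpha / r) * B @ A, where r is the rank. The sketch below illustrates only that idea with plain Python lists. It is not the code in finetune_lora.py, and the shapes, rank, and alpha value are arbitrary assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


random.seed(0)
d_out, d_in, rank = 8, 8, 2  # arbitrary sizes for illustration

# Frozen pretrained weight (random stand-in values).
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable factors: A starts random, B starts at zero.
lora_a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
lora_b = [[0.0] * rank for _ in range(d_out)]

# With B = 0 the adapted weight equals the pretrained weight.
assert lora_weight(w, lora_a, lora_b, alpha=16) == w

full = d_out * d_in
adapter = rank * (d_in + d_out)
print(f"trainable values: {adapter} (LoRA) vs {full} (full matrix)")
```

The saving grows with matrix size: for a 4096 x 4096 matrix and rank 8, the two factors hold 65,536 values instead of about 16.8 million.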

Get involved!

We're on a quest toward fully open-source AI.

Join us and start contributing, especially in the following areas:

Look at train.py for a starting point towards pre-training / fine-tuning using Lightning Fabric.

Don't forget to join our Discord!

Acknowledgements

@karpathy for nanoGPT
@FacebookResearch for the original LLaMA implementation
@TimDettmers for bitsandbytes
@Microsoft for LoRA

License

Lit-LLaMA is released under the Apache 2.0 license.
