Llama2 inference for 4-bit AWQ quantized models with C# and ManagedCuda
Based on llama2.c and llama_cu_awq
Clone and build the project
git clone https://github.com/GilesBathgate/ManagedLlama2
cd ManagedLlama2
./build.sh
Download and convert the model
cd examples/Setup
dotnet run ../model.bin ../tokenizer.bin
cd -
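Before launching a client, it can help to confirm that the setup step produced both files. The following is a minimal sketch, assuming the Setup example writes the model and tokenizer to the two paths given on its command line (examples/model.bin and examples/tokenizer.bin when run as above). check_files is a hypothetical helper and is not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of ManagedLlama2).
# Returns 0 only if every given path is a regular, non-empty file;
# otherwise reports the first offending path on stderr and returns 1.
check_files() {
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    return 0
}
```

From the repository root, running check_files examples/model.bin examples/tokenizer.bin returns non-zero and names the first file that is missing or empty, so it can guard the launch step with &&.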
Launch a simple web-based chat client
cd examples/WebSocket
dotnet run ../model.bin ../tokenizer.bin
cd -
License: MIT / GPLv3