🦙 Jlama: A modern Java inference engine for LLMs

🚀 Features

Model Support:

  • Gemma Models
  • Llama & Llama2 Models
  • Mistral & Mixtral Models
  • GPT-2 Models
  • BERT Models
  • BPE Tokenizers
  • WordPiece Tokenizers

Implements:

  • Flash Attention
  • Mixture of Experts
  • Hugging Face SafeTensors model and tokenizer format
  • Support for F32, F16, BF16 models
  • Support for Q8, Q4, Q5 model quantization
  • Distributed Inference!
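
To give a feel for what 8-bit quantization such as Q8 involves in general, here is a minimal sketch of symmetric block-wise quantization. It is an illustration only, not Jlama's actual tensor format: the class name `BlockQuantSketch`, the block size of 32, and the one-scale-per-block layout are all assumptions made for this example.

```java
/** Illustrative only: generic block-wise 8-bit quantization, not Jlama's format. */
final class BlockQuantSketch {
    /** Hypothetical block size; real formats choose their own. */
    static final int BLOCK = 32;

    /** Number of blocks (and therefore scales) needed for n values. */
    static int blockCount(int n) {
        return (n + BLOCK - 1) / BLOCK;
    }

    /** Maps each block to signed bytes in [-127, 127] and stores one float scale per block. */
    static byte[] quantize(float[] values, float[] scalesOut) {
        byte[] q = new byte[values.length];
        for (int start = 0, b = 0; start < values.length; start += BLOCK, b++) {
            int end = Math.min(start + BLOCK, values.length);
            float maxAbs = 0f;
            for (int i = start; i < end; i++) {
                maxAbs = Math.max(maxAbs, Math.abs(values[i]));
            }
            // An all-zero block gets scale 1 to avoid dividing by zero.
            float scale = (maxAbs == 0f) ? 1f : maxAbs / 127f;
            scalesOut[b] = scale;
            for (int i = start; i < end; i++) {
                q[i] = (byte) Math.round(values[i] / scale);
            }
        }
        return q;
    }

    /** Reconstructs approximate floats from the bytes and per-block scales. */
    static float[] dequantize(byte[] q, float[] scales) {
        float[] out = new float[q.length];
        for (int i = 0; i < q.length; i++) {
            out[i] = q[i] * scales[i / BLOCK];
        }
        return out;
    }

    public static void main(String[] args) {
        float[] weights = {1.0f, -0.5f, 0.25f, 0.0f};
        float[] scales = new float[blockCount(weights.length)];
        byte[] q = quantize(weights, scales);
        float[] restored = dequantize(q, scales);
        for (int i = 0; i < weights.length; i++) {
            System.out.printf("%.4f -> %d -> %.4f%n", weights[i], q[i], restored[i]);
        }
    }
}
```

The trade-off is the usual one: each weight shrinks from 32 bits to 8 plus a small per-block overhead, at the cost of a rounding error bounded by about half the block's scale.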

Jlama is built with Java 21 and uses the new Vector API (the incubating jdk.incubator.vector module) for faster inference.

โญ Give us a star!

Like what you see? Please consider giving this a star (โ˜…)!

🤔 What is it used for?

Add LLM Inference directly to your Java application.

🔬 Demo

Jlama includes a simple UI if you just want to chat with an LLM.

./run-cli.sh download tjake/llama2-7b-chat-hf-jlama-Q4
./run-cli.sh serve models/llama2-7b-chat-hf-jlama-Q4

Then open a browser to http://localhost:8080/ui/index.html

🕵️‍♀️ How to use

Jlama includes a CLI tool, run-cli.sh, for running models. Before using it, download one or more models from Hugging Face.

Use the ./run-cli.sh download command to download models from Hugging Face.

./run-cli.sh download gpt2-medium
./run-cli.sh download -t XXXXXXXX meta-llama/Llama-2-7b-chat-hf
./run-cli.sh download intfloat/e5-small-v2

Then run the CLI tool to chat with the model or complete a prompt.

./run-cli.sh complete -p "The best part of waking up is " -t 0.7 -tc 16 -q Q4 -wq I8 models/Llama-2-7b-chat-hf
./run-cli.sh chat -p "Tell me a joke about cats." -t 0.7 -tc 16 -q Q4 -wq I8 models/Llama-2-7b-chat-hf

🧪 Examples

Llama 2 7B

Here is a poem about cats, including emojis:
This poem uses emojis to add an extra layer of meaning and fun to the text.
Cat, cat, so soft and sweet,
Purring, cuddling, can't be beat. 🐈💕
Fur so soft, eyes so bright,
Playful, curious, such a delight. 😺🔍
Laps so warm, naps so long,
Sleepy, happy, never wrong. 😴😍
Pouncing, chasing, always fun,
Kitty's joy, never done. 🐾🎉
Whiskers twitch, ears so bright,
Cat's magic, pure delight. 🔮💫
With a mew and a purr,
Cat's love, forever sure. 💕🐈
So here's to cats, so dear,
Purrfect, adorable, always near. 💕🐈

elapsed: 37s, 159.518982ms per token

GPT-2 (355M parameters)

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, 
in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
a long and diverse and interesting story is told in this book. The author writes:
...
the stories of the unicornes seem to be based on the most vivid and vivid imagination; they are the stories of animals that are a kind of 'spirit animal' , a partly-human spiritual animal that speaks in perfect English , and that often keep their language under mysterious and inaccessible circumstances.
...
While the unicorn stories are mostly about animals, they tell us about animals from other animal species. The unicorn stories are remarkable because they tell us about animals that are not animals at all . They speak and sing in perfect English , and they are very much human beings.
...
This book is not about the unicorn. It is not about anything in particular . It is about a brief and distinct group of animal beings who have been called into existence in a particular remote and unexplored valley in the Andes Mountains. They speak perfect English , and they are very human beings.
...
The most surprising thing about the tales of the unicorn

elapsed: 10s, 49.437500ms per token
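
The "ms per token" figures reported above convert to throughput with simple arithmetic. The sketch below is a small helper written for this README, not part of Jlama; the class and method names are hypothetical. Because the elapsed time is printed in whole seconds, the token counts it estimates are only approximate.

```java
/** Illustrative helper for reading the timing lines above; not part of Jlama. */
final class ThroughputSketch {
    /** Converts a per-token latency in milliseconds to tokens per second. */
    static double tokensPerSecond(double msPerToken) {
        return 1000.0 / msPerToken;
    }

    /** Estimates how many tokens were produced from elapsed seconds and per-token latency. */
    static long approxTokens(double elapsedSeconds, double msPerToken) {
        return Math.round(elapsedSeconds * 1000.0 / msPerToken);
    }

    public static void main(String[] args) {
        // Figures copied from the two example runs above.
        System.out.printf("Llama 2 7B: %.2f tokens/s, about %d tokens%n",
                tokensPerSecond(159.518982), approxTokens(37, 159.518982));
        System.out.printf("GPT-2 355M: %.2f tokens/s, about %d tokens%n",
                tokensPerSecond(49.4375), approxTokens(10, 49.4375));
    }
}
```

For the runs shown, this works out to roughly 6.3 tokens per second for Llama 2 7B and roughly 20.2 tokens per second for GPT-2.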

🗺️ Roadmap

  • Support more models
  • Add pure Java tokenizers
  • Support Quantization (e.g. k-quantization)
  • Add LoRA support
  • GraalVM support
  • Add distributed inference

๐Ÿท๏ธ License and Citation

The code is available under Apache License.

If you find this project helpful in your research, please cite this work at

@misc{jlama2024,
    title = {Jlama: A modern Java inference engine for large language models},
    url = {https://github.com/tjake/jlama},
    author = {T Jake Luciani},
    month = {January},
    year = {2024}
}
