
gemma.cpp's Introduction

gemma.cpp

gemma.cpp is a lightweight, standalone C++ inference engine for the Gemma foundation models from Google.

For additional information about Gemma, see ai.google.dev/gemma. Model weights, including gemma.cpp-specific artifacts, are available on Kaggle.

NOTE: 2024-04-04: if using 2B models, please re-download weights from Kaggle and ensure you have the latest version (-mqa or version 3). We are changing the code to match the new weights. If you wish to use old weights, change ConfigGemma2B in configs.h back to kVocabSize = 256128 and kKVHeads = 8.
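Concretely, that rollback amounts to restoring the old constants in configs.h, along the lines of this sketch (only the two constants named above matter; the surrounding members and exact field types follow whatever is in your checkout):

struct ConfigGemma2B {
  // ... other constants (kSeqLen, kLayers, kModelDim, ...) unchanged ...
  static constexpr int kVocabSize = 256128;  // old value, for pre-2024-04-04 weights
  static constexpr int kKVHeads = 8;         // old value; the newer -mqa weights use fewer KV heads
  // ...
};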

Who is this project for?

Modern LLM inference engines are sophisticated systems, often with bespoke capabilities extending beyond traditional neural network runtimes. With this comes opportunities for research and innovation through co-design of high level algorithms and low-level computation. However, there is a gap between deployment-oriented C++ inference runtimes, which are not designed for experimentation, and Python-centric ML research frameworks, which abstract away low-level computation through compilation.

gemma.cpp provides a minimalist implementation of Gemma 2B and 7B models, focusing on simplicity and directness rather than full generality. This is inspired by vertically-integrated model implementations such as ggml, llama.c, and llama.rs.

gemma.cpp targets experimentation and research use cases. It is intended to be straightforward to embed in other projects with minimal dependencies and also easily modifiable with a small ~2K LoC core implementation (along with ~4K LoC of supporting utilities). We use the Google Highway Library to take advantage of portable SIMD for CPU inference.

For production-oriented edge deployments we recommend standard deployment pathways using Python frameworks like JAX, Keras, PyTorch, and Transformers (all model variations here).

Contributing

Community contributions large and small are welcome. See DEVELOPERS.md for additional notes for contributing developers, and join the Discord by following this invite link. This project follows Google's Open Source Community Guidelines.

Active development is currently done on the dev branch. Please open pull requests targeting the dev branch instead of main, which is intended to be more stable.

Quick Start

System requirements

Before starting, you should have installed:

  • CMake
  • A C++ compiler supporting at least C++17 (e.g. Clang)
  • tar, for extracting archives downloaded from Kaggle

Building natively on Windows requires the Visual Studio 2022 Build Tools with the optional Clang/LLVM C++ frontend (clang-cl). This can be installed from the command line with winget:

winget install --id Kitware.CMake
winget install --id Microsoft.VisualStudio.2022.BuildTools --force --override "--passive --wait --add Microsoft.VisualStudio.Workload.VCTools;installRecommended --add Microsoft.VisualStudio.Component.VC.Llvm.Clang --add Microsoft.VisualStudio.Component.VC.Llvm.ClangToolset"

Step 1: Obtain model weights and tokenizer from Kaggle or Hugging Face Hub

Visit the Gemma model page on Kaggle and select Model Variations |> Gemma C++. On this tab, the Variation dropdown includes the options below. Note bfloat16 weights are higher fidelity, while 8-bit switched floating point weights enable faster inference. In general, we recommend starting with the -sfp checkpoints.

Alternatively, visit the gemma.cpp models on the Hugging Face Hub. First, go to the model repository of the model of interest (see recommendations below). Then, click the Files and versions tab and download the model and tokenizer files. For programmatic downloading, if you have huggingface_hub installed, you can also download by running:

huggingface-cli login # Just the first time
huggingface-cli download google/gemma-2b-sfp-cpp --local-dir build/

2B instruction-tuned (it) and pre-trained (pt) models:

Model name Description
2b-it 2 billion parameter instruction-tuned model, bfloat16
2b-it-sfp 2 billion parameter instruction-tuned model, 8-bit switched floating point
2b-pt 2 billion parameter pre-trained model, bfloat16
2b-pt-sfp 2 billion parameter pre-trained model, 8-bit switched floating point

7B instruction-tuned (it) and pre-trained (pt) models:

Model name Description
7b-it 7 billion parameter instruction-tuned model, bfloat16
7b-it-sfp 7 billion parameter instruction-tuned model, 8-bit switched floating point
7b-pt 7 billion parameter pre-trained model, bfloat16
7b-pt-sfp 7 billion parameter pre-trained model, 8-bit switched floating point

Note

Important: We strongly recommend starting off with the 2b-it-sfp model to get up and running.

Step 2: Extract Files

If you downloaded the models from Hugging Face, skip to step 3.

After filling out the consent form, the download should proceed to retrieve a tar archive file archive.tar.gz. Extract files from archive.tar.gz (this can take a few minutes):

tar -xf archive.tar.gz

This should produce a model weights file such as 2b-it-sfp.sbs and a tokenizer file (tokenizer.spm). You may want to move these files to a convenient directory (e.g. the build/ directory in this repo).
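For example (illustrative paths, run from the top of this repo):

mkdir -p build
mv 2b-it-sfp.sbs tokenizer.spm build/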

Step 3: Build

The build system uses CMake. To build the gemma inference runtime, create a build directory and generate the build files using cmake from the top-level project directory. Note that if you previously ran cmake and are re-running with a different setting, be sure to clean out the build/ directory with rm -rf build/* (warning: this will delete any other files in the build/ directory).

For the 8-bit switched floating point weights (sfp), run cmake with no options:

Unix-like Platforms

cmake -B build

or, if you downloaded bfloat16 weights (any model without -sfp in the name), instead of running cmake with no options as above, run cmake with WEIGHT_TYPE set to Highway's hwy::bfloat16_t type (this will be simplified in the future; we recommend -sfp weights over bfloat16 for faster inference):

cmake -B build -DWEIGHT_TYPE=hwy::bfloat16_t

After running whichever of the above cmake invocations is appropriate for your weights, you can enter the build/ directory and run make to build the ./gemma executable:

# Enter the `build` directory
cd build

# Build the ./gemma executable using make
make -j [number of parallel threads to use] gemma

Replace [number of parallel threads to use] with a number; the number of cores available on your system is a reasonable heuristic. For example, make -j4 gemma will build using 4 threads. If the nproc command is available, you can use make -j$(nproc) gemma as a reasonable default for the number of threads.

If you aren't sure of the right value for the -j flag, you can simply run make gemma instead and it should still build the ./gemma executable.

Note

Windows Subsystem for Linux (WSL) users should set the number of parallel threads to 1. Using a larger number may result in errors.

If the build is successful, you should now have a gemma executable in the build/ directory.

Windows

# Configure `build` directory
cmake --preset windows

# Build project using Visual Studio Build Tools
cmake --build --preset windows -j [number of parallel threads to use]

If the build is successful, you should now have a gemma.exe executable in the build/ directory.

Bazel

bazel build -c opt --cxxopt=-std=c++20 :gemma

If the build is successful, you should now have a gemma executable in the bazel-bin/ directory.
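You can then invoke it the same way as the CMake build, just from the bazel-bin/ path (weights and tokenizer paths are illustrative):

./bazel-bin/gemma --tokenizer tokenizer.spm --compressed_weights 2b-it-sfp.sbs --model 2b-it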

Step 4: Run

You can now run gemma from inside the build/ directory.

gemma has the following required arguments:

Argument Description Example value
--model The model type. 2b-it, 2b-pt, 7b-it, 7b-pt, ... (see above)
--compressed_weights The compressed weights file. 2b-it-sfp.sbs, ... (see above)
--tokenizer The tokenizer file. tokenizer.spm

gemma is invoked as:

./gemma \
--tokenizer [tokenizer file] \
--compressed_weights [compressed weights file] \
--model [2b-it or 2b-pt or 7b-it or 7b-pt or ...]

Example invocation for the following configuration:

  • Compressed weights file 2b-it-sfp.sbs (2B instruction-tuned model, 8-bit switched floating point).
  • Tokenizer file tokenizer.spm.
./gemma \
--tokenizer tokenizer.spm \
--compressed_weights 2b-it-sfp.sbs \
--model 2b-it

RecurrentGemma

This repository includes a version of Gemma based on Griffin (paper, code). Its architecture includes both recurrent layers and local attention, so it is more efficient for longer sequences and has a smaller memory footprint than standard Gemma. Here we provide a C++ implementation of this model based on the paper.

To use the recurrent version of Gemma included in this repository, build the gemma binary as noted above in Step 3. Download the compressed weights and tokenizer from Kaggle as in Step 1, and run the binary as follows:

./gemma --tokenizer tokenizer.spm --model gr2b-it --compressed_weights 2b-it-sfp.sbs

Troubleshooting and FAQs

Running ./gemma fails with "Failed to read cache gating_ein_0 (error 294) ..."

The most common problem is that the build was configured with the wrong weight type, and gemma is attempting to load bfloat16 weights (2b-it, 2b-pt, 7b-it, 7b-pt) using the default switched floating point (sfp) setting, or vice versa. Revisit step #3 and check that the cmake command used to build gemma was correct for the weights that you downloaded.

In the future we will move model format handling from compile time to runtime to simplify this.

Problems building in Windows / Visual Studio

Currently if you're using Windows, we recommend building in WSL (Windows Subsystem for Linux). We are exploring options to enable other build configurations, see issues for active discussion.

Model does not respond to instructions and produces strange output

A common issue is that you are using a pre-trained model, which is not instruction-tuned and thus does not respond to instructions. Make sure you are using an instruction-tuned model (2b-it-sfp, 2b-it, 7b-it-sfp, 7b-it) and not a pre-trained model (any model with a -pt suffix).

How do I convert my fine-tune to a .sbs compressed model file?

We're working on a python script to convert a standard model format to .sbs, and hope to have it available in the next week or so. Follow this issue for updates.

What are some easy ways to make the model run faster?

  1. Make sure you are using the 8-bit switched floating point -sfp models.
  2. If you're on a laptop, make sure power mode is set to maximize performance and power saving mode is off. For most laptops, power saving modes activate automatically when the computer is not plugged in.
  3. Close other unused CPU-intensive applications.
  4. On Macs, we anecdotally observe a "warm-up" ramp-up in speed as performance cores get engaged.
  5. Experiment with the --num_threads argument value; depending on the device, larger numbers don't always mean better performance (see the example below).
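An illustrative invocation (the best thread count depends on your hardware):

./gemma --tokenizer tokenizer.spm --compressed_weights 2b-it-sfp.sbs --model 2b-it --num_threads 8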

We're also working on algorithmic and optimization approaches for faster inference; stay tuned.

Usage

gemma has different usage modes, controlled by the verbosity flag.

All usage modes are currently interactive, triggering text generation upon newline input.

Verbosity Usage mode Details
--verbosity 0 Minimal Only prints generation output. Suitable as a CLI tool.
--verbosity 1 Default Standard user-facing terminal UI.
--verbosity 2 Detailed Shows additional developer and debug info.

Interactive Terminal App

By default, verbosity is set to 1, bringing up a terminal-based interactive interface when gemma is invoked:

$ ./gemma [...]
  __ _  ___ _ __ ___  _ __ ___   __ _   ___ _ __  _ __
 / _` |/ _ \ '_ ` _ \| '_ ` _ \ / _` | / __| '_ \| '_ \
| (_| |  __/ | | | | | | | | | | (_| || (__| |_) | |_) |
 \__, |\___|_| |_| |_|_| |_| |_|\__,_(_)___| .__/| .__/
  __/ |                                    | |   | |
 |___/                                     |_|   |_|

tokenizer                     : tokenizer.spm
compressed_weights            : 2b-it-sfp.sbs
model                         : 2b-it
weights                       : [no path specified]
max_tokens                    : 3072
max_generated_tokens          : 2048

*Usage*
  Enter an instruction and press enter (%C reset conversation, %Q quits).

*Examples*
  - Write an email to grandma thanking her for the cookies.
  - What are some historical attractions to visit around Massachusetts?
  - Compute the nth fibonacci number in javascript.
  - Write a standup comedy bit about WebGPU programming.

> What are some outdoorsy places to visit around Boston?

[ Reading prompt ] .....................


**Boston Harbor and Islands:**

* **Boston Harbor Islands National and State Park:** Explore pristine beaches, wildlife, and maritime history.
* **Charles River Esplanade:** Enjoy scenic views of the harbor and city skyline.
* **Boston Harbor Cruise Company:** Take a relaxing harbor cruise and admire the city from a different perspective.
* **Seaport Village:** Visit a charming waterfront area with shops, restaurants, and a seaport museum.

**Forest and Nature:**

* **Forest Park:** Hike through a scenic forest with diverse wildlife.
* **Quabbin Reservoir:** Enjoy boating, fishing, and hiking in a scenic setting.
* **Mount Forest:** Explore a mountain with breathtaking views of the city and surrounding landscape.

...

Usage as a Command Line Tool

To use the gemma executable as a command-line tool, it may be useful to create an alias for gemma.cpp with its arguments fully specified:

alias gemma2b="~/gemma.cpp/build/gemma -- --tokenizer ~/gemma.cpp/build/tokenizer.spm --compressed_weights ~/gemma.cpp/build/2b-it-sfp.sbs --model 2b-it --verbosity 0"

Replace the above paths with the paths to the model and tokenizer files from your download.

Here is an example of prompting gemma with a truncated input file (using the gemma2b alias defined above):

cat configs.h | tail -35 | tr '\n' ' ' | xargs -0 echo "What does this C++ code do: " | gemma2b

Note

CLI usage of gemma.cpp is experimental and should take context length limitations into account.

The output of the above command should look like:

$ cat configs.h | tail -35 | tr '\n' ' ' | xargs -0 echo "What does this C++ code do: " | gemma2b
[ Reading prompt ] ......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
The code defines two C++ structs, `ConfigGemma7B` and `ConfigGemma2B`, which are used for configuring a deep learning model.

**ConfigGemma7B**:

* `kSeqLen`: Stores the length of the sequence to be processed. It's set to 7168.
* `kVocabSize`: Stores the size of the vocabulary, which is 256128.
* `kLayers`: Number of layers in the deep learning model. It's set to 28.
* `kModelDim`: Dimension of the model's internal representation. It's set to 3072.
* `kFFHiddenDim`: Dimension of the feedforward and recurrent layers' hidden representations. It's set to 16 * 3072 / 2.

**ConfigGemma2B**:

* `kSeqLen`: Stores the length of the sequence to be processed. It's also set to 7168.
* `kVocabSize`: Size of the vocabulary, which is 256128.
* `kLayers`: Number of layers in the deep learning model. It's set to 18.
* `kModelDim`: Dimension of the model's internal representation. It's set to 2048.
* `kFFHiddenDim`: Dimension of the feedforward and recurrent layers' hidden representations. It's set to 16 * 2048 / 2.

These structs are used to configure a deep learning model with specific parameters for either Gemma7B or Gemma2B architecture.

Incorporating gemma.cpp as a Library in your Project

The easiest way to incorporate gemma.cpp in your own project is to pull in gemma.cpp and dependencies using FetchContent. You can add the following to your CMakeLists.txt:

include(FetchContent)

FetchContent_Declare(sentencepiece GIT_REPOSITORY https://github.com/google/sentencepiece GIT_TAG 53de76561cfc149d3c01037f0595669ad32a5e7c)
FetchContent_MakeAvailable(sentencepiece)

FetchContent_Declare(gemma GIT_REPOSITORY https://github.com/google/gemma.cpp GIT_TAG origin/main)
FetchContent_MakeAvailable(gemma)

FetchContent_Declare(highway GIT_REPOSITORY https://github.com/google/highway.git GIT_TAG da250571a45826b21eebbddc1e50d0c1137dee5f)
FetchContent_MakeAvailable(highway)

Note that for the gemma.cpp GIT_TAG, you may replace origin/main with a specific commit hash if you would like to pin the library version.

After your executable is defined (substitute your executable name for [Executable Name] below):

target_link_libraries([Executable Name] libgemma hwy hwy_contrib sentencepiece)
FetchContent_GetProperties(gemma)
FetchContent_GetProperties(sentencepiece)
target_include_directories([Executable Name] PRIVATE ${gemma_SOURCE_DIR})
target_include_directories([Executable Name] PRIVATE ${sentencepiece_SOURCE_DIR})
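Putting the pieces together, a complete minimal CMakeLists.txt for a hypothetical executable my_app built from main.cc (both names are placeholders) might look like this sketch:

cmake_minimum_required(VERSION 3.13)
project(my_app)

# gemma.cpp requires a C++ compiler supporting at least C++17.
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

include(FetchContent)

FetchContent_Declare(sentencepiece GIT_REPOSITORY https://github.com/google/sentencepiece GIT_TAG 53de76561cfc149d3c01037f0595669ad32a5e7c)
FetchContent_MakeAvailable(sentencepiece)

FetchContent_Declare(gemma GIT_REPOSITORY https://github.com/google/gemma.cpp GIT_TAG origin/main)
FetchContent_MakeAvailable(gemma)

FetchContent_Declare(highway GIT_REPOSITORY https://github.com/google/highway.git GIT_TAG da250571a45826b21eebbddc1e50d0c1137dee5f)
FetchContent_MakeAvailable(highway)

# Define the executable, then link and expose the headers as above.
add_executable(my_app main.cc)
target_link_libraries(my_app libgemma hwy hwy_contrib sentencepiece)
FetchContent_GetProperties(gemma)
FetchContent_GetProperties(sentencepiece)
target_include_directories(my_app PRIVATE ${gemma_SOURCE_DIR})
target_include_directories(my_app PRIVATE ${sentencepiece_SOURCE_DIR})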

Building gemma.cpp as a Library

gemma.cpp can also be used as a library dependency in your own project. The library artifact can be built by modifying the make invocation to build the libgemma target instead of gemma.

Note

If you are using gemma.cpp in your own project with the FetchContent steps in the previous section, building the library is done automatically by cmake and this section can be skipped.

First, run cmake:

cmake -B build

Then, run make with the libgemma target:

cd build
make -j [number of parallel threads to use] libgemma

If this is successful, you should now have a libgemma library file in the build/ directory. On Unix platforms, the filename is libgemma.a.

Independent Projects Using gemma.cpp

Some independent projects using gemma.cpp:

If you would like to have your project included, feel free to get in touch or submit a PR with a README.md edit.

Acknowledgements and Contacts

gemma.cpp was started in fall 2023 by Austin Huang and Jan Wassenberg, and subsequently released February 2024 thanks to contributions from Phil Culliton, Paul Chang, and Dan Zheng.

Griffin support was implemented in April 2024 thanks to contributions by Andrey Mikhaylov, Eugene Kliuchnikov, Jan Wassenberg, Jyrki Alakuijala, Lode Vandevenne, Luca Versari, Martin Bruse, Phil Culliton, Sami Boukortt, Thomas Fischbacher and Zoltan Szabadka.

This is not an officially supported Google product.

gemma.cpp's People

Contributors

austinvhuang, dan-zheng, dcoles, eltociear, enum-class, ericye16, jan-wassenberg, kewde, kishida, linkiwi, osanseviero, pchx, pculliton, shirayu, szabadka, traversaro, ufownl, veluca93, villesundell, zeerd


gemma.cpp's Issues

Unable to "make" gemma binary executable on Windows

Following the instructions given in README.md, I was able to get through all of the commands until the "make" command. It just spits out the error "make: *** No rule to make target 'gemma'. Stop." in the build directory. Trying it in the main directory (the one above build) gives a different error: "g++ gemma.cc -o gemma process_begin: CreateProcess(NULL, g++ gemma.cc -o gemma, ...) failed. make (e=2): The system cannot find the file specified. make: *** [: gemma] Error 2"

Any solution to this?

Error at the end when compiling

When compiling with make -j4 gemma, I get the following error.

[100%] Linking CXX executable gemma
CMakeFiles/gemma.dir/gemma.cc.o: In function `std::filesystem::__cxx11::path::path<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::filesystem::__cxx11::path>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::filesystem::__cxx11::path::format)':
gemma.cc:(.text._ZNSt10filesystem7__cxx114pathC2INSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES1_EERKT_NS1_6formatE[_ZNSt10filesystem7__cxx114pathC5INSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES1_EERKT_NS1_6formatE]+0x74): undefined reference to `std::filesystem::__cxx11::path::_M_split_cmpts()'
CMakeFiles/gemma.dir/gemma.cc.o: In function `std::unique_ptr<unsigned char [], hwy::AlignedFreer> gcpp::N_AVX3_ZEN4::GetCompressedWeights<gcpp::ConfigGemma7B>(gcpp::Path const&, gcpp::Path const&, hwy::ThreadPool&)':
gemma.cc:(.text._ZN4gcpp11N_AVX3_ZEN420GetCompressedWeightsINS_13ConfigGemma7BEEESt10unique_ptrIA_hN3hwy12AlignedFreerEERKNS_4PathESA_RNS5_10ThreadPoolE[_ZN4gcpp11N_AVX3_ZEN420GetCompressedWeightsINS_13ConfigGemma7BEEESt10unique_ptrIA_hN3hwy12AlignedFreerEERKNS_4PathESA_RNS5_10ThreadPoolE]+0x32): undefined reference to `std::filesystem::status(std::filesystem::__cxx11::path const&)'
gemma.cc:(.text._ZN4gcpp11N_AVX3_ZEN420GetCompressedWeightsINS_13ConfigGemma7BEEESt10unique_ptrIA_hN3hwy12AlignedFreerEERKNS_4PathESA_RNS5_10ThreadPoolE[_ZN4gcpp11N_AVX3_ZEN420GetCompressedWeightsINS_13ConfigGemma7BEEESt10unique_ptrIA_hN3hwy12AlignedFreerEERKNS_4PathESA_RNS5_10ThreadPoolE]+0x5a): undefined reference to `std::filesystem::status(std::filesystem::__cxx11::path const&)'
CMakeFiles/gemma.dir/gemma.cc.o: In function `std::unique_ptr<unsigned char [], hwy::AlignedFreer> gcpp::N_AVX3_ZEN4::GetCompressedWeights<gcpp::ConfigGemma2B>(gcpp::Path const&, gcpp::Path const&, hwy::ThreadPool&)':
gemma.cc:(.text._ZN4gcpp11N_AVX3_ZEN420GetCompressedWeightsINS_13ConfigGemma2BEEESt10unique_ptrIA_hN3hwy12AlignedFreerEERKNS_4PathESA_RNS5_10ThreadPoolE[_ZN4gcpp11N_AVX3_ZEN420GetCompressedWeightsINS_13ConfigGemma2BEEESt10unique_ptrIA_hN3hwy12AlignedFreerEERKNS_4PathESA_RNS5_10ThreadPoolE]+0x32): undefined reference to `std::filesystem::status(std::filesystem::__cxx11::path const&)'
gemma.cc:(.text._ZN4gcpp11N_AVX3_ZEN420GetCompressedWeightsINS_13ConfigGemma2BEEESt10unique_ptrIA_hN3hwy12AlignedFreerEERKNS_4PathESA_RNS5_10ThreadPoolE[_ZN4gcpp11N_AVX3_ZEN420GetCompressedWeightsINS_13ConfigGemma2BEEESt10unique_ptrIA_hN3hwy12AlignedFreerEERKNS_4PathESA_RNS5_10ThreadPoolE]+0x5a): undefined reference to `std::filesystem::status(std::filesystem::__cxx11::path const&)'
CMakeFiles/gemma.dir/gemma.cc.o: In function `std::unique_ptr<unsigned char [], hwy::AlignedFreer> gcpp::N_SSSE3::GetCompressedWeights<gcpp::ConfigGemma7B>(gcpp::Path const&, gcpp::Path const&, hwy::ThreadPool&)':
gemma.cc:(.text._ZN4gcpp7N_SSSE320GetCompressedWeightsINS_13ConfigGemma7BEEESt10unique_ptrIA_hN3hwy12AlignedFreerEERKNS_4PathESA_RNS5_10ThreadPoolE[_ZN4gcpp7N_SSSE320GetCompressedWeightsINS_13ConfigGemma7BEEESt10unique_ptrIA_hN3hwy12AlignedFreerEERKNS_4PathESA_RNS5_10ThreadPoolE]+0x32): undefined reference to `std::filesystem::status(std::filesystem::__cxx11::path const&)'
CMakeFiles/gemma.dir/gemma.cc.o:gemma.cc:(.text._ZN4gcpp7N_SSSE320GetCompressedWeightsINS_13ConfigGemma7BEEESt10unique_ptrIA_hN3hwy12AlignedFreerEERKNS_4PathESA_RNS5_10ThreadPoolE[_ZN4gcpp7N_SSSE320GetCompressedWeightsINS_13ConfigGemma7BEEESt10unique_ptrIA_hN3hwy12AlignedFreerEERKNS_4PathESA_RNS5_10ThreadPoolE]+0x5a): more undefined references to `std::filesystem::status(std::filesystem::__cxx11::path const&)' follow
collect2: error: ld returned 1 exit status
make[3]: *** [CMakeFiles/gemma.dir/build.make:133: gemma] Error 1
make[2]: *** [CMakeFiles/Makefile2:377: CMakeFiles/gemma.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:384: CMakeFiles/gemma.dir/rule] Error 2
make: *** [Makefile:189: gemma] Error 2

I am not sure how to fix this. I tried adding -lstdc++fs in CMakeCache.txt but it does not seem to have any effect; where should it be added?

Thanks in advance.
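(For context: with GCC 8 and older, std::filesystem support lives in a separate library, and the flag belongs on the link line of the target rather than in CMakeCache.txt. A sketch of one possible fix in gemma.cpp's CMakeLists.txt, assuming the executable target is named gemma:

target_link_libraries(gemma stdc++fs)

Upgrading to GCC 9+ or a recent Clang avoids the need for the flag entirely.)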

Compilation problem on WSL2

Compiling this morning (after apt-get cmake & clang):

[venv:ml] (git:main) $ cmake -B build
-- Configuring done
-- Generating done
-- Build files have been written to: /home/fabiogr/gemma/gemma.cpp/build/_deps/highway-build/googletest-download
[ 11%] Performing update step for 'googletest'
[ 22%] No patch step for 'googletest'
[ 33%] No configure step for 'googletest'
[ 44%] No build step for 'googletest'
[ 55%] No install step for 'googletest'
[ 66%] No test step for 'googletest'
[ 77%] Completed 'googletest'
[100%] Built target googletest
-- VERSION: 0.2.0
-- Not Found TCMalloc: TCMALLOC_LIB-NOTFOUND
-- Configuring done
-- Generating done
-- Build files have been written to: /home/fabiogr/gemma/gemma.cpp/build
[venv:ml] (git:main) $ make gemma
g++ gemma.cc -o gemma
gemma.cc:22:10: fatal error: hwy/foreach_target.h: No such file or directory
   22 | #include "hwy/foreach_target.h"  // IWYU pragma: keep
      |          ^~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [<builtin>: gemma] Error 1
It seems the hwy folder is not being created in the first step.

Error during build process

I'm building this on Windows 10.
I try to build as instructed in the readme, but the following error occurs on this command
cmake --build --preset make -j 8

CMake Error: Could not read presets from ~/clone/gemma.cpp/build:
File not found: ~/clone/gemma.cpp/build/CMakePresets.json

How to make gemma support Chinese?

How can I make gemma support Chinese?
I just use gemma via ollama, and I find that it only supports English.
This makes gemma difficult to use for Chinese speakers.
Could I do something to make it support Chinese?
I only have an RTX 4090, 64GB RAM and an i9 CPU.

WSL support: "Killed", once executed on WSL

On the WSL subsystem, everything seems normal, but once executed, the only output is "Killed".

$ ./gemma --tokenizer /home/home/F/GPT/Gemma/2b-it-cpp/tokenizer.spm \
    --compressed_weights /home/home/F/GPT/Gemma/2b-it-cpp/2b-it.sbs \
    --model 2b-it
Killed

ENV:
clang-10.0
ubuntu18.04 && ubuntu20.04
cmake

project does not work as submodule

Steps to reproduce the issue:

Take a look at my minimal example: https://github.com/bachittle/test-gemma-submodule. I add gemma.cpp as a subdirectory and compile it. The main executable is now in the build folder alongside libgemma. But when I try to run it, it does not get past the "reading prompt" stage. This is unusual, as that executable should behave the same as if compiled directly, so I am unsure what the issue is. This also shows up in my own repository when initializing and calling the GenerateGemma function, so that may be where the issue arises.

Incorporating gemma.cpp as a Library in Android Project

I've generated Native C++ project on Android Studio in Windows to use gemma.cpp as library.
and fill CMakeLists.txt as below.

cmake_minimum_required(VERSION 3.22.1)

project("gemmacpp")

add_library(${CMAKE_PROJECT_NAME} SHARED
        # List C/C++ source files with relative paths to this CMakeLists.txt.
        native-lib.cpp)

target_link_libraries(${CMAKE_PROJECT_NAME}
        # List libraries link to the target library
        android
        log)

include(FetchContent)

FetchContent_Declare(sentencepiece GIT_REPOSITORY https://github.com/google/sentencepiece GIT_TAG  origin/master)
FetchContent_MakeAvailable(sentencepiece)

FetchContent_Declare(gemma GIT_REPOSITORY https://github.com/google/gemma.cpp GIT_TAG origin/main)
FetchContent_MakeAvailable(gemma)

FetchContent_Declare(highway GIT_REPOSITORY https://github.com/google/highway.git GIT_TAG da250571a45826b21eebbddc1e50d0c1137dee5f)
FetchContent_MakeAvailable(highway)

target_link_libraries(${CMAKE_PROJECT_NAME} libgemma hwy hwy_contrib sentencepiece)
FetchContent_GetProperties(gemma)
FetchContent_GetProperties(sentencepiece)
target_include_directories(${CMAKE_PROJECT_NAME} PRIVATE ${gemma_SOURCE_DIR})
target_include_directories(${CMAKE_PROJECT_NAME} PRIVATE ${sentencepiece_SOURCE_DIR})

And there are lots of errors like below

[1/59] Linking CXX executable _deps\highway-build\tests\aligned_allocator_test
FAILED: _deps/highway-build/tests/aligned_allocator_test _deps/highway-build/aligned_allocator_test[1]_tests.cmake C:/Users/urimk/AndroidStudioProjects/GemmaApp/app/.cxx/Debug/6p2d195k/arm64-v8a/_deps/highway-build/aligned_allocator_test[1]_tests.cmake
cmd.exe /C "cd . && C:\Users\urimk\AppData\Local\Android\Sdk\ndk\25.1.8937393\toolchains\llvm\prebuilt\windows-x86_64\bin\clang++.exe --target=aarch64-none-linux-android30 --sysroot=C:/Users/urimk/AppData/Local/Android/Sdk/ndk/25.1.8937393/toolchains/llvm/prebuilt/windows-x86_64/sysroot -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -std=c++17 -fno-limit-debug-info -static-libstdc++ -Wl,--build-id=sha1 -Wl,--fatal-warnings -Wl,--gc-sections -Wl,--no-undefined -Qunused-arguments -Wl,--gc-sections _deps/highway-build/CMakeFiles/aligned_allocator_test.dir/hwy/aligned_allocator_test.cc.o -o _deps\highway-build\tests\aligned_allocator_test _deps/highway-build/libhwy.a _deps/highway-build/libhwy_test.a _deps/highway-build/libhwy_contrib.a lib/libgtest.a lib/libgtest_main.a _deps/highway-build/libhwy.a lib/libgtest.a -pthread -latomic -lm && cmd.exe /C "cd /D C:\Users\urimk\AndroidStudioProjects\GemmaApp\app.cxx\Debug\6p2d195k\arm64-v8a_deps\highway-build && C:\Users\urimk\AppData\Local\Android\Sdk\cmake\3.22.1\bin\cmake.exe -D TEST_TARGET=aligned_allocator_test -D TEST_EXECUTABLE=C:/Users/urimk/AndroidStudioProjects/GemmaApp/app/.cxx/Debug/6p2d195k/arm64-v8a/_deps/highway-build/tests/aligned_allocator_test -D TEST_EXECUTOR= -D TEST_WORKING_DIR=C:/Users/urimk/AndroidStudioProjects/GemmaApp/app/.cxx/Debug/6p2d195k/arm64-v8a/_deps/highway-build -D TEST_EXTRA_ARGS= -D TEST_PROPERTIES= -D TEST_PREFIX= -D TEST_SUFFIX= -D TEST_FILTER= -D NO_PRETTY_TYPES=FALSE -D NO_PRETTY_VALUES=FALSE -D TEST_LIST=aligned_allocator_test_TESTS -D CTEST_FILE=C:/Users/urimk/AndroidStudioProjects/GemmaApp/app/.cxx/Debug/6p2d195k/arm64-v8a/_deps/highway-build/aligned_allocator_test[1]_tests.cmake -D TEST_DISCOVERY_TIMEOUT=60 -D TEST_XML_OUTPUT_DIR= -P C:/Users/urimk/AppData/Local/Android/Sdk/cmake/3.22.1/share/cmake-3.22/Modules/GoogleTestAddTests.cmake""
CMake Error at C:/Users/urimk/AppData/Local/Android/Sdk/cmake/3.22.1/share/cmake-3.22/Modules/GoogleTestAddTests.cmake:83 (message):
Error running test executable.

  Path: 'C:/Users/urimk/AndroidStudioProjects/GemmaApp/app/.cxx/Debug/6p2d195k/arm64-v8a/_deps/highway-build/tests/aligned_allocator_test'
  Result: %1 is not a valid Win32 application.
  Output:

Is there anyone who can provide a solution or a successful example?

Failed to read from model.weights.h5 - might be a directory, or too small?

Hi,

I am experiencing the following issue. I tried these versions:
https://www.kaggle.com/models/keras/gemma/frameworks/Keras/variations/gemma_2b_en/versions/1
https://www.kaggle.com/models/keras/gemma/frameworks/Keras/variations/gemma_2b_en/versions/2

./gemma \
--tokenizer vocabulary.spm \
--weights model.weights.h5 \
--compressed_weights 2b-pt-sfp.sbs --model 2b-pt --verbosity 2
Cached compressed weights does not exist yet (code 256), compressing weights and creating file: 2b-pt-sfp.sbs.
Abort at /Users/charbel/Downloads/gemma/gemma.cpp/./gemma.cc:138: Failed to read from model.weights.h5 - might be a directory, or too small?
zsh: abort      ./gemma --tokenizer vocabulary.spm --weights model.weights.h5  2b-pt-sfp.sbs 

Any ideas on how to resolve this?

Cheers,
Charbel

GPU Support

Is there a way to use the GPU, or is this on the roadmap?

Compile error on visual studio build, array size exceed

I performed the build like this:
cmake --build --preset make -j 1

Then the following compilation error occurred:
error C2148:
Total size of array must not exceed 0x7fffffff bytes.

After looking into it, it seems that arrays larger than 2GB cannot be used on Windows.

2b-pt model produces outputs with less meaningful content

As the title says, I am running the 2b-pt model.
Here is the output:

> Hello who are you

[ Reading prompt ] .... you

are a very nice person and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and I hope you have a nice day and

Wrong keyboard mapping in command line

System: macOS 14.3.1 on Macbook Air M2
Terminal: reproduced in iTerm as well as native Terminal
Command:

./gemma \
--tokenizer tokenizer.spm \
--compressed_weights 2b-it-sfp.sbs \
--model 2b-it

When I try to navigate in the chat input using the arrow keys, it doesn't work and I get codes like ^[[D for the left arrow key.
Also, I cannot add new lines in my prompt using Shift+Enter, and when I insert text from the clipboard containing new lines, I get multiple invocations.

AVX2 is dimwitted compared to AVX512

On a $10,000 AMD Ryzen 7995WX (znver4 avx512) Gemma 7b instruct sfp is able to solve mathematical riddles.

[screenshot: correct answer on AVX512]

But on a $600 Intel i9-14900K (raptorlake avx2) the same Gemma model gives the fool's answer.

[screenshot: incorrect answer on AVX2]

I expected both machines to produce an identical response since I set the temperature to zero. However the behavior of gemma.cpp appears to differ in a pernicious way depending on the ISA. It'd be great if people without AVX512 privilege could experience the same level of impressive brilliance from Gemma that I'm seeing on my Threadripper.

Functions exposed in the libgemma.a?

Forgive my weak C++-fu. I've compiled gemma into libgemma.a to call from my C++ application. Is there documentation that details the function calls available in the library?

The continuity of the conversation is problematic.

Issue

The second question lost its relevance to the first question.

Details

For the earlier code, at commit fb6f266:


  __ _  ___ _ __ ___  _ __ ___   __ _   ___ _ __  _ __
 / _` |/ _ \ '_ ` _ \| '_ ` _ \ / _` | / __| '_ \| '_ \
| (_| |  __/ | | | | | | | | | | (_| || (__| |_) | |_) |
 \__, |\___|_| |_| |_|_| |_| |_|\__,_(_)___| .__/| .__/
  __/ |                                    | |   | |
 |___/                                     |_|   |_|

tokenizer                     : /.../tokenizer.spm
compressed_weights            : /.../2b-it.sbs
model                         : 2b-it
weights                       : [no path specified]
max_tokens                    : 3072
max_generated_tokens          : 2048

*Usage*
  Enter an instruction and press enter (%Q quits).

*Examples*
  - Write an email to grandma thanking her for the cookies.
  - What are some historical attractions to visit around Massachusetts?
  - Compute the nth fibonacci number in javascript.
  - Write a standup comedy bit about GPU programming.

> Write a poem with starfield theme.

[ Reading prompt ] ................


Stars ignite the velvet night,
A symphony of light and might.
Whispers of wonder, stories untold,
Echoing through the cosmic fold.

Stars ignite the velvet night,
A symphony of light and might.
Dancing in the cosmic dance,
A celestial ballet, a wondrous trance.

Stars ignite the velvet night,
A symphony of light and might.
Guiding lost souls through the starry sea,
A beacon of hope, a guiding key.

Stars ignite the velvet night,
A symphony of light and might.
A tapestry of wonder, a cosmic sight,
A symphony of light, a starry light.

> 翻译成中文。 [Translate into Chinese.]

[ Reading prompt ] .............


星辰闪烁, velvet 的夜幕,
闪烁的旋律,令人惊叹。
星光在宇宙中闪烁,
闪烁的奇迹,令人醉心。

星辰闪烁, velvet 的夜幕,
闪烁的旋律,令人惊叹。
在宇宙的舞剧中闪烁,
闪烁的奇迹,令人醉心。

星辰闪烁, velvet 的夜幕,
闪烁的旋律,令人惊叹。
引领迷失的灵魂,
在星辰的闪烁中找到希望。

For the current code, at commit b6aaf6b:

  __ _  ___ _ __ ___  _ __ ___   __ _   ___ _ __  _ __
 / _` |/ _ \ '_ ` _ \| '_ ` _ \ / _` | / __| '_ \| '_ \
| (_| |  __/ | | | | | | | | | | (_| || (__| |_) | |_) |
 \__, |\___|_| |_| |_|_| |_| |_|\__,_(_)___| .__/| .__/
  __/ |                                    | |   | |
 |___/                                     |_|   |_|

tokenizer                     : /.../tokenizer.spm
compressed_weights            : /.../2b-it.sbs
model                         : 2b-it
weights                       : [no path specified]
max_tokens                    : 3072
max_generated_tokens          : 2048
multiturn                     : 0

*Usage*
  Enter an instruction and press enter (%C resets conversation, %Q quits).
  Since multiturn is set to 0, conversation will automatically reset every turn.

*Examples*
  - Write an email to grandma thanking her for the cookies.
  - What are some historical attractions to visit around Massachusetts?
  - Compute the nth fibonacci number in javascript.
  - Write a standup comedy bit about GPU programming.

> Write a poem with starfield theme.

[ Reading prompt ] ................


Stars ignite the velvet night,
A symphony of light and might.
Whispers of wonder, stories untold,
Echoing through the cosmic fold.

Stars ignite the velvet night,
A symphony of light and might.
Dancing in the cosmic dance,
A celestial ballet, a wondrous trance.

Stars ignite the velvet night,
A symphony of light and might.
Guiding lost souls through the starry sea,
A beacon of hope, a guiding key.

Stars ignite the velvet night,
A symphony of light and might.
A tapestry of wonder, a cosmic sight,
A symphony of light, a starry light.

> 翻译成中文。 [Translate into Chinese.]

[ Reading prompt ] ............


我需要您提供要翻译的文本,以便我为您翻译成中文。请提供文本,我会尽力为您翻译成中文。 [I need you to provide the text to be translated so that I can translate it into Chinese for you. Please provide the text, and I will do my best to translate it.]

> 


Gemma.cpp on the Android arm64-v8a

I have built the gemma executable to run on Android arm64-v8a with the option below.

cmake -DCMAKE_TOOLCHAIN_FILE=/usr/lib/android-sdk/ndk/25.1.8937393/build/cmake/android.toolchain.cmake .

And it runs on my Android device.
However, there is an error when reading the weights.

/# gemma --tokenizer tokenizer.spm --compressed_weights 2b-it-sfp.sbs --model 2b-it
cache.path.c_str() : 2b-it-sfp.sbs
2b-it-sfp.sbs:3163184640
BlobReader::Open open
BlobReader::Open Read
BlobReader::Open Allocate
3163184640, 0
Cached compressed weights does not exist yet (code 155), compressing weights and creating file: 2b-it-sfp.sbs.

There is a failure in the "BlobReader::Open" function, on the line "blob_store_->CheckValidity(IO::FileSize(filename))".
The expected value of "IO::FileSize(filename)" is 3163184640, but it is currently returning 0.

However, before the line "hwy::CopySameSize(&bs, blob_store_.get())" in the "BlobReader::Open" function, the "IO::FileSize(filename)" is functioning correctly.

Anyway, according to the comments in #21, it should work.

Could you help me to figure out what I missed?

Make Copybara close pull requests upon merge, use more informative commit descriptions

  1. Copybara does not merge or close pull requests upon importing commits to dev branch. This makes it unclear when pull requests have truly been "merged" and requires us to manually close PRs.
    • It would be great if Copybara automatically closed pull requests upon merge instead.
  2. Copybara-exported commits start with Copybara import of the project: in the first line, which provides no information.
    • It would be good to show pull request descriptions in the first line instead.

cmake failed

windows11
$ cmake -DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang++.exe" -DCMAKE_C_COMPILER="C:/Program Files/LLVM/bin/clang.exe" -S ..
-- Building for: Ninja
-- The C compiler identification is Clang 16.0.4 with GNU-like command-line
-- The CXX compiler identification is Clang 16.0.4 with GNU-like command-line
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/LLVM/bin/clang.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/LLVM/bin/clang++.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Deprecation Warning at build/_deps/highway-src/CMakeLists.txt:25 (cmake_policy):
The OLD behavior for policy CMP0111 will be removed from a future version
of CMake.

The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.

-- Performing Test ATOMICS_LOCK_FREE_INSTRUCTIONS
-- Performing Test ATOMICS_LOCK_FREE_INSTRUCTIONS - Success
-- Performing Test HWY_EMSCRIPTEN
-- Performing Test HWY_EMSCRIPTEN - Failed
-- Performing Test HWY_RISCV
-- Performing Test HWY_RISCV - Failed
-- Looking for sys/auxv.h
-- Looking for sys/auxv.h - not found
-- Looking for asm/hwcap.h
-- Looking for asm/hwcap.h - not found
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.

Update the VERSION argument value or use a ... suffix to tell
CMake that the project does not need compatibility with older versions.

-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: D:/gemma2/gemma.cpp/build/_deps/highway-build/googletest-download
[1/9] Creating directories for 'googletest'
[2/9] Performing download step (git clone) for 'googletest'
Cloning into 'googletest-src'...
HEAD is now at 43efa0a4 Merge pull request #3617 from Bagira80:fix_3616
[3/9] Performing update step for 'googletest'
[4/9] No patch step for 'googletest'
[5/9] No configure step for 'googletest'
[6/9] No build step for 'googletest'
[7/9] No install step for 'googletest'
[8/9] No test step for 'googletest'
[9/9] Completed 'googletest'
-- Found Python: D:/msys64/mingw64/bin/python3.11.exe (found version "3.11.8") found components: Interpreter
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - no
-- Found Threads: TRUE
CMake Deprecation Warning at build/_deps/sentencepiece-src/CMakeLists.txt:15 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.

Update the VERSION argument value or use a ... suffix to tell
CMake that the project does not need compatibility with older versions.

-- VERSION: 0.2.0
-- Not Found TCMalloc: TCMALLOC_LIB-NOTFOUND
-- Configuring done (105.8s)
-- Generating done (0.1s)
-- Build files have been written to: D:/gemma2/gemma.cpp/build

The 7b-pt-sfp output doesn't look normal. Something wrong with the args?

Using the built gemma.cpp to run the 2B model works fine, but 7B seems abnormal. Something wrong?

./gemma --tokenizer tokenizer.spm --compressed_weights 7b-pt-sfp.sbs --model 7b-it


  __ _  ___ _ __ ___  _ __ ___   __ _   ___ _ __  _ __
 / _` |/ _ \ '_ ` _ \| '_ ` _ \ / _` | / __| '_ \| '_ \
| (_| |  __/ | | | | | | | | | | (_| || (__| |_) | |_) |
 \__, |\___|_| |_| |_|_| |_| |_|\__,_(_)___| .__/| .__/
  __/ |                                    | |   | |
 |___/                                     |_|   |_|

tokenizer : tokenizer.spm
compressed_weights : 7b-pt-sfp.sbs
model : 7b-it
weights : [no path specified]
max_tokens : 3072
max_generated_tokens : 2048

Usage
Enter an instruction and press enter (%Q quits).

Examples

  • Write an email to grandma thanking her for the cookies.
  • What are some historical attractions to visit around Massachusetts?
  • Compute the nth fibonacci number in javascript.
  • Write a standup comedy bit about GPU programming.

Compute the nth fibonacci number in javascript.

[ Reading prompt ] ................

The Fibonacci sequence is a sequence of numbers in which each number is the sum of the two preceding numbers. The sequence begins: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17701, 28647, 46348, 75000, 121348, 196348, 317696, 514044, 831740, 1345784, 2177524, 3523308, 5700832, 9224140, 14924972, 24149112, 39074084, 63223196, 102297280, 165520476, 267817756, 433338232, 701199988, 1134538220, 1835738208, 2970976428, 4806714636, 7777690864, 12584405498, 20361896362, 32943361860, 53305258222, 86245619082, 139550876304, 225796495386, 365347371690, 591240867076, 956588244766, 1547829111842, 2498417956608, 4046247068450, 654072465058, 1058727133508, 1712799600568, 2771526734076, 4484326334644, 7255853068720, 11740179403364, 18996032470964, 30736211874328, 49732244345292, 79468456219616, 129200699564912, 208669155784528, 337870855349440,

./gemma --tokenizer tokenizer.spm --compressed_weights 7b-pt-sfp.sbs --model 7b-it


  __ _  ___ _ __ ___  _ __ ___   __ _   ___ _ __  _ __
 / _` |/ _ \ '_ ` _ \| '_ ` _ \ / _` | / __| '_ \| '_ \
| (_| |  __/ | | | | | | | | | | (_| || (__| |_) | |_) |
 \__, |\___|_| |_| |_|_| |_| |_|\__,_(_)___| .__/| .__/
  __/ |                                    | |   | |
 |___/                                     |_|   |_|

tokenizer : tokenizer.spm
compressed_weights : 7b-pt-sfp.sbs
model : 7b-it
weights : [no path specified]
max_tokens : 3072
max_generated_tokens : 2048

Usage
Enter an instruction and press enter (%Q quits).

Examples

  • Write an email to grandma thanking her for the cookies.
  • What are some historical attractions to visit around Massachusetts?
  • Compute the nth fibonacci number in javascript.
  • Write a standup comedy bit about GPU programming.

write a quick sort algorithm.

[ Reading prompt ] ...............

/*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*

gemma.exe does not respond no matter how long you wait.

I built and ran gemma with Visual Studio on Windows.

I ran gemma.exe with the following command line:
--tokenizer tokenizer.spm --compressed_weights 2b-it-sfp.sbs --model 2b-it

But gemma.exe does not respond no matter how long I wait.

gemma error

Can you tell me what kind of issue this is?

Support Bazel build

It would be nice if Bazel builds were also supported by this project, by providing WORKSPACE/MODULE/BUILD files.

request to remove `cmake`, `highway`, and `gtest`

The current project dependencies (cmake, highway, and gtest) are impacting the ease of initial setup. They increase the overhead for new users and contributors. Streamlining the project to reduce this reliance would enhance the project's accessibility and maintainability.

make error on orangepi 5 (arm)

I'm facing an issue when compiling on an Orange Pi running Debian Bookworm.
I first installed some dependencies:
sudo apt install libhighwayhash-dev libhighwayhash0

But I get errors like the following (after running "make gemma" or "make -j4 gemma"):

/home/orangepi/gemma/gemma.cpp/./ops.h:210:13: error: ‘Mul’ was not declared in this scope
210 | return Mul(v, cdf);
| ~~~^~~~~~~~

OR :

/home/orangepi/gemma/gemma.cpp/./ops.h:580:40: error: capture by copy of SVE type ‘V’ {aka ‘__SVFloat32_t’}
580 | hn::Transform(d, x, mask_pos, [&sum, max](D d, V v) {

make --version -> GNU Make 4.3 for aarch64-unknown-linux-gnu

I also tried with cmake, with no luck.
Thanks.

GRPC support - in scope?

I'd like to be able to run gemma.cpp on kubernetes. A first step in my rough plan is to add a client/server mode, and I thought I would add GRPC support. Is the project open to having a contrib directory where we can collaborate on this sort of thing? In future, I'm imagining we could put things like kubernetes manifests in there also.

I have a simple server (though my C++ is not good!) and an example client in golang which I will send as a WIP PR to make the discussion more concrete.

Building android executables on ubuntu

After executing the command "cmake -DCMAKE_TOOLCHAIN_FILE=/usr/lib/android-sdk/ndk/25.1.8937393/build/cmake/android.toolchain.cmake -B build-android", and after it succeeded, I ran "make -j 4 gemma" and got the following error:

[screenshot of the build error]

How do I fix it?

num_threads doesn't seem to have any effect

./gemma \
  --tokenizer tokenizer.spm \
  --compressed_weights 2b-it-sfp.sbs \
  --model 2b-it \
  --verbosity 2 \
  --num_threads 2

Increasing num_threads like this doesn't improve speed. Is this expected?

[Suggestions] Low effort OpenMP, OpenACC, CBLAS compatible CPU & GPU acceleration + other improvements

Acceleration

There may be a possibility to add support of multiple rudimentary acceleration methods to this project without much effort.

Please refer to run.c and its Makefile in my fork of llama2.c.

CBLAS:

https://github.com/trholding/llama2.c/blob/e8698eb31b26bd2f2922a2b48ef8a4b2fa8ad1a1/run.c#L86

If CBLAS support is implemented, then GPU acceleration via OpenCL through the CLBlast library is just a drop-in.

OpenMP & OpenACC:

https://github.com/trholding/llama2.c/blob/e8698eb31b26bd2f2922a2b48ef8a4b2fa8ad1a1/run.c#L110

Note: annotate the hot parts for parallelism.
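For illustration, such an annotation looks like the following (a generic C++ sketch, not gemma.cpp code; MatVec and its signature are hypothetical, and the pragma is a no-op unless compiled with OpenMP enabled, e.g. -fopenmp):

// Hypothetical hot loop: out = W * x, with W stored row-major.
// Each output row is independent, so rows can be computed in parallel.
void MatVec(const float* W, const float* x, float* out, int rows, int cols) {
#pragma omp parallel for
  for (int i = 0; i < rows; ++i) {
    float acc = 0.0f;
    for (int j = 0; j < cols; ++j) acc += W[i * cols + j] * x[j];
    out[i] = acc;
  }
}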

Other improvements:

Usability like Mozilla's llamafile:

We invented the concept way before Mozilla did. We implemented embedded models and multi-OS binaries.

https://github.com/trholding/llama2.c/blob/e8698eb31b26bd2f2922a2b48ef8a4b2fa8ad1a1/run.c#L35

To make "build once run on any os" multi os binaries, build with cosmopolitan libc toolchain. Refer the Makefile but follow Cosmo docs as we use an older version.

I hope to build a Gemma 2 Everywhere OS demo similar to L2E OS. Is that naming by any chance disallowed by Google/Gemma copyrights?

Maybe missing some detail to run model 2b-it

Thank you team for the amazing work.
I have successfully run the 2b-it-sfp model.
Now I am trying to run the 2b-it model, but it does not work. Here is my process:
First run:

./gemma --tokenizer model_2b_it/tokenizer.spm --compressed_weights model_2b_it/2b-it.sbs  --model 2b-it

And I got the error:

Failed to read cache gating_ein_0 (error 294)
Abort at /gemma.cpp/./gemma.cc:117: Failed to open model file  - does it exist?
Aborted (core dumped)

After that, I changed the command to:

./gemma --tokenizer model_2b_it/tokenizer.spm --weights model_2b_it/2b-it.sbs --compressed_weights model_2b_it/compressed_model --model 2b-it

And got the error:

Cached compressed weights does not exist yet (code 256), compressing weights and creating file: model_2b_it/compressed_model.
Abort at /gemma.cpp/./gemma.cc:141: Failed to read from model_2b_it/2b-it.sbs - might be a directory, or too small?
Aborted (core dumped)

Hopefully it shows enough information about the issue.
