
mllm's Introduction


mllm is a fast and lightweight multimodal LLM inference engine for mobile and edge devices.

  • Plain C/C++ implementation without dependencies
  • Optimized for multimodal LLMs like Fuyu-8B
  • Supports ARM NEON and x86 AVX2
  • 4-bit and 6-bit integer quantization

Wait... why an on-device multimodal LLM? It's a key building block for intelligent personal agents, text-based image search/retrieval, screen VQA, and many more exciting mobile apps, all without giving away your private data (chat history, screenshots, photos, etc.).


Android Demo

  • Demo of LLM chatting
  • Demo of image understanding
  • Demo of UI screen understanding

Supported models

Model                     FP32  INT4
LLaMA-1/2 7B              ✔️    ✔️
Alpaca 7B                 ✔️    ✔️
TinyLLaMA 1.1B            ✔️    ✔️
Fuyu 8B                   ✔️    ✔️
Vision Transformer        ✔️    ✔️
CLIP                      ✔️    ✔️
ImageBind (3 modalities)  ✔️    ✔️
LLaVA 7B                  ✔️    ✔️
Gemma 2B                  ✔️    ✔️
Qwen 0.5B                 ✔️    ✔️
Mistral 7B                ✔️    ✔️

Quick Start

Get the Code

git clone https://github.com/UbiquitousLearning/mllm
cd mllm

Check prerequisites

Building mllm requires the following tools:

  • gcc (11.4+) / clang (11.0+)
  • CMake >= 3.18
  • Android NDK Toolchains >= 26
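
A quick way to verify these versions (a minimal sketch, assuming the tools are on your PATH and ANDROID_NDK points at your NDK install):

gcc --version        # or: clang --version
cmake --version
# the NDK records its version in source.properties
grep Pkg.Revision "$ANDROID_NDK/source.properties"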

Try it on Android

Build

export ANDROID_NDK=/path/to/your/ndk
cd scripts
./build_android.sh

Run Fuyu-8B

Download the model from here, or use the following instructions:

mkdir ../models && cd ../models
# Download fuyu-8b-q4_k.mllm
wget https://huggingface.co/mllmTeam/fuyu-8b-mllm/resolve/main/fuyu-8b-q4_k.mllm?download=true  -O fuyu-8b-q4_k.mllm

Run it on an Android phone with at least 12 GB of memory.

cd ../scripts
./run_fuyu.sh
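
The script drives the phone over adb. If you prefer to run the demo by hand, a rough manual equivalent (a sketch only; it assumes /data/local/tmp as the working directory on the device and reuses the -m/-v flags shown in the Linux section below):

adb push ../bin-arm/demo_fuyu /data/local/tmp
adb push ../vocab/fuyu_vocab.mllm /data/local/tmp
adb push ../models/fuyu-8b-q4_k.mllm /data/local/tmp
adb shell "cd /data/local/tmp && ./demo_fuyu -m fuyu-8b-q4_k.mllm -v fuyu_vocab.mllm"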

The results are as follows:

> ./demo_fuyu
[Q] [../assets/bus.png]Generate a coco-style caption.

[A]  A blue bus driving down the road next to a sidewalk.
[Q] [../assets/two_cats.jpg]What's this?

[A]  Two cats laying on top of a pink blanket.

Run LLaMA-2-7B

Download the model from here, or use the following instructions:

mkdir ../models && cd ../models
# Download llama-2-7b-chat-q4_k.mllm
wget https://huggingface.co/mllmTeam/llama-2-7b-mllm/resolve/main/llama-2-7b-chat-q4_k.mllm?download=true  -O llama-2-7b-chat-q4_k.mllm

Run it on an Android phone with at least 12 GB of memory.

cd ../scripts
./run_llama.sh

The results are as follows:

> ./demo_llama
[Q]  Hello, who are you?
[A]

I am a bot, I'm here to help answer your questions and provide information. I'm just an AI, I don't have personal experiences or feelings like humans do, but I'm here to help you with any questions you may have. Is there something specific you would like to know or talk about?
[Q]  What can you do?
[A]
I can help with a wide range of tasks, including:
  
* Answering questions on various topics, such as science, history, technology, and more.
* Providing information on different subjects, such as culture, entertainment, and lifestyle.
* Helping with language-related tasks, such as translating text or improving grammar and syntax.
* Generating text, such as articles, stories, or even entire books.
* Assisting with coding
[Q]  Please introduce Beijing University of Posts and Telecommunications.
[A]
Beijing University of Posts and Telecommunications (BUPT) is a public research university located in Beijing, China. It was established in 1955 and is one of the top universities in China, with a strong focus on engineering, technology, and telecommunications.
BUPT offers a wide range of undergraduate and graduate programs in fields such as computer science, electrical engineering, software engineering, and communication engineering. The university

Run ImageBind

Download the model from here, or use the following instructions:

mkdir ../models && cd ../models
# Download imagebind_huge-q4_k.mllm
wget https://huggingface.co/mllmTeam/imagebind_huge-mllm/resolve/main/imagebind_huge-q4_k.mllm?download=true -O imagebind_huge-q4_k.mllm 

Run it on an Android phone with at least 4 GB of memory.

cd ../scripts
./run_imagebind.sh

The results are as follows (each row holds the normalized similarity scores of one input against the three candidates, so the rows sum to 1 and a dominant diagonal indicates correct cross-modal matches):

> ./demo_imagebind 
vision X text :
0.9985647 0.0013827 0.0000526 
0.0000365 0.9998636 0.0000999 
0.0000115 0.0083149 0.9916736 
vision X audio :
0.8054272 0.1228001 0.0717727 
0.0673458 0.8429284 0.0897258 
0.0021967 0.0015335 0.9962698 

Run on Linux

Build

cd scripts
./build.sh

Run Fuyu-8B

cd ./bin
./demo_fuyu -m ../models/fuyu-8b-q4_k.mllm -v ../vocab/fuyu_vocab.mllm

Run LLaMA-2-7B

cd ./bin
./demo_llama -m ../models/llama-2-7b-chat-q4_k.mllm -v ../vocab/llama_vocab.mllm

Run ImageBind

cd ./bin
./demo_imagebind -m ../models/imagebind_huge-q4_k.mllm -v ../vocab/clip_vocab.mllm

Customization

Convert models

You can download models from here, or convert a PyTorch/SafeTensors model to the mllm format yourself.

cd tools/convertor
pip install -r ./requirements.txt

# for a single-file PyTorch model
python convert.py --input_model=model.pth --output_model=model.mllm --type=torch

# for a multi-file PyTorch model
python convert.py --input_model=pytorch_model.bin.index.json --output_model=model.mllm --type=torch

# for a single-file safetensors model
python convert.py --input_model=model.safetensors --output_model=model.mllm --type=safetensor

# for a multi-file safetensors model
python convert.py --input_model=model.safetensors.index.json --output_model=model.mllm --type=safetensor

Convert vocabulary

You can convert a tokenizer vocabulary to the mllm format as follows.

cd tools/convertor
python vocab.py --input_file=tokenizer.json --output_file=vocab.mllm --type=Unigram

Quantize models

You can quantize an mllm model to INT4 yourself. mllm currently supports only two quantization modes: Q4_0 and Q4_K.

cd bin
./quantize model.mllm model_q4_k.mllm Q4_K
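
The arguments are the input model, the output model, and the quantization mode. To produce a Q4_0 model instead (hypothetical file names):

./quantize model.mllm model_q4_0.mllm Q4_0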

Roadmap

  • More backends like QNN
  • More models like PandaGPT
  • More optimizations like LUT-GEMM
  • More..

Documentation

See the documentation here for more information.

Contribution

Read the contribution guide before you contribute.

Acknowledgments

mllm reuses many low-level kernel implementations from ggml on ARM CPUs. It also uses stb and wenet for pre-processing images and audio. mllm has also benefited from the following projects: llama.cpp and MNN.

License

Overall Project License

This project is licensed under the terms of the MIT License. Please see the LICENSE file in the root directory for the full text of the MIT License.

Apache 2.0 Licensed Components

One component (wenet) of this project is licensed under the Apache License 2.0. This component is clearly identified in its subdirectory along with a copy of the Apache License 2.0. For the full text of the Apache License 2.0, please refer to the LICENSE-APACHE file located in the relevant subdirectory.


mllm's Issues

Trying to add custom models but cannot run properly on Android

I am trying to add TinyLLaMA chat to Android (the Llama2 demo works) but failed. My steps are listed below:

  • First, I added modeling_tinyllama.hpp in tools/jni, added some lines of code to LibHelper.cpp, and added the macro TINY_LLAMA to PreDefinedModel in LibHelper.hpp

  • Then I added modeling_tinyllama.hpp to CMakeLists.txt when building mllm_lib, ran scripts/build_android_app.sh, and got libmllm_lib.a in build-arm

  • In Android Studio (on Windows), I replaced libmllm_lib.a and modified the code in the view model:

val modelPath = when(modelType){
    0->"model/llama_2.mllm"
    1->"model/fuyu.mllm"
    2->"model/tinyllama.mllm"
    else->"model/llama"
}
val vacabPath = when(modelType){
    0->"model/vocab.mllm"
    1->"model/vocab_uni.mllm"
    2->"model/tinyllama_vocab.mllm"
    else->"model/vocab.mllm"
}
  • Finally, I changed:
// Chat(navController, it.arguments?.getInt("type") ?: 0)
Chat(navController, 2)

The result is that TinyLLaMA loads successfully but never outputs any characters. Is there a mistake in my procedure, or wrong code in the files? The code is attached. Thanks for your help.

improve examples

Don't use a fixed image path in the demos; add a new parameter, like:

cmdParser.add<string>("image", 'i', "specify mllm image path", false, "../assets/cat.jpg");
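
With such a flag in place, the demos could then be pointed at any image (hypothetical usage, assuming the flag is wired into demo_fuyu):

./demo_fuyu -m ../models/fuyu-8b-q4_k.mllm -v ../vocab/fuyu_vocab.mllm -i ../assets/two_cats.jpg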

Android app crash while Image Reading

Fail To Load Models! Please Check if models exists at /sdcard/Download/model and restart
but I have copied the model to that location and am still getting the same error message. What could be the reason?

CANNOT LINK EXECUTABLE "./demo_fuyu": library "libomp.so" not found: needed by main executable

Hi,
I followed the steps to build and run on a Samsung S24 Android device and am facing the error below:

mllm/scripts$ ./run_fuyu.sh
../vocab/fuyu_vocab.mllm: 1 file pushed, 0 skipped. 34.1 MB/s (5854575 bytes in 0.164s)
../bin-arm/demo_fuyu: 1 file pushed, 0 skipped. 2.4 MB/s (35765448 bytes in 14.432s)
../models/fuyu-8b-q4_k.mllm: 1 file pushed, 0 skipped. 07.7 MB/s (5959714207 bytes in 8600.685s)
CANNOT LINK EXECUTABLE "./demo_fuyu": library "libomp.so" not found: needed by main executable

Can you kindly help in identifying the root cause?
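
A common workaround for this class of error, offered here as a sketch rather than a confirmed fix: the demo binary is dynamically linked against OpenMP and the phone ships no libomp.so, so push the one bundled with your NDK and point LD_LIBRARY_PATH at it (the exact library path varies by NDK version and host, so locate it first):

# locate the aarch64 libomp.so inside your NDK
find "$ANDROID_NDK" -name libomp.so
# push it next to the demo binary (assuming /data/local/tmp is the run directory)
adb push /path/to/aarch64/libomp.so /data/local/tmp
adb shell "cd /data/local/tmp && LD_LIBRARY_PATH=. ./demo_fuyu"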

how to add tinyllama to app

How do I add TinyLLaMA to the app for testing?
Should I add a new file like modeling_llama.hpp in /tools/jni?

Can you update the app and mllm to support TinyLLaMA?

Thank you very much.

better cmake configuration

Create a CMakeLists.txt in each folder and use add_subdirectory at the top level to pull in the submodules.
Use target_include_directories to bring in the directories that contain the header files.
