mybigday / llama.rn
React Native binding of llama.cpp
License: MIT License
llama.cpp now supports parallel decoding in one context, so we can support it as well.
Breaking change: deprecate the stopCompletion
method and move its behavior to the return value of completion.
Is it possible to add a text streaming feature? It looks like you're loading a local cpp server. I wonder: does Swift support sockets for React Native? Inference is so slow on mobile devices right now; streaming would help the user know something is happening. I'm interested in contributing if you need contributors. I believe streaming is supported by llama.cpp in LangChain's implementation, but I'm not sure if that's custom.
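One way streaming could surface on the JS side is a per-token callback: the native layer emits an event per decoded token and the JS side accumulates them for the UI. This is only a sketch; `fakeDecode` below is a stand-in for the native inference loop, not the library's actual API.

```typescript
// Hypothetical streaming shape: one callback invocation per decoded token.
type TokenCallback = (token: string) => void;

function fakeDecode(prompt: string, onToken: TokenCallback): string {
  // Pretend the model streams these tokens one at a time.
  const tokens = ['Hello', ',', ' world', '!'];
  let text = '';
  for (const t of tokens) {
    onToken(t); // the UI can render each partial token immediately
    text += t;
  }
  return text;
}

const partials: string[] = [];
const full = fakeDecode('Say hi', (t) => partials.push(t));
```

With this shape no socket is needed: the React Native bridge can deliver the token events directly.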
It's just like the OpenAI text completion playground: start with a text area as the prompt.
We could also add an area for using a grammar.
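For the grammar area, llama.cpp accepts grammars in its GBNF format; a minimal example of what a user might paste into that area (this particular rule is illustrative, not from the project):

```
# GBNF sketch: constrain the model's output to "yes" or "no".
root ::= "yes" | "no"
```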
As it says on the tin: loading small 3B models, à la TinyLlama or StableLM, does not work. Tested models:
Attempting to call initLlama results in
Error: Failed to initialize context
Which I can only assume is here:
I do not know enough about native functions to investigate further.
In addition, stopCompletions() does not stop a completion on Android.
Thanks for your work, the project is fantastic otherwise.
can you add support to change it?
Not high priority.
It's probably easy to support because we implemented the context in cpp/rn-llama.hpp.
We won't try to integrate the CLBlast part for now, because not all Android devices support OpenCL.
We can watch ggerganov/llama.cpp#2059 / ggerganov/llama.cpp#2039 for a chance to land, so we can use the Vulkan backend.
Port llama.cpp/examples/llama-bench/llama-bench.cpp as a method so we can more easily collect benchmark results on mobile devices.
That way we can test model params more easily without using debug mode.
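A bench method would presumably report the same headline numbers as llama-bench: tokens per second for prompt processing (pp) and text generation (tg). A sketch of that summary math, with the interface name and fields being assumptions for illustration:

```typescript
// Hypothetical result shape mirroring llama-bench's pp/tg columns.
interface BenchResult {
  ppTokensPerSec: number; // prompt processing throughput
  tgTokensPerSec: number; // text generation throughput
}

// Convert raw token counts + elapsed milliseconds into tokens/second.
function summarizeBench(
  ppTokens: number, ppMs: number,
  tgTokens: number, tgMs: number,
): BenchResult {
  return {
    ppTokensPerSec: (ppTokens / ppMs) * 1000,
    tgTokensPerSec: (tgTokens / tgMs) * 1000,
  };
}

// e.g. 512 prompt tokens in 4s, 128 generated tokens in 6.4s
const result = summarizeBench(512, 4000, 128, 6400);
```

Returning a structured result like this from JS would make it easy to collect and compare numbers across devices without a debug build.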
I am having trouble updating the llama.cpp submodule myself. Could the project be updated? llama.cpp has added support for a few new base models that currently do not work in llama.rn.
Add utils to expose llama_tokenize
and llama_get_embeddings
to the JS side.
For the context, we need to provide the embedding param to enable that.
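The JS surface for these utils might look like the sketch below. The interface and the mock "native module" are assumptions for illustration; the real binding would forward these calls over the bridge to llama_tokenize / llama_get_embeddings in C++.

```typescript
// Hypothetical JS-side surface for the two native utils.
interface NativeLlama {
  tokenize(text: string): number[];      // -> token ids
  getEmbeddings(text: string): number[]; // -> embedding vector
}

// Stand-in native module for this sketch: "token ids" are word lengths,
// "embeddings" are a fixed-size fake vector.
const mockNative: NativeLlama = {
  tokenize: (text) => text.split(/\s+/).map((w) => w.length),
  getEmbeddings: (text) => [text.length, 0.0, 1.0],
};

function tokenize(ctx: NativeLlama, text: string): number[] {
  return ctx.tokenize(text);
}

function getEmbeddings(ctx: NativeLlama, text: string): number[] {
  // Note: the context must be initialized with the embedding param
  // enabled for real embeddings to be available.
  return ctx.getEmbeddings(text);
}

const toks = tokenize(mockNative, 'hello world');
const emb = getEmbeddings(mockNative, 'hi');
```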
Hello,
I've been working on getting the stablelm-2-zephyr-1_6b-Q8_0.gguf operational (link: https://huggingface.co/spaces/stabilityai/stablelm-2-1_6b-zephyr), especially since the 3B version seems to function quite well. However, I'm encountering an issue with the 1.6B version where it fails to initialize the context. Currently, I'm using the latest version of your master branch to compile the library. Is there a straightforward modification I can make on my end to resolve this?
from logs:
01-29 22:50:05.365 3017 20732 E RNLLAMA_LOG_ANDROID: llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 340, got 268
Thank you.
Shouldn't there be a function that allows the user to stop inference? It could be implemented as a callback function, just like in whisper.rn's realtimeInference().
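One possible shape, in the spirit of whisper.rn's realtime callback: let the per-token callback return `false` to stop generation. The decode loop below is a mock of the native side, not the library's real implementation.

```typescript
// Hypothetical: callback returns false to request a stop mid-generation.
type StoppableCallback = (token: string) => boolean;

function decodeUntilStopped(tokens: string[], onToken: StoppableCallback): string {
  let text = '';
  for (const t of tokens) {
    text += t;
    if (!onToken(t)) break; // user asked to stop; abort the decode loop
  }
  return text;
}

// Stop after the second token.
let count = 0;
const out = decodeUntilStopped(['a', 'b', 'c', 'd'], () => ++count < 2);
```

This avoids a separate stop method entirely, since the stop signal travels back through the same callback that delivers tokens.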
Hello again, I've received reports from users of ChatterUI that model loading fails on Xiaomi branded phones:
Confirmed not working:
I've also queried about other phones, and got a few responses for working devices.
Confirmed working:
Version used:
Logcat response on the tested Poco F5:
RNLLAMA_ANDROID_JNI: [RNLlama] is_model_loaded false
There aren't enough users to confirm this is a trend across all Xiaomi phones, but it is peculiar.
llama.cpp includes a LLaVA example (+ clip.cpp); we could use it to provide vision support. We may implement it after #30 is done.
Also, it would be great if we could make another package named clip.rn or react-native-clip, but currently I'm afraid we don't have the resources to maintain it, so just keep it in mind.
First of all, thanks for the hard work on bringing this project to the react-native ecosystem.
I have been using llama.rn for a few weeks now in my personal project:
https://github.com/Vali-98/ChatterUI
I was wondering if there is any interest in implementing OpenCL for android. I have attempted to work on it myself to little success, given my inexperience with native modules.
Support saving the prompt cache as a file so we can speed up context initialization + prompt processing.
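The win comes from prefix reuse: if a saved session shares a prefix with the new prompt, only the suffix needs re-processing. A sketch of that idea, with file I/O omitted; `savedPrompt` stands in for the contents of a session file (assumption), and the real implementation would compare token sequences, not characters:

```typescript
// Length of the shared prefix between a cached session and a new prompt.
// Everything inside this prefix can be skipped at prompt-processing time.
function reusablePrefixLength(savedPrompt: string, newPrompt: string): number {
  let i = 0;
  while (
    i < savedPrompt.length &&
    i < newPrompt.length &&
    savedPrompt[i] === newPrompt[i]
  ) i++;
  return i;
}

// A chat app typically re-sends the same system prompt every turn,
// so the whole system preamble gets skipped.
const saved = 'System: you are helpful.\nUser: hi';
const next = 'System: you are helpful.\nUser: what is 2+2?';
const skipped = reusablePrefixLength(saved, next);
```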
As mentioned in the title, setting a seed value does not make an output deterministic on Android.
llama.rn version: 0.3.0-rc.13
Model used: phi-2.Q3_K_M.gguf
Android Devices Tested on: Emulated Pixel 3a - Android 14
Params used:
{
  "frequency_penalty": 0,
  "grammar": "",
  "min_p": 0.07,
  "mirostat": 0,
  "mirostat_eta": 0.1,
  "mirostat_tau": 5,
  "n_predict": 288,
  "n_threads": 5,
  "presence_penalty": 0,
  "prompt": "",
  "repeat_penalty": 1,
  "seed": 2,
  "stop": ["User:", "### Response: "],
  "temperature": 1,
  "tfs_z": 1,
  "top_k": 0,
  "top_p": 1,
  "typical_p": 1
}
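What `"seed": 2` should guarantee is that the sampler's random decisions are reproducible: same seed, same output. Sketched below with a small deterministic PRNG (mulberry32) standing in for llama.cpp's sampler; this is only an illustration of the expected contract, not the library's code.

```typescript
// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draw n "sampling decisions" from a given seed.
function sample(seed: number, n: number): number[] {
  const rng = mulberry32(seed);
  return Array.from({ length: n }, () => rng());
}

// Two runs with the same seed must agree; a different seed must not.
const runA = sample(2, 5);
const runB = sample(2, 5);
```

The bug report above amounts to this property failing on Android: either the seed is not reaching the native sampler, or the sampler is being reseeded per run.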
Layla is a project that also integrates llama.cpp for mobile use:
https://github.com/l3utterfly/llama.cpp/tree/layla-build
After some quick testing, it does seem like Layla's fork of llama.cpp runs models far faster on Android than llama.rn, almost twice as fast in some cases with 7B models.
It would be wonderful if these improvements were added to llama.rn as well.
There is an issue in the README regarding model loading: it mentions the 'gguf' model but lacks clear instructions. Is file loading implemented yet? 'No model found' is always the result.