tjake / jlama
Jlama is a modern LLM inference engine for Java
License: Apache License 2.0
Hi, I was trying this model here:
https://huggingface.co/MoMonir/llava-llama-3-8b-v1_1-GGUF
It also comes with some instructions on how to use it for images. Is this also possible somehow with Jlama, e.g. via setImage in inference params?
To support things like JSON output
This worked in the Oct 15 Jlama:
$ ./run-cli.sh complete -p "def fib(" -t 0.2 -tc 24 -n 100 models/CodeLlama-7b-hf
Now it OOMs (note that I have doubled the default Xmx, which was not necessary in October):
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at picocli.CommandLine.executeUserObject(CommandLine.java:2035)
at picocli.CommandLine.access$1500(CommandLine.java:148)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2264)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:2664)
at picocli.CommandLine.parseWithHandler(CommandLine.java:2599)
at com.github.tjake.jlama.cli.JlamaCli.main(JlamaCli.java:30)
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:111)
at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:66)
at com.github.tjake.jlama.cli.commands.CompleteCommand.run(CompleteCommand.java:16)
at picocli.CommandLine.executeUserObject(CommandLine.java:2026)
... 8 more
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:74)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:107)
... 11 more
Caused by: java.lang.OutOfMemoryError
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:542)
at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567)
at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:670)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfInt.evaluateParallel(ForEachOps.java:189)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.base/java.util.stream.IntPipeline.forEach(IntPipeline.java:463)
at java.base/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:620)
at com.github.tjake.jlama.model.llama.LlamaModel.loadTransformerBlockWeights(LlamaModel.java:56)
at com.github.tjake.jlama.model.AbstractModel.<init>(AbstractModel.java:109)
at com.github.tjake.jlama.model.llama.LlamaModel.<init>(LlamaModel.java:31)
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
... 14 more
Caused by: java.lang.OutOfMemoryError: Cannot reserve 180355136 bytes of direct buffer memory (allocated: 25708094948, limit: 25769803776)
at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:360)
at com.github.tjake.jlama.util.UnsafeDirectByteBuffer.allocateAlignedByteBuffer(UnsafeDirectByteBuffer.java:36)
at com.github.tjake.jlama.tensor.FloatBufferTensor.<init>(FloatBufferTensor.java:73)
at com.github.tjake.jlama.safetensors.Weights.load(Weights.java:112)
at com.github.tjake.jlama.safetensors.WeightLoader.load(WeightLoader.java:16)
at com.github.tjake.jlama.safetensors.SafeTensorIndex.load(SafeTensorIndex.java:172)
at com.github.tjake.jlama.model.llama.LlamaModel.lambda$loadTransformerBlockWeights$1(LlamaModel.java:70)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfInt.accept(ForEachOps.java:205)
at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:712)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
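A note on the failure mode above: the limit being hit is direct buffer memory (governed by `-XX:MaxDirectMemorySize`, which defaults to the maximum heap size), not the ordinary Java heap, which is why raising `-Xmx` only helps indirectly. A small, Jlama-independent sketch for watching the direct pool while weights load, using the standard `BufferPoolMXBean` API (the 16 MiB allocation here is just an illustrative stand-in for a tensor):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectMemoryProbe {
    public static void main(String[] args) {
        // Allocate a direct buffer so the "direct" pool has something to report.
        ByteBuffer buf = ByteBuffer.allocateDirect(16 * 1024 * 1024);

        // The platform exposes "direct" and "mapped" buffer pools via JMX.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s: used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}
```

Printing this before and after model construction shows how close loading gets to the reported limit.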
Writing it in Java itself will solve that, so I did. Happy to contribute it, so here you go, FWIW:
package com.github.tjake.jlama.cli;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class DownloadModel {
private static final String HF_ACCESS_TOKEN = System.getenv("HF_ACCESS_TOKEN");
private static final String MODEL_DIR = "models";
public static void main(String[] args) throws IOException {
if (args.length != 1) {
usage();
System.exit(1);
}
String hfModel = args[0];
String authHeader = null;
if (HF_ACCESS_TOKEN != null && !HF_ACCESS_TOKEN.isEmpty()) {
authHeader = "Bearer " + HF_ACCESS_TOKEN; // header value only; the name is supplied in setRequestProperty
}
InputStream modelInfoStream = getResponse("https://huggingface.co/api/models/" + hfModel, authHeader);
String modelInfo = readInputStream(modelInfoStream);
if (modelInfo == null) {
System.out.println("No valid model found or trying to access a restricted model (use HF_ACCESS_TOKEN env. var.)");
System.exit(1);
}
List<String> allFiles = parseFileList(modelInfo);
if (allFiles.isEmpty()) {
System.out.println("No valid model found");
System.exit(1);
}
List<String> tensorFiles = new ArrayList<>();
for (String currFile : allFiles) {
if (currFile.contains("safetensor")) {
tensorFiles.add(currFile);
}
}
if (tensorFiles.isEmpty()) {
System.out.println("Model is not available in safetensor format");
System.exit(1);
}
// Download only the safetensors shards plus the inference configuration,
// not every file in the repo (which may include large duplicate formats).
tensorFiles.addAll(Arrays.asList("config.json", "vocab.json", "tokenizer.json"));
Path modelDir = Paths.get(MODEL_DIR, hfModel);
try {
Files.createDirectories(modelDir);
} catch (IOException e) {
System.out.println("Error creating directory: " + modelDir);
System.exit(1);
}
for (String currFile : tensorFiles) {
System.out.println("Downloading file: " + modelDir.resolve(currFile));
downloadFile(hfModel, currFile, authHeader, modelDir.resolve(currFile));
}
System.out.println("Downloading file: " + modelDir.resolve("tokenizer.model") + " (if it exists)");
try {
downloadFile(hfModel, "tokenizer.model", authHeader, modelDir.resolve("tokenizer.model"));
} catch (IOException e) {
// tokenizer.model is optional for many models; skip it if absent
System.out.println("tokenizer.model not found; skipping");
}
System.out.println("Done! Model downloaded in ./" + MODEL_DIR + "/" + hfModel);
}
private static void usage() {
System.out.println("""
usage: java DownloadModel [-h] owner/model_name
This program downloads the safetensors files and inference configuration for a model from Hugging Face.
To download restricted models set the HF_ACCESS_TOKEN environment variable to a valid HF access token.
To create a token see https://huggingface.co/settings/tokens
OPTIONS:
-h Show this message
EXAMPLES:
java DownloadModel gpt2-medium
java DownloadModel meta-llama/Llama-2-7b-chat-hf""");
}
private static List<String> parseFileList(String modelInfo) {
List<String> fileList = new ArrayList<>();
try {
ObjectMapper objectMapper = new ObjectMapper();
JsonNode rootNode = objectMapper.readTree(modelInfo);
JsonNode siblingsNode = rootNode.path("siblings");
if (siblingsNode.isArray()) {
for (JsonNode siblingNode : siblingsNode) {
String rFilename = siblingNode.path("rfilename").asText();
fileList.add(rFilename);
}
}
} catch (IOException e) {
System.out.println("Error parsing JSON: " + e.getMessage());
}
return fileList;
}
public static InputStream getResponse(String urlString, String authHeader) {
try {
URL url = new URL(urlString);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
// Set the request method
connection.setRequestMethod("GET");
// Set the request header
if (authHeader != null)
connection.setRequestProperty("Authorization", authHeader);
// Get the response code
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
// If the response code is 200 (HTTP_OK), return the input stream
return connection.getInputStream();
} else {
// If the response code is not 200, throw an IOException
throw new IOException("HTTP response code: " + responseCode);
}
}
catch (IOException ioe)
{
System.out.println("WARNING: Fetch of URL " + urlString + " failed due to " + ioe);
return null;
}
}
public static String readInputStream(InputStream inStream) throws IOException {
if (inStream == null) return null;
BufferedReader inReader = new BufferedReader(new InputStreamReader(inStream));
StringBuilder stringBuilder = new StringBuilder();
String currLine;
while ((currLine = inReader.readLine()) != null) {
stringBuilder.append(currLine);
stringBuilder.append(System.lineSeparator());
}
return stringBuilder.toString();
}
private static void downloadFile(String hfModel, String currFile, String authHeader, Path outputPath) throws IOException {
InputStream inStream = getResponse("https://huggingface.co/" + hfModel + "/resolve/main/" + currFile, authHeader);
if (inStream == null)
throw new IOException("WARNING: Fetch of file " + currFile + " failed.");
Files.copy(inStream, outputPath, StandardCopyOption.REPLACE_EXISTING);
}
}
[INFO] --- maven-antrun-plugin:1.8:run (write-version-properties) @ jlama-native ---
[WARNING] Parameter tasks is deprecated, use target instead
[INFO] Executing tasks
main:
[exec] Result: 128
[exec] Result: 128
[echo] Current commit: 0 on 1970-01-01 00:00:00 +0000
[mkdir] Created dir: /home/wf/share/Jlama-main/jlama-native/target/classes/META-INF
[propertyfile] Creating new property file: /home/wf/share/Jlama-main/jlama-native/target/classes/META-INF/com.github.tjake.versions.properties
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.8:run (build-native-lib) @ jlama-native ---
[INFO] Executing tasks
main:
[exec] mkdir -p /home/wf/share/Jlama-main/jlama-native/target/native-objs-only
[exec] gcc -o /home/wf/share/Jlama-main/jlama-native/target/native-objs-only/vector_simd.o -c src/main/c/vector_simd.c -O3 -mavx512f -march=native -Werror -Wno-attributes -fPIC -fno-omit-frame-pointer -Wunused-variable
[exec] In file included from /usr/lib/gcc/x86_64-redhat-linux/13/include/immintrin.h:109,
[exec] from src/main/c/vector_simd.c:5:
[exec] /usr/lib/gcc/x86_64-redhat-linux/13/include/fmaintrin.h: In function 'dot_product_f32_q8_256':
[exec] /usr/lib/gcc/x86_64-redhat-linux/13/include/fmaintrin.h:63:1: error: inlining failed in call to 'always_inline' '_mm256_fmadd_ps': target specific option mismatch
[exec] 63 | _mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C)
[exec] | ^~~~~~~~~~~~~~~
[exec] src/main/c/vector_simd.c:46:15: note: called from here
[exec] 46 | sum = _mm256_fmadd_ps(va, vb_scaled, sum);
[exec] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[exec] /usr/lib/gcc/x86_64-redhat-linux/13/include/fmaintrin.h:63:1: error: inlining failed in call to 'always_inline' '_mm256_fmadd_ps': target specific option mismatch
[exec] 63 | _mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C)
[exec] | ^~~~~~~~~~~~~~~
[exec] src/main/c/vector_simd.c:46:15: note: called from here
[exec] 46 | sum = _mm256_fmadd_ps(va, vb_scaled, sum);
[exec] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[exec] make: *** [Makefile:40: /home/wf/share/Jlama-main/jlama-native/target/native-objs-only/vector_simd.o] Error 1
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Jlama Parent 0.2.0-SNAPSHOT:
[INFO]
[INFO] Jlama Parent ....................................... SUCCESS [ 1.204 s]
[INFO] Jlama Core ......................................... SUCCESS [ 7.439 s]
[INFO] Jlama Native ....................................... FAILURE [ 1.955 s]
[INFO] Jlama Net .......................................... SKIPPED
[INFO] Jlama Cli .......................................... SKIPPED
[INFO] Jlama Tests ........................................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.692 s
[INFO] Finished at: 2024-06-04T09:04:18+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (build-native-lib) on project jlama-native: An Ant BuildException has occured: exec returned: 2
[ERROR] around Ant part ...... @ 4:71 in /home/wf/share/Jlama-main/jlama-native/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :jlama-native
To make this work for me, I had to get the Apple Silicon version of jextract (Java 21) and add this to the bin/jextract script:
JLINK_VM_OPTIONS=--enable-native-access=org.openjdk.jextract
and make sure the resulting library was "libjlama.dylib" and on the java.library.path when running the app, and comment out all of the test code in the main pom.xml (though I'm working off an older snapshot, so maybe the JUnits work now on Apple M2).
Caused by: java.lang.IllegalArgumentException: Out of range: 3145728000
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:203)
at com.google.common.primitives.Ints.checkedCast(Ints.java:88)
at com.github.tjake.jlama.tensor.FloatBufferTensor.<init>(FloatBufferTensor.java:88)
at com.github.tjake.jlama.safetensors.Weights.load(Weights.java:128)
at com.github.tjake.jlama.safetensors.WeightLoader.load(WeightLoader.java:30)
at com.github.tjake.jlama.safetensors.SafeTensorIndex.load(SafeTensorIndex.java:189)
at com.github.tjake.jlama.model.gemma.GemmaModel.loadInputWeights(GemmaModel.java:114)
at com.github.tjake.jlama.model.AbstractModel.<init>(AbstractModel.java:134)
at com.github.tjake.jlama.model.llama.LlamaModel.<init>(LlamaModel.java:55)
at com.github.tjake.jlama.model.gemma.GemmaModel.<init>(GemmaModel.java:58)
See comments of #18
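For context (my reading of the trace, not official Jlama guidance): the "Out of range: 3145728000" above is Guava's `Ints.checkedCast` rejecting a tensor whose byte size exceeds what a single int-indexed buffer can address. A trivial illustration:

```java
public class CheckedCastDemo {
    public static void main(String[] args) {
        long tensorBytes = 3_145_728_000L; // the size from the stack trace above
        // Integer.MAX_VALUE is 2_147_483_647; anything larger cannot be narrowed
        // to int without overflow, so an int-indexed buffer cannot hold this tensor.
        System.out.println(tensorBytes > Integer.MAX_VALUE); // prints true
    }
}
```

Tensors past that size need to be split across multiple buffers or indexed with longs.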
Is there any plan to support the GGUF format directly, apart from SafeTensors? That would allow using this to load other GGUFs. If support already exists, can we add it to the README?
I want to build a local Copilot with Jlama, but generalist models are too big and slow.
Three candidates I found:
replit-code-v1_5-3b:
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.MPT
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.CODEGEN
WizardCoder-1B-V1.0 (using the safetensors branch):
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@693fe6c9): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.GPT_BIGCODE
I downloaded the model directly from Meta's repo, not Hugging Face, but the code is looking for a file called
model.safetensors.index.json
when opening with loadWithWeights.
But I do not have this file. Where is it supposed to come from? There is a file called params.json: {"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}
Is that the same?
Hello!
Is this project still being maintained in a major way? I can't quite get it to work.
I tried to run this software using the example code provided, modified to use a model I already have on my computer:
```java
public static void main(String[] args) {
    File model = new File("models/TinyLlama-1.1B-Chat-v1.0/");
    AbstractModel m = ModelSupport.loadModel(model, DType.F32, DType.I8);
    String prompt = "Say hello.";
    if (m.promptSupport().isPresent()) {
        prompt = m.promptSupport().get().newBuilder()
                .addSystemMessage("You're a helpful chatbot who writes short responses.")
                .addUserMessage(prompt)
                .build();
    }
    System.out.println("Prompt: " + prompt + "\n");
    Response r = m.generate(UUID.randomUUID(), prompt, 0.7f, 256, false, (s, f) -> {
        System.out.println(s);
    });
    System.out.println(r.toString());
}
```
I'll preface by saying I was trying to use a model that wasn't suggested by you on the main readme page, as I was under the impression that you could run any type of chat-based safetensor model with this API. So I apologize if this all turns out to be a major misunderstanding. I'll admit that I'm new to LLM-based projects and I'm trying to learn more in my spare time.
Anyway, it failed to run at first. It seems to require a specific preview feature that's currently only in Java 21(?). A friend said it's something to do with the new vector instruction API. I haven't read much about it myself, but either way, I couldn't get this API to run on anything besides Java 21. Here's the error I got on Java 22:
Exception in thread "main" java.lang.UnsupportedClassVersionError: com/github/tjake/jlama/tensor/AbstractTensor (class file version 65.65535) was compiled with preview features that are unsupported. This version of the Java Runtime only recognizes preview features for class file version 66.65535
Then of course, on Java 21, it gave me an error again, but that was just because I had to enable preview features. I'll admit that the specific configuration requirement puts me off a little from using it in my own projects; it doesn't seem to be mentioned anywhere in your example.
So, I finally got past this, and it ended up giving me an error that there was no safetensors model found in the models folder I'd made. At the time, it looked like this (don't mind the GGUF files): [screenshot omitted]
That said, I ended up realizing the API probably expects the files to be laid out like they are on Hugging Face, so I adjusted accordingly: I put all of these files in their own folder and renamed the model to just "model.safetensors". That one's on me.
And finally, thinking I had all the pieces together, I triumphantly ran the program just to get... this.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:161)
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:87)
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:77)
	at com.nokoriware.ai.test.JlamaTest.main(JlamaTest.java:16)
Caused by: java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:74)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:153)
	... 3 more
Caused by: java.lang.NoClassDefFoundError: jdk/incubator/vector/Vector
	at com.github.tjake.jlama.safetensors.Weights.load(Weights.java:128)
	at com.github.tjake.jlama.safetensors.WeightLoader.load(WeightLoader.java:30)
	at com.github.tjake.jlama.safetensors.SafeTensorIndex.load(SafeTensorIndex.java:189)
	at com.github.tjake.jlama.model.llama.LlamaModel.loadInputWeights(LlamaModel.java:61)
	at com.github.tjake.jlama.model.AbstractModel.<init>(AbstractModel.java:135)
	at com.github.tjake.jlama.model.llama.LlamaModel.<init>(LlamaModel.java:55)
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	... 6 more
Caused by: java.lang.ClassNotFoundException: jdk.incubator.vector.Vector
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
	... 13 more
At this point, I decided it was best to...
Anyway, just so that this wasn't a complete failure on my part, I figured I'd report the problems here. If it's a user error, I apologize. But the good news is if that's the case, others can see this and learn from it to have an easier time setting it up.
Take care!
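The root cause in the ClassNotFoundException trace above is that jdk.incubator.vector is an incubator module and is not resolved by default; it has to be added at launch (for example `--add-modules jdk.incubator.vector`, plus `--enable-preview` for the preview-compiled classes). A small, self-contained probe that runs on a plain JDK; the printed hint is my suggestion, not official Jlama guidance:

```java
public class VectorApiCheck {
    public static void main(String[] args) {
        // The incubating Vector API must be enabled at launch, e.g.:
        //   java --add-modules jdk.incubator.vector --enable-preview ...
        // (standard JDK flags; Jlama's exact requirements may vary by version)
        try {
            Class.forName("jdk.incubator.vector.FloatVector");
            System.out.println("Vector API available");
        } catch (ClassNotFoundException e) {
            System.out.println("Vector API missing: launch with --add-modules jdk.incubator.vector");
        }
    }
}
```

Running this first in a new project quickly distinguishes a missing-module problem from a genuine model-loading bug.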
[ERROR] testSaxpy(com.github.tjake.jlama.tensor.operations.TestOperations) Time elapsed: 0.051 s <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
at com.github.tjake.jlama.tensor.operations.TestOperations.testSaxpy(TestOperations.java:180)
[ERROR] testSxpby(com.github.tjake.jlama.tensor.operations.TestOperations) Time elapsed: 0.031 s <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
at com.github.tjake.jlama.tensor.operations.TestOperations.testSxpby(TestOperations.java:214)
[ERROR] testAccumulate(com.github.tjake.jlama.tensor.operations.TestOperations) Time elapsed: 0.019 s <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
at com.github.tjake.jlama.tensor.operations.TestOperations.testAccumulate(TestOperations.java:118)
[ERROR] testDotProduct(com.github.tjake.jlama.tensor.operations.TestOperations) Time elapsed: 0.144 s <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
at com.github.tjake.jlama.tensor.operations.TestOperations.testDotProduct(TestOperations.java:85)
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] TestOperations.testAccumulate:118 » ClassCast a Vector<class java.lang.Integer...
[ERROR] TestOperations.testDotProduct:85 » ClassCast a Vector<class java.lang.Integer>...
[ERROR] TestOperations.testSaxpy:180 » ClassCast a Vector<class java.lang.Integer>: re...
[ERROR] TestOperations.testSxpby:214 » ClassCast a Vector<class java.lang.Integer>: re...
[INFO]
[ERROR] Tests run: 17, Failures: 0, Errors: 4, Skipped: 6
On Fedora 39, when executing ./run-cli.sh download gpt2-medium or just mvn clean install, I hit the following failure.
[INFO] --- protobuf:0.6.1:compile (default) @ jlama-net ---
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Jlama Parent 0.2.0-SNAPSHOT:
[INFO]
[INFO] Jlama Parent ....................................... SUCCESS [ 0.562 s]
[INFO] Jlama Core ......................................... SUCCESS [ 2.585 s]
[INFO] Jlama Native ....................................... SUCCESS [ 1.259 s]
[INFO] Jlama Net .......................................... FAILURE [ 0.273 s]
[INFO] Jlama Cli .......................................... SKIPPED
[INFO] Jlama Tests ........................................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.951 s
[INFO] Finished at: 2024-06-10T12:04:20+09:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.6.1:compile (default) on project jlama-net: Unable to resolve artifact: Missing:
[ERROR] ----------
[ERROR] 1) com.google.protobuf:protoc:exe:linux-x86_64-fedora:3.17.3
[ERROR]
[ERROR] Try downloading the file manually from the project website.
[ERROR]
[ERROR] Then, install it using the command:
[ERROR] mvn install:install-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.17.3 -Dclassifier=linux-x86_64-fedora -Dpackaging=exe -Dfile=/path/to/file
Indeed, the classifier linux-x86_64-fedora doesn't exist in https://repo1.maven.org/maven2/com/google/protobuf/protoc/3.17.3/
I think there are a couple of possible workarounds.
One is to remove fedora from os.detection.classifierWithLikes. I don't know why this configuration is required, but removing fedora solves the issue on my Fedora:
diff --git a/pom.xml b/pom.xml
index bfeedeb..72cd2a0 100644
--- a/pom.xml
+++ b/pom.xml
@@ -45,7 +45,7 @@
<revision>0.2.0-SNAPSHOT</revision>
<osmaven.version>1.7.1</osmaven.version>
- <os.detection.classifierWithLikes>fedora,suse,arch</os.detection.classifierWithLikes>
+ <os.detection.classifierWithLikes>suse,arch</os.detection.classifierWithLikes>
<jni.classifier>${os.detected.name}-${os.detected.arch}</jni.classifier>
<spotless.version>2.43.0</spotless.version>
Another is to override os.detected.classifier directly:
diff --git a/pom.xml b/pom.xml
index bfeedeb..bbe7db5 100644
--- a/pom.xml
+++ b/pom.xml
@@ -47,6 +47,7 @@
<osmaven.version>1.7.1</osmaven.version>
<os.detection.classifierWithLikes>fedora,suse,arch</os.detection.classifierWithLikes>
<jni.classifier>${os.detected.name}-${os.detected.arch}</jni.classifier>
+ <os.detected.classifier>linux-x86_64</os.detected.classifier>
<spotless.version>2.43.0</spotless.version>
<junit.version>4.13.2</junit.version>
I confirmed that the issue was solved with either workaround.
Is there a way to run and expose an API streaming server compatible with the OpenAI API specification?
Would you consider adding fine-tuning support to the list of commands?
Feedback from #39 makes me want to support more JDK versions. Requires some multi-release-jar magic 🪄