tjake / jlama
Jlama is a modern Java inference engine for LLMs
License: Apache License 2.0
Would you consider adding fine-tune support to the list of commands?
I downloaded the model directly from Meta's repo, not Hugging Face, but when opening with loadWithWeights the code looks for a file called
model.safetensors.index.json
I do not have this file. Where is it coming from? There is a file called params.json: {"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}
Is that the same?
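For context, and as an assumption about what the loader expects: model.safetensors.index.json is the Hugging Face sharding index, whose "weight_map" maps each tensor name to the shard file that contains it, so Meta's params.json (the original training config) is not the same thing. A minimal sketch of reading such an index with Jackson (the embedded example document, tensor name, and shard filename are made up for illustration):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class IndexPeek {
    public static void main(String[] args) throws Exception {
        // Hypothetical index document in the Hugging Face sharding format:
        // "weight_map" maps tensor name -> shard file holding that tensor.
        String example = """
                {"metadata": {"total_size": 1024},
                 "weight_map": {"model.embed.weight": "model-00001-of-00002.safetensors"}}""";
        JsonNode index = new ObjectMapper().readTree(example);
        index.path("weight_map").fields().forEachRemaining(e ->
                System.out.println(e.getKey() + " -> " + e.getValue().asText()));
    }
}
```

Single-file models ship only model.safetensors with no index, which is why a converted (Hugging Face format) checkpoint is needed rather than the raw Meta weights.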
I want to build a local Copilot with Jlama, but generalist models are too big and slow. Three candidates I found:
replit-code-v1_5-3b:
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.MPT
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.CODEGEN
WizardCoder-1B-V1.0 (using the safetensors branch):
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@693fe6c9): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.GPT_BIGCODE
Writing it in Java itself will solve that, so I did. Happy to contribute it, so here you go, FWIW:
package com.github.tjake.jlama.cli;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DownloadModel {
    private static final String HF_ACCESS_TOKEN = System.getenv("HF_ACCESS_TOKEN");
    private static final String MODEL_DIR = "models";

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            usage();
            System.exit(1);
        }
        String hfModel = args[0];

        // Build the Authorization header *value*; the header name is set in getResponse().
        String authHeader = null;
        if (HF_ACCESS_TOKEN != null && !HF_ACCESS_TOKEN.isEmpty()) {
            authHeader = "Bearer " + HF_ACCESS_TOKEN;
        }

        InputStream modelInfoStream = getResponse("https://huggingface.co/api/models/" + hfModel, authHeader);
        String modelInfo = readInputStream(modelInfoStream);
        if (modelInfo == null) {
            System.out.println("No valid model found or trying to access a restricted model (use HF_ACCESS_TOKEN env. var.)");
            System.exit(1);
        }

        List<String> allFiles = parseFileList(modelInfo);
        if (allFiles.isEmpty()) {
            System.out.println("No valid model found");
            System.exit(1);
        }

        List<String> tensorFiles = new ArrayList<>();
        for (String currFile : allFiles) {
            if (currFile.contains("safetensor")) {
                tensorFiles.add(currFile);
            }
        }
        if (tensorFiles.isEmpty()) {
            System.out.println("Model is not available in safetensors format");
            System.exit(1);
        }

        // Make sure the inference configuration files are fetched, without
        // downloading anything twice if the API listing already included them.
        for (String extraFile : Arrays.asList("config.json", "vocab.json", "tokenizer.json")) {
            if (!allFiles.contains(extraFile)) {
                allFiles.add(extraFile);
            }
        }

        Path modelDir = Paths.get(MODEL_DIR, hfModel);
        try {
            Files.createDirectories(modelDir);
        } catch (IOException e) {
            System.out.println("Error creating directory: " + modelDir);
            System.exit(1);
        }

        for (String currFile : allFiles) {
            System.out.println("Downloading file: " + modelDir.resolve(currFile));
            downloadFile(hfModel, currFile, authHeader, modelDir.resolve(currFile));
        }

        // tokenizer.model only exists for sentencepiece-based models, so tolerate a miss.
        System.out.println("Downloading file: " + modelDir.resolve("tokenizer.model") + " (if it exists)");
        try {
            downloadFile(hfModel, "tokenizer.model", authHeader, modelDir.resolve("tokenizer.model"));
        } catch (IOException e) {
            System.out.println("tokenizer.model not found, skipping");
        }

        System.out.println("Done! Model downloaded in ./" + MODEL_DIR + "/" + hfModel);
    }

    private static void usage() {
        System.out.println("""
                usage: java DownloadModel [-h] owner/model_name

                This program downloads the safetensors files and inference configuration from Hugging Face.
                To download restricted models, set the HF_ACCESS_TOKEN environment variable to a valid HF access token.
                To create a token see https://huggingface.co/settings/tokens

                OPTIONS:
                   -h  Show this message

                EXAMPLES:
                   java DownloadModel gpt2-medium
                   java DownloadModel meta-llama/Llama-2-7b-chat-hf""");
    }

    private static List<String> parseFileList(String modelInfo) {
        List<String> fileList = new ArrayList<>();
        try {
            ObjectMapper objectMapper = new ObjectMapper();
            JsonNode rootNode = objectMapper.readTree(modelInfo);
            // The HF model API lists repo files under "siblings", one "rfilename" each.
            JsonNode siblingsNode = rootNode.path("siblings");
            if (siblingsNode.isArray()) {
                for (JsonNode siblingNode : siblingsNode) {
                    fileList.add(siblingNode.path("rfilename").asText());
                }
            }
        } catch (IOException e) {
            System.out.println("Error parsing JSON: " + e.getMessage());
        }
        return fileList;
    }

    public static InputStream getResponse(String urlString, String authHeader) {
        try {
            URL url = new URL(urlString);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            if (authHeader != null)
                connection.setRequestProperty("Authorization", authHeader);

            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                return connection.getInputStream();
            } else {
                throw new IOException("HTTP response code: " + responseCode);
            }
        } catch (IOException ioe) {
            System.out.println("WARNING: Fetch of URL " + urlString + " failed due to " + ioe);
            return null;
        }
    }

    public static String readInputStream(InputStream inStream) throws IOException {
        if (inStream == null) return null;
        BufferedReader inReader = new BufferedReader(new InputStreamReader(inStream));
        StringBuilder stringBuilder = new StringBuilder();
        String currLine;
        while ((currLine = inReader.readLine()) != null) {
            stringBuilder.append(currLine);
            stringBuilder.append(System.lineSeparator());
        }
        return stringBuilder.toString();
    }

    private static void downloadFile(String hfModel, String currFile, String authHeader, Path outputPath) throws IOException {
        InputStream inStream = getResponse("https://huggingface.co/" + hfModel + "/resolve/main/" + currFile, authHeader);
        if (inStream == null)
            throw new IOException("Fetch of file " + currFile + " failed.");
        // Repo files can live in subdirectories, so create parents before copying.
        if (outputPath.getParent() != null)
            Files.createDirectories(outputPath.getParent());
        Files.copy(inStream, outputPath, StandardCopyOption.REPLACE_EXISTING);
    }
}
[ERROR] testSaxpy(com.github.tjake.jlama.tensor.operations.TestOperations) Time elapsed: 0.051 s <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
at com.github.tjake.jlama.tensor.operations.TestOperations.testSaxpy(TestOperations.java:180)
[ERROR] testSxpby(com.github.tjake.jlama.tensor.operations.TestOperations) Time elapsed: 0.031 s <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
at com.github.tjake.jlama.tensor.operations.TestOperations.testSxpby(TestOperations.java:214)
[ERROR] testAccumulate(com.github.tjake.jlama.tensor.operations.TestOperations) Time elapsed: 0.019 s <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
at com.github.tjake.jlama.tensor.operations.TestOperations.testAccumulate(TestOperations.java:118)
[ERROR] testDotProduct(com.github.tjake.jlama.tensor.operations.TestOperations) Time elapsed: 0.144 s <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
at com.github.tjake.jlama.tensor.operations.TestOperations.testDotProduct(TestOperations.java:85)
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] TestOperations.testAccumulate:118 » ClassCast a Vector<class java.lang.Integer...
[ERROR] TestOperations.testDotProduct:85 » ClassCast a Vector<class java.lang.Integer>...
[ERROR] TestOperations.testSaxpy:180 » ClassCast a Vector<class java.lang.Integer>: re...
[ERROR] TestOperations.testSxpby:214 » ClassCast a Vector<class java.lang.Integer>: re...
[INFO]
[ERROR] Tests run: 17, Failures: 0, Errors: 4, Skipped: 6
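For what it's worth, the mismatch pattern in these errors (required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]) suggests code pinned to a 512-bit species running on hardware whose preferred vector width is only 256 bits (AVX2 rather than AVX-512). A small sketch to check what the current CPU reports, using only the incubator Vector API (run with --add-modules jdk.incubator.vector):

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

public class VectorWidthCheck {
    public static void main(String[] args) {
        // SPECIES_PREFERRED is the widest species the CPU supports
        // (S_256_BIT on AVX2, S_512_BIT on AVX-512), whereas a hard-coded
        // IntVector.SPECIES_512 only works on AVX-512 hardware.
        VectorSpecies<Integer> preferred = IntVector.SPECIES_PREFERRED;
        System.out.println("Preferred int species: " + preferred
                + " (" + preferred.length() + " lanes)");
    }
}
```

If the tests assume a fixed 512-bit species, running them on an AVX2-only machine would produce exactly this ClassCastException.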
See the comments on #18.
Is there a way to run and expose a streaming API server compatible with the OpenAI API specification?
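For reference, the OpenAI streaming format is server-sent events: each token arrives as a "data:" line carrying a chat.completion.chunk JSON object, terminated by "data: [DONE]". A minimal, hypothetical stub of such an endpoint using only the JDK's built-in HttpServer (the port, the canned token strings, and the abbreviated chunk JSON are assumptions; a real server would stream tokens from the model and fill in the full chunk schema):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class OpenAiSseSketch {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/v1/chat/completions", exchange -> {
            // SSE responses use text/event-stream and are written incrementally.
            exchange.getResponseHeaders().set("Content-Type", "text/event-stream");
            exchange.sendResponseHeaders(200, 0); // 0 = chunked, stream until closed
            try (OutputStream out = exchange.getResponseBody()) {
                String[] tokens = {"Hello", " from", " a", " stub"}; // placeholder tokens
                for (String tok : tokens) {
                    String chunk = "data: {\"object\":\"chat.completion.chunk\","
                            + "\"choices\":[{\"delta\":{\"content\":\"" + tok + "\"}}]}\n\n";
                    out.write(chunk.getBytes(StandardCharsets.UTF_8));
                    out.flush(); // push each token to the client immediately
                }
                out.write("data: [DONE]\n\n".getBytes(StandardCharsets.UTF_8));
            }
        });
        server.start();
        System.out.println("Listening on :8080");
    }
}
```

Any OpenAI-compatible client pointed at http://localhost:8080/v1 with stream=true should be able to consume this shape of response.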
This worked in the Oct 15 jlama:
$ ./run-cli.sh complete -p "def fib(" -t 0.2 -tc 24 -n 100 models/CodeLlama-7b-hf
Now it OOMs (note that I have doubled the default Xmx, which was not necessary in October):
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at picocli.CommandLine.executeUserObject(CommandLine.java:2035)
at picocli.CommandLine.access$1500(CommandLine.java:148)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2264)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:2664)
at picocli.CommandLine.parseWithHandler(CommandLine.java:2599)
at com.github.tjake.jlama.cli.JlamaCli.main(JlamaCli.java:30)
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:111)
at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:66)
at com.github.tjake.jlama.cli.commands.CompleteCommand.run(CompleteCommand.java:16)
at picocli.CommandLine.executeUserObject(CommandLine.java:2026)
... 8 more
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:74)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:107)
... 11 more
Caused by: java.lang.OutOfMemoryError
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:542)
at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567)
at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:670)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfInt.evaluateParallel(ForEachOps.java:189)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.base/java.util.stream.IntPipeline.forEach(IntPipeline.java:463)
at java.base/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:620)
at com.github.tjake.jlama.model.llama.LlamaModel.loadTransformerBlockWeights(LlamaModel.java:56)
at com.github.tjake.jlama.model.AbstractModel.<init>(AbstractModel.java:109)
at com.github.tjake.jlama.model.llama.LlamaModel.<init>(LlamaModel.java:31)
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
... 14 more
Caused by: java.lang.OutOfMemoryError: Cannot reserve 180355136 bytes of direct buffer memory (allocated: 25708094948, limit: 25769803776)
at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:360)
at com.github.tjake.jlama.util.UnsafeDirectByteBuffer.allocateAlignedByteBuffer(UnsafeDirectByteBuffer.java:36)
at com.github.tjake.jlama.tensor.FloatBufferTensor.<init>(FloatBufferTensor.java:73)
at com.github.tjake.jlama.safetensors.Weights.load(Weights.java:112)
at com.github.tjake.jlama.safetensors.WeightLoader.load(WeightLoader.java:16)
at com.github.tjake.jlama.safetensors.SafeTensorIndex.load(SafeTensorIndex.java:172)
at com.github.tjake.jlama.model.llama.LlamaModel.lambda$loadTransformerBlockWeights$1(LlamaModel.java:70)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfInt.accept(ForEachOps.java:205)
at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:712)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
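Note that the root cause here is direct buffer memory (java.nio.Bits.reserveMemory), which is capped by -XX:MaxDirectMemorySize rather than by the heap -Xmx alone (by default the direct cap roughly tracks -Xmx, which is why raising Xmx moves the limit but may still not be enough). A small sketch to inspect the direct pool at runtime, using the standard platform MXBeans:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectMemoryProbe {
    public static void main(String[] args) {
        // The "direct" pool tracks ByteBuffer.allocateDirect usage, which is
        // what FloatBufferTensor allocates; its capacity is bounded by
        // -XX:MaxDirectMemorySize.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName() + ": used=" + pool.getMemoryUsed()
                    + " capacity=" + pool.getTotalCapacity());
        }
    }
}
```

Running the failing command with an explicit -XX:MaxDirectMemorySize large enough for the model's weights would distinguish a real regression in memory use from a too-small direct limit.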