
jlama's People

Contributors

jakemh, jbellis, kishida, phact, tjake, wmsouza


jlama's Issues

CodeLlama loading is broken?

This worked in the Oct 15 build of jlama:

$ ./run-cli.sh complete -p "def fib(" -t 0.2 -tc 24 -n 100 models/CodeLlama-7b-hf

Now it OOMs (note that I have doubled the default Xmx, which was not necessary in October):

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at picocli.CommandLine.executeUserObject(CommandLine.java:2035)
	at picocli.CommandLine.access$1500(CommandLine.java:148)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
	at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2264)
	at picocli.CommandLine.parseWithHandlers(CommandLine.java:2664)
	at picocli.CommandLine.parseWithHandler(CommandLine.java:2599)
	at com.github.tjake.jlama.cli.JlamaCli.main(JlamaCli.java:30)
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:111)
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:66)
	at com.github.tjake.jlama.cli.commands.CompleteCommand.run(CompleteCommand.java:16)
	at picocli.CommandLine.executeUserObject(CommandLine.java:2026)
	... 8 more
Caused by: java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:74)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:107)
	... 11 more
Caused by: java.lang.OutOfMemoryError
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:542)
	at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567)
	at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:670)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfInt.evaluateParallel(ForEachOps.java:189)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
	at java.base/java.util.stream.IntPipeline.forEach(IntPipeline.java:463)
	at java.base/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:620)
	at com.github.tjake.jlama.model.llama.LlamaModel.loadTransformerBlockWeights(LlamaModel.java:56)
	at com.github.tjake.jlama.model.AbstractModel.<init>(AbstractModel.java:109)
	at com.github.tjake.jlama.model.llama.LlamaModel.<init>(LlamaModel.java:31)
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	... 14 more
Caused by: java.lang.OutOfMemoryError: Cannot reserve 180355136 bytes of direct buffer memory (allocated: 25708094948, limit: 25769803776)
	at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
	at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
	at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:360)
	at com.github.tjake.jlama.util.UnsafeDirectByteBuffer.allocateAlignedByteBuffer(UnsafeDirectByteBuffer.java:36)
	at com.github.tjake.jlama.tensor.FloatBufferTensor.<init>(FloatBufferTensor.java:73)
	at com.github.tjake.jlama.safetensors.Weights.load(Weights.java:112)
	at com.github.tjake.jlama.safetensors.WeightLoader.load(WeightLoader.java:16)
	at com.github.tjake.jlama.safetensors.SafeTensorIndex.load(SafeTensorIndex.java:172)
	at com.github.tjake.jlama.model.llama.LlamaModel.lambda$loadTransformerBlockWeights$1(LlamaModel.java:70)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfInt.accept(ForEachOps.java:205)
	at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
	at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:712)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
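For later readers: the bottom of this trace shows that the limit being hit is direct-buffer memory, not heap. By default the JVM caps direct ByteBuffer allocations at roughly the max heap size, which is why doubling -Xmx only moved the ceiling (to the 24 GiB shown above). A hedged workaround, assuming run-cli.sh lets extra JVM flags through, is to raise the direct cap on its own:

    java -XX:MaxDirectMemorySize=48g ... com.github.tjake.jlama.cli.JlamaCli complete ...

(The "..." stands for the usual classpath and arguments; 48g is an illustrative value comfortably above the ~25.7 GB the trace shows already reserved when the 180 MB allocation failed.)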

./download-hf-model.sh won't work on Windows

Rewriting it in Java itself solves that. So I did. Happy to contribute it, so here you go, FWIW:

package com.github.tjake.jlama.cli;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DownloadModel {
    private static final String HF_ACCESS_TOKEN = System.getenv("HF_ACCESS_TOKEN");
    private static final String MODEL_DIR = "models";

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            usage();
            System.exit(1);
        }

        String hfModel = args[0];
        String authHeader = null;
        if (HF_ACCESS_TOKEN != null && !HF_ACCESS_TOKEN.isEmpty()) {
            authHeader = "Authorization: Bearer " + HF_ACCESS_TOKEN;
        }

        InputStream modelInfoStream = getResponse("https://huggingface.co/api/models/" + hfModel, authHeader);
        String modelInfo = readInputStream(modelInfoStream);

        if (modelInfo == null) {
            System.out.println("No valid model found or trying to access a restricted model (use HF_ACCESS_TOKEN env. var.)");
            System.exit(1);
        }

        List<String> allFiles = parseFileList(modelInfo);
        if (allFiles.isEmpty()) {
            System.out.println("No valid model found");
            System.exit(1);
        }

        List<String> tensorFiles = new ArrayList<>();
        for (String currFile : allFiles) {
            if (currFile.contains("safetensor")) {
                tensorFiles.add(currFile);
            }
        }

        if (tensorFiles.isEmpty()) {
            System.out.println("Model is not available in safetensor format");
            System.exit(1);
        }

        // Download only the safetensors files plus the inference configuration,
        // not every file in the repo (which may duplicate the weights as .bin).
        tensorFiles.addAll(Arrays.asList("config.json", "vocab.json", "tokenizer.json"));

        Path modelDir = Paths.get(MODEL_DIR, hfModel);
        try {
            Files.createDirectories(modelDir);
        } catch (IOException e) {
            System.out.println("Error creating directory: " + modelDir);
            System.exit(1);
        }

        for (String currFile : tensorFiles) {
            System.out.println("Downloading file: " + modelDir.resolve(currFile));
            downloadFile(hfModel, currFile, authHeader, modelDir.resolve(currFile));
        }

        System.out.println("Downloading file: " + modelDir.resolve("tokenizer.model") + " (if it exists)");
        downloadFile(hfModel, "tokenizer.model", authHeader, modelDir.resolve("tokenizer.model"));

        System.out.println("Done! Model downloaded in ./" + MODEL_DIR + "/" + hfModel);
    }

    private static void usage() {
        System.out.println("""
                usage: java DownloadModel [-h] owner/model_name

                This program will download the safetensors files and inference configuration for a model from Hugging Face.
                To download restricted models set the HF_ACCESS_TOKEN environment variable to a valid HF access token.
                To create a token see https://huggingface.co/settings/tokens

                OPTIONS:
                   -h   Show this message

                EXAMPLES:
                    java DownloadModel gpt2-medium
                    java DownloadModel meta-llama/Llama-2-7b-chat-hf""");
    }

    private static List<String> parseFileList(String modelInfo) {
        List<String> fileList = new ArrayList<>();
        try {
            ObjectMapper objectMapper = new ObjectMapper();
            JsonNode rootNode = objectMapper.readTree(modelInfo);
            JsonNode siblingsNode = rootNode.path("siblings");
            if (siblingsNode.isArray()) {
                for (JsonNode siblingNode : siblingsNode) {
                    String rFilename = siblingNode.path("rfilename").asText();
                    fileList.add(rFilename);
                }
            }
        } catch (IOException e) {
            System.out.println("Error parsing JSON: " + e.getMessage());
        }
        return fileList;
    }

    public static InputStream getResponse(String urlString, String authHeader) {
        try {
            URL url = new URL(urlString);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();

            // Set the request method
            connection.setRequestMethod("GET");

            // Set the request header
            if (authHeader != null)
                connection.setRequestProperty("Authorization", authHeader);

            // Get the response code
            int responseCode = connection.getResponseCode();

            if (responseCode == HttpURLConnection.HTTP_OK) {
                // If the response code is 200 (HTTP_OK), return the input stream
                return connection.getInputStream();
            } else {
                // If the response code is not 200, throw an IOException
                throw new IOException("HTTP response code: " + responseCode);
            }
        }
        catch (IOException ioe)
        {
            System.out.println("WARNING: Fetch of URL " + urlString + " failed due to " + ioe);
            return null;
        }
    }

    public static String readInputStream(InputStream inStream) throws IOException {
        if (inStream == null) return null;

        BufferedReader inReader = new BufferedReader(new InputStreamReader(inStream));
        StringBuilder stringBuilder = new StringBuilder();

        String currLine;
        while ((currLine = inReader.readLine()) != null) {
            stringBuilder.append(currLine);
            stringBuilder.append(System.lineSeparator());
        }

        return stringBuilder.toString();
    }

    private static void downloadFile(String hfModel, String currFile, String authHeader, Path outputPath) throws IOException {
        InputStream inStream = getResponse("https://huggingface.co/" + hfModel + "/resolve/main/" + currFile, authHeader);
        if (inStream == null)
            throw new IOException("Fetch of file " + currFile + " failed.");
        Files.copy(inStream, outputPath, StandardCopyOption.REPLACE_EXISTING);
    }
}

In Fedora 39, got error when running './run-cli.sh download tjake/llama2-7b-chat-hf-jlama-Q4'

[INFO] --- maven-antrun-plugin:1.8:run (write-version-properties) @ jlama-native ---
[WARNING] Parameter tasks is deprecated, use target instead
[INFO] Executing tasks

main:
[exec] Result: 128
[exec] Result: 128
[echo] Current commit: 0 on 1970-01-01 00:00:00 +0000
[mkdir] Created dir: /home/wf/share/Jlama-main/jlama-native/target/classes/META-INF
[propertyfile] Creating new property file: /home/wf/share/Jlama-main/jlama-native/target/classes/META-INF/com.github.tjake.versions.properties
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.8:run (build-native-lib) @ jlama-native ---
[INFO] Executing tasks

main:
[exec] mkdir -p /home/wf/share/Jlama-main/jlama-native/target/native-objs-only
[exec] gcc -o /home/wf/share/Jlama-main/jlama-native/target/native-objs-only/vector_simd.o -c src/main/c/vector_simd.c -O3 -mavx512f -march=native -Werror -Wno-attributes -fPIC -fno-omit-frame-pointer -Wunused-variable
[exec] In file included from /usr/lib/gcc/x86_64-redhat-linux/13/include/immintrin.h:109,
[exec] from src/main/c/vector_simd.c:5:
[exec] /usr/lib/gcc/x86_64-redhat-linux/13/include/fmaintrin.h: In function ‘dot_product_f32_q8_256’:
[exec] /usr/lib/gcc/x86_64-redhat-linux/13/include/fmaintrin.h:63:1: error: inlining failed in call to ‘always_inline’ ‘_mm256_fmadd_ps’: target specific option mismatch
[exec] 63 | _mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C)
[exec] | ^~~~~~~~~~~~~~~
[exec] src/main/c/vector_simd.c:46:15: note: called from here
[exec] 46 | sum = _mm256_fmadd_ps(va, vb_scaled, sum);
[exec] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[exec] /usr/lib/gcc/x86_64-redhat-linux/13/include/fmaintrin.h:63:1: error: inlining failed in call to ‘always_inline’ ‘_mm256_fmadd_ps’: target specific option mismatch
[exec] 63 | _mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C)
[exec] | ^~~~~~~~~~~~~~~
[exec] src/main/c/vector_simd.c:46:15: note: called from here
[exec] 46 | sum = _mm256_fmadd_ps(va, vb_scaled, sum);
[exec] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[exec] make: *** [Makefile:40: /home/wf/share/Jlama-main/jlama-native/target/native-objs-only/vector_simd.o] Error 1
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Jlama Parent 0.2.0-SNAPSHOT:
[INFO]
[INFO] Jlama Parent ....................................... SUCCESS [ 1.204 s]
[INFO] Jlama Core ......................................... SUCCESS [ 7.439 s]
[INFO] Jlama Native ....................................... FAILURE [ 1.955 s]
[INFO] Jlama Net .......................................... SKIPPED
[INFO] Jlama Cli .......................................... SKIPPED
[INFO] Jlama Tests ........................................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.692 s
[INFO] Finished at: 2024-06-04T09:04:18+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (build-native-lib) on project jlama-native: An Ant BuildException has occured: exec returned: 2
[ERROR] around Ant part ...... @ 4:71 in /home/wf/share/Jlama-main/jlama-native/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :jlama-native
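A note on what is actually failing above: _mm256_fmadd_ps is an FMA intrinsic, and gcc's "target specific option mismatch" means the effective compilation target does not enable FMA. Since -march=native comes after -mavx512f on the command line, it has the last word, and this CPU (or VM, if host features aren't passed through) evidently lacks the extension. A quick check is grep -wo 'avx2\|fma' /proc/cpuinfo | sort -u: if both flags show up, appending -mavx2 -mfma after -march=native in jlama-native's Makefile should, as a hedged guess, let the intrinsic inline; if they don't, the native SIMD library simply can't be built or run on that machine.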

Apple Silicon native compilation

To make this work for me, I had to get the Apple Silicon version of jextract (Java 21) and add this to the bin/jextract script:

JLINK_VM_OPTIONS=--enable-native-access=org.openjdk.jextract

and make sure the resulting library was named "libjlama.dylib" and on the java.library.path when running the app, and comment out all of the test code in the main pom.xml (though I'm working off an older snapshot, so maybe the JUnit tests pass now on Apple M2).

Switch all tensor operations to use longs

Caused by: java.lang.IllegalArgumentException: Out of range: 3145728000
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:203)
        at com.google.common.primitives.Ints.checkedCast(Ints.java:88)
        at com.github.tjake.jlama.tensor.FloatBufferTensor.<init>(FloatBufferTensor.java:88)
        at com.github.tjake.jlama.safetensors.Weights.load(Weights.java:128)
        at com.github.tjake.jlama.safetensors.WeightLoader.load(WeightLoader.java:30)
        at com.github.tjake.jlama.safetensors.SafeTensorIndex.load(SafeTensorIndex.java:189)
        at com.github.tjake.jlama.model.gemma.GemmaModel.loadInputWeights(GemmaModel.java:114)
        at com.github.tjake.jlama.model.AbstractModel.<init>(AbstractModel.java:134)
        at com.github.tjake.jlama.model.llama.LlamaModel.<init>(LlamaModel.java:55)
        at com.github.tjake.jlama.model.gemma.GemmaModel.<init>(GemmaModel.java:58)
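For context on the title: the failing value 3145728000 is 256000 × 3072 floats × 4 bytes (plausibly a fp32 embedding table, given the GemmaModel.loadInputWeights frame), and it exceeds Integer.MAX_VALUE (2147483647), so Guava's Ints.checkedCast throws and an int-indexed ByteBuffer can't hold it either. A minimal sketch of long-indexed storage using java.lang.foreign — purely illustrative, not jlama's actual tensor classes:

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class LongTensorSketch {
    public static void main(String[] args) {
        long bytes = 3_145_728_000L; // too large for any int-indexed buffer

        try (Arena arena = Arena.ofConfined()) {
            // MemorySegment takes long sizes and offsets, unlike ByteBuffer's int limit
            MemorySegment seg = arena.allocate(bytes);
            long lastIndex = bytes / Float.BYTES - 1;
            seg.setAtIndex(ValueLayout.JAVA_FLOAT, lastIndex, 1.0f);
            System.out.println(seg.getAtIndex(ValueLayout.JAVA_FLOAT, lastIndex));
        }
    }
}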

GGUF Support

Is there any plan to support the GGUF format directly, in addition to safetensors? That would allow jlama to load other models distributed as GGUF. If support already exists, can we add it to the README file?

Feature request: support for the smallest reasonable codegen model

I want to build a local Copilot with JLama but generalist models are too big and slow.

Three candidates I found:

replit-code-v1_5-3b:

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.MPT

codegen-2B-multi:

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.CODEGEN

WizardCoder-1B-V1.0 (using the safetensors branch):

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@693fe6c9): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.GPT_BIGCODE


File model.safetensors.index.json not found

I downloaded the model directly from Meta's repo, not Hugging Face, but the code is looking for a file called model.safetensors.index.json when opening with loadWithWeights.

But I do not have this file. Where is this coming from? There is a file called params.json: {"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}

Is that the same?
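For anyone else hitting this: params.json is the config for Meta's native checkpoint format (the consolidated .pth download), which jlama does not read. model.safetensors.index.json comes with the Hugging Face export of the model (e.g. meta-llama/Llama-2-7b-hf) and maps each tensor name to the safetensors shard that contains it, roughly like this (illustrative values):

{
  "metadata": { "total_size": 13476839424 },
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors"
  }
}

So the fix is to download the -hf variant from Hugging Face rather than pointing jlama at Meta's download.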

Is this still being worked on? The example program won't run and it appears to require a very specific JVM version.

Hello!

Is this project still being maintained in a major way? I can't quite get it to work.

I tried to run this software by using the example code provided, except I modified it to use a model I already have on my computer:

public static void main(String[] args) {

	File model = new File("models/TinyLlama-1.1B-Chat-v1.0/");
	
	AbstractModel m = ModelSupport.loadModel(model, DType.F32, DType.I8);
	
	String prompt = "Say hello.";
	
	if (m.promptSupport().isPresent()) {
		
		prompt = m.promptSupport().get().newBuilder()
				.addSystemMessage("You're a helpful chatbot who writes short responses.")
				.addUserMessage(prompt)
				.build();
	}
	
	
	System.out.println("Prompt: " + prompt + "\n");
	
	Response r = m.generate(UUID.randomUUID(), prompt, 0.7f, 256, false, (s, f) -> {
		System.out.println(s);
	});
	
	System.out.println(r.toString());
	
}

I'll preface by saying I was trying to use a model that wasn't suggested by you on the main readme page, as I was under the impression that you could run any type of chat-based safetensor model with this API. So I apologize if this all turns out to be a major misunderstanding. I'll admit that I'm new to LLM-based projects and I'm trying to learn more in my spare time.

Anyway, first of all, it failed to run. It seems to require a preview feature that's currently only in Java 21(?). A friend said it's something to do with a new vector instruction API. I haven't read much about it myself, but either way, I couldn't get this API to run on anything besides Java 21. Here's the error I got on Java 22:

Exception in thread "main" java.lang.UnsupportedClassVersionError: com/github/tjake/jlama/tensor/AbstractTensor (class file version 65.65535) was compiled with preview features that are unsupported. This version of the Java Runtime only recognizes preview features for class file version 66.65535

Then of course, on Java 21 it gave me an error again, but that was just because I had to enable the preview features. I'll admit that the specific configuration puts me off a little bit from using it in my own projects. This doesn't seem to be mentioned anywhere in your example.

So, I finally get past this, and it ends up giving me an error that there is no safetensor model found in the models folder I've made. At the time, it looked like this (don't mind the GGUF files):

[screenshot of the models folder omitted]

That said, I ended up realizing the API probably expects the files to be laid out like they are on Hugging Face, so I adjusted accordingly: I put all of these files in their own folder and renamed the model to just "model.safetensors". That one's on me.

And finally, thinking I had all the pieces together, I triumphantly ran the program, only to get... this.

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:161)
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:87)
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:77)
	at com.nokoriware.ai.test.JlamaTest.main(JlamaTest.java:16)
Caused by: java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:74)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:153)
	... 3 more
Caused by: java.lang.NoClassDefFoundError: jdk/incubator/vector/Vector
	at com.github.tjake.jlama.safetensors.Weights.load(Weights.java:128)
	at com.github.tjake.jlama.safetensors.WeightLoader.load(WeightLoader.java:30)
	at com.github.tjake.jlama.safetensors.SafeTensorIndex.load(SafeTensorIndex.java:189)
	at com.github.tjake.jlama.model.llama.LlamaModel.loadInputWeights(LlamaModel.java:61)
	at com.github.tjake.jlama.model.AbstractModel.<init>(AbstractModel.java:135)
	at com.github.tjake.jlama.model.llama.LlamaModel.<init>(LlamaModel.java:55)
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	... 6 more
Caused by: java.lang.ClassNotFoundException: jdk.incubator.vector.Vector
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
	... 13 more
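(For later readers: the final two causes here — NoClassDefFoundError / ClassNotFoundException for jdk.incubator.vector.Vector — mean the incubator Vector API module was not resolved; it never is by default. On the Java 21 builds jlama targets, launching with something like

    java --enable-preview --add-modules jdk.incubator.vector -cp <your classpath> com.nokoriware.ai.test.JlamaTest

should get past both this error and the earlier preview-feature one. --enable-preview is also why Java 22 refused to run the classes: preview-compiled class files only run on the exact JDK release they were built for.)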

At this point, I decided it was best to...

Anyway, just so this wasn't a complete failure on my part, I figured I'd report the problems here. If it's user error, I apologize. But the good news is that if that's the case, others can see this, learn from it, and have an easier time setting it up.

Take care!

Windows build failures

[ERROR] testSaxpy(com.github.tjake.jlama.tensor.operations.TestOperations)  Time elapsed: 0.051 s  <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
        at com.github.tjake.jlama.tensor.operations.TestOperations.testSaxpy(TestOperations.java:180)

[ERROR] testSxpby(com.github.tjake.jlama.tensor.operations.TestOperations)  Time elapsed: 0.031 s  <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
        at com.github.tjake.jlama.tensor.operations.TestOperations.testSxpby(TestOperations.java:214)

[ERROR] testAccumulate(com.github.tjake.jlama.tensor.operations.TestOperations)  Time elapsed: 0.019 s  <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
        at com.github.tjake.jlama.tensor.operations.TestOperations.testAccumulate(TestOperations.java:118)

[ERROR] testDotProduct(com.github.tjake.jlama.tensor.operations.TestOperations)  Time elapsed: 0.144 s  <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
        at com.github.tjake.jlama.tensor.operations.TestOperations.testDotProduct(TestOperations.java:85)

[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   TestOperations.testAccumulate:118 » ClassCast a Vector<class java.lang.Integer...
[ERROR]   TestOperations.testDotProduct:85 » ClassCast a Vector<class java.lang.Integer>...
[ERROR]   TestOperations.testSaxpy:180 » ClassCast a Vector<class java.lang.Integer>: re...
[ERROR]   TestOperations.testSxpby:214 » ClassCast a Vector<class java.lang.Integer>: re...
[INFO]
[ERROR] Tests run: 17, Failures: 0, Errors: 4, Skipped: 6
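A hedged reading of these failures: "required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]" suggests a code path pinned 512-bit vector species while the JVM's preferred width on this Windows machine is 256 bits. An illustrative check with the incubator Vector API (not jlama's actual test code):

import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

public class SpeciesCheck {
    public static void main(String[] args) {
        // Platform-preferred width; mixing vectors of this species with ones
        // created from a hardcoded SPECIES_512 throws exactly the
        // ClassCastException seen in the test log above.
        VectorSpecies<Integer> preferred = IntVector.SPECIES_PREFERRED;
        System.out.println(preferred); // e.g. Species[int, 8, S_256_BIT] on AVX2-only CPUs
    }
}

(Run with --add-modules jdk.incubator.vector.)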

Unable to resolve artifact: Missing: com.google.protobuf:protoc:exe:linux-x86_64-fedora:3.17.3

In Fedora 39, when executing ./run-cli.sh download gpt2-medium or just mvn clean install, I hit the following failure.

[INFO] --- protobuf:0.6.1:compile (default) @ jlama-net ---
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Jlama Parent 0.2.0-SNAPSHOT:
[INFO] 
[INFO] Jlama Parent ....................................... SUCCESS [  0.562 s]
[INFO] Jlama Core ......................................... SUCCESS [  2.585 s]
[INFO] Jlama Native ....................................... SUCCESS [  1.259 s]
[INFO] Jlama Net .......................................... FAILURE [  0.273 s]
[INFO] Jlama Cli .......................................... SKIPPED
[INFO] Jlama Tests ........................................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  4.951 s
[INFO] Finished at: 2024-06-10T12:04:20+09:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.6.1:compile (default) on project jlama-net: Unable to resolve artifact: Missing:
[ERROR] ----------
[ERROR] 1) com.google.protobuf:protoc:exe:linux-x86_64-fedora:3.17.3
[ERROR] 
[ERROR]   Try downloading the file manually from the project website.
[ERROR] 
[ERROR]   Then, install it using the command: 
[ERROR]       mvn install:install-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.17.3 -Dclassifier=linux-x86_64-fedora -Dpackaging=exe -Dfile=/path/to/file

Indeed, classifier linux-x86_64-fedora doesn't exist in https://repo1.maven.org/maven2/com/google/protobuf/protoc/3.17.3/

I think possible workarounds are:

  1. Fix os.detection.classifierWithLikes. I don't know why this configuration is required, but removing fedora solves the issue on my Fedora machine:
diff --git a/pom.xml b/pom.xml
index bfeedeb..72cd2a0 100644
--- a/pom.xml
+++ b/pom.xml
@@ -45,7 +45,7 @@
     <revision>0.2.0-SNAPSHOT</revision>
 
     <osmaven.version>1.7.1</osmaven.version>
-    <os.detection.classifierWithLikes>fedora,suse,arch</os.detection.classifierWithLikes>
+    <os.detection.classifierWithLikes>suse,arch</os.detection.classifierWithLikes>
     <jni.classifier>${os.detected.name}-${os.detected.arch}</jni.classifier>
 
     <spotless.version>2.43.0</spotless.version>
  2. Explicitly define os.detected.classifier:
diff --git a/pom.xml b/pom.xml
index bfeedeb..bbe7db5 100644
--- a/pom.xml
+++ b/pom.xml
@@ -47,6 +47,7 @@
     <osmaven.version>1.7.1</osmaven.version>
     <os.detection.classifierWithLikes>fedora,suse,arch</os.detection.classifierWithLikes>
     <jni.classifier>${os.detected.name}-${os.detected.arch}</jni.classifier>
+    <os.detected.classifier>linux-x86_64</os.detected.classifier>
 
     <spotless.version>2.43.0</spotless.version>
     <junit.version>4.13.2</junit.version>

I confirmed that the issue was solved with either workaround.

streaming server support?

Is there a way to run and expose an API streaming server compatible with the OpenAI API specification?

Support Java 20/21/22

Feedback from #39 makes me want to support more JDK versions. Requires multi-release jar magic 🪄
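For reference, a multi-release jar carries baseline classes plus per-JDK overrides under META-INF/versions, letting one artifact pick the right Vector API bindings at class-load time. A sketch of the layout (hypothetical paths):

jlama-core.jar
├── META-INF/MANIFEST.MF                  (must contain "Multi-Release: true")
├── com/github/tjake/jlama/...            (baseline classes for the oldest supported JDK)
└── META-INF/versions/
    ├── 21/com/github/tjake/jlama/...     (Java 21 variants)
    └── 22/com/github/tjake/jlama/...     (Java 22 variants)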
