
jlama's People

Contributors

jakemh, jbellis, phact, tjake, wmsouza

jlama's Issues

File model.safetensors.index.json not found

I downloaded the model directly from Meta's repo, not Hugging Face, but the code looks for a file called

model.safetensors.index.json when opening with loadWithWeights.

I do not have this file. Where does it come from? There is a file called params.json:

{"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}

Is that the same thing?
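For reference, params.json is Meta's native config and is not a substitute: jlama's safetensors loader expects a Hugging-Face-format checkpoint, where model.safetensors.index.json maps each tensor name to the shard file that contains it. A minimal sketch of the index format (the tensor names, shard names, and size below are illustrative, not copied from any real checkpoint):

```json
{
  "metadata": { "total_size": 13476831232 },
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors"
  }
}
```

One common route to this layout is converting the Meta checkpoint with the transformers convert_llama_weights_to_hf.py script, or simply downloading the already-converted repo from Hugging Face.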

Feature request: support for the smallest reasonable codegen model

I want to build a local Copilot with JLama, but generalist models are too big and slow.

Three candidates I found:
replit-code-v1_5-3b:

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.MPT

codegen-2B-multi:

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.CODEGEN

WizardCoder-1B-V1.0 (using the safetensors branch):

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@693fe6c9): java.lang.IllegalArgumentException: No enum constant com.github.tjake.jlama.model.ModelSupport.ModelType.GPT_BIGCODE
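All three stack traces share the same proximate cause: each model's config.json declares an architecture (mpt, codegen, gpt_bigcode) that has no constant in jlama's ModelSupport.ModelType enum, so the enum lookup throws. A minimal sketch of that failure mode (the enum constants here are a hypothetical subset, not jlama's actual list):

```java
public class EnumLookupDemo {
    // Hypothetical subset of supported architectures; jlama's real enum lives in
    // com.github.tjake.jlama.model.ModelSupport.ModelType
    enum ModelType { GPT2, LLAMA, BERT }

    // Mirrors what a loader does with config.json's "model_type" field:
    // Enum.valueOf throws IllegalArgumentException for unknown names,
    // producing exactly the "No enum constant ... MPT" errors above
    static boolean isSupported(String modelType) {
        try {
            ModelType.valueOf(modelType.toUpperCase());
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSupported("llama")); // true
        System.out.println(isSupported("mpt"));   // false
    }
}
```

So supporting any of these models means implementing the architecture in jlama, not just relaxing the lookup.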

./download-hf-model.sh won't work on Windows

Rewriting the script in Java solves that, so I did. Happy to contribute it, so here it is FWIW:

package com.github.tjake.jlama.cli;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DownloadModel {
    private static final String HF_ACCESS_TOKEN = System.getenv("HF_ACCESS_TOKEN");
    private static final String MODEL_DIR = "models";

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            usage();
            System.exit(1);
        }

        String hfModel = args[0];
        String authHeader = null;
        if (HF_ACCESS_TOKEN != null && !HF_ACCESS_TOKEN.isEmpty()) {
            authHeader = "Authorization: Bearer " + HF_ACCESS_TOKEN;
        }

        InputStream modelInfoStream = getResponse("https://huggingface.co/api/models/" + hfModel, authHeader);
        String modelInfo = readInputStream(modelInfoStream);

        if (modelInfo == null) {
            System.out.println("No valid model found or trying to access a restricted model (use HF_ACCESS_TOKEN env. var.)");
            System.exit(1);
        }

        List<String> allFiles = parseFileList(modelInfo);
        if (allFiles.isEmpty()) {
            System.out.println("No valid model found");
            System.exit(1);
        }

        List<String> tensorFiles = new ArrayList<>();
        for (String currFile : allFiles) {
            if (currFile.contains("safetensor")) {
                tensorFiles.add(currFile);
            }
        }

        if (tensorFiles.isEmpty()) {
            System.out.println("Model is not available in safetensor format");
            System.exit(1);
        }

        // Download only the safetensors shards plus the inference configuration,
        // not every file in the repo
        tensorFiles.addAll(Arrays.asList("config.json", "vocab.json", "tokenizer.json"));

        Path modelDir = Paths.get(MODEL_DIR, hfModel);
        try {
            Files.createDirectories(modelDir);
        } catch (IOException e) {
            System.out.println("Error creating directory: " + modelDir);
            System.exit(1);
        }

        for (String currFile : tensorFiles) {
            System.out.println("Downloading file: " + modelDir.resolve(currFile));
            downloadFile(hfModel, currFile, authHeader, modelDir.resolve(currFile));
        }

        // tokenizer.model (SentencePiece) only exists for some models,
        // so a missing file here is not fatal
        System.out.println("Downloading file: " + modelDir.resolve("tokenizer.model") + " (if it exists)");
        try {
            downloadFile(hfModel, "tokenizer.model", authHeader, modelDir.resolve("tokenizer.model"));
        } catch (IOException e) {
            System.out.println("tokenizer.model not found, skipping");
        }

        System.out.println("Done! Model downloaded in ./" + MODEL_DIR + "/" + hfModel);
    }

    private static void usage() {
        System.out.println("""
                usage: java DownloadModel [-h] owner/model_name

                This program downloads a model's safetensors files and inference configuration from Hugging Face.
                To download restricted models, set the HF_ACCESS_TOKEN environment variable to a valid HF access token.
                To create a token see https://huggingface.co/settings/tokens

                OPTIONS:
                   -h   Show this message

                EXAMPLES:
                    java DownloadModel gpt2-medium
                    java DownloadModel meta-llama/Llama-2-7b-chat-hf""");
    }

    private static List<String> parseFileList(String modelInfo) {
        List<String> fileList = new ArrayList<>();
        try {
            ObjectMapper objectMapper = new ObjectMapper();
            JsonNode rootNode = objectMapper.readTree(modelInfo);
            JsonNode siblingsNode = rootNode.path("siblings");
            if (siblingsNode.isArray()) {
                for (JsonNode siblingNode : siblingsNode) {
                    String rFilename = siblingNode.path("rfilename").asText();
                    fileList.add(rFilename);
                }
            }
        } catch (IOException e) {
            System.out.println("Error parsing JSON: " + e.getMessage());
        }
        return fileList;
    }

    public static InputStream getResponse(String urlString, String authHeader) {
        try {
            URL url = new URL(urlString);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();

            // Set the request method
            connection.setRequestMethod("GET");

            // Set the request header
            if (authHeader != null)
                connection.setRequestProperty("Authorization", authHeader);

            // Get the response code
            int responseCode = connection.getResponseCode();

            if (responseCode == HttpURLConnection.HTTP_OK) {
                // If the response code is 200 (HTTP_OK), return the input stream
                return connection.getInputStream();
            } else {
                // If the response code is not 200, throw an IOException
                throw new IOException("HTTP response code: " + responseCode);
            }
        } catch (IOException ioe) {
            System.out.println("WARNING: Fetch of URL " + urlString + " failed due to " + ioe);
            return null;
        }
    }

    public static String readInputStream(InputStream inStream) throws IOException {
        if (inStream == null) return null;

        BufferedReader inReader = new BufferedReader(new InputStreamReader(inStream));
        StringBuilder stringBuilder = new StringBuilder();

        String currLine;
        while ((currLine = inReader.readLine()) != null) {
            stringBuilder.append(currLine);
            stringBuilder.append(System.lineSeparator());
        }

        return stringBuilder.toString();
    }

    private static void downloadFile(String hfModel, String currFile, String authHeader, Path outputPath) throws IOException {
        InputStream inStream = getResponse("https://huggingface.co/" + hfModel + "/resolve/main/" + currFile, authHeader);
        if (inStream == null)
            throw new IOException("WARNING: Fetch of file " + currFile + " failed.");
        Files.copy(inStream, outputPath, StandardCopyOption.REPLACE_EXISTING);
    }
}

Windows build failures

[ERROR] testSaxpy(com.github.tjake.jlama.tensor.operations.TestOperations)  Time elapsed: 0.051 s  <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
        at com.github.tjake.jlama.tensor.operations.TestOperations.testSaxpy(TestOperations.java:180)

[ERROR] testSxpby(com.github.tjake.jlama.tensor.operations.TestOperations)  Time elapsed: 0.031 s  <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
        at com.github.tjake.jlama.tensor.operations.TestOperations.testSxpby(TestOperations.java:214)

[ERROR] testAccumulate(com.github.tjake.jlama.tensor.operations.TestOperations)  Time elapsed: 0.019 s  <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
        at com.github.tjake.jlama.tensor.operations.TestOperations.testAccumulate(TestOperations.java:118)

[ERROR] testDotProduct(com.github.tjake.jlama.tensor.operations.TestOperations)  Time elapsed: 0.144 s  <<< ERROR!
java.lang.ClassCastException: a Vector<class java.lang.Integer>: required Species[int, 16, S_512_BIT] but found Species[int, 8, S_256_BIT]
        at com.github.tjake.jlama.tensor.operations.TestOperations.testDotProduct(TestOperations.java:85)

[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   TestOperations.testAccumulate:118 » ClassCast a Vector<class java.lang.Integer...
[ERROR]   TestOperations.testDotProduct:85 » ClassCast a Vector<class java.lang.Integer>...
[ERROR]   TestOperations.testSaxpy:180 » ClassCast a Vector<class java.lang.Integer>: re...
[ERROR]   TestOperations.testSxpby:214 » ClassCast a Vector<class java.lang.Integer>: re...
[INFO]
[ERROR] Tests run: 17, Failures: 0, Errors: 4, Skipped: 6

streaming server support?

Is there a way to run and expose an API streaming server compatible with OpenAI API specifications?
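For context, OpenAI-compatible streaming is plain Server-Sent Events: the server writes `data: {json chunk}` lines and finishes with `data: [DONE]`. A toy sketch with the JDK's built-in HttpServer, to show the wire format only (the endpoint path and chunk shape follow the OpenAI streaming format; the hard-coded tokens stand in for real model output):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class SseDemo {
    // Starts a toy endpoint that streams tokens as OpenAI-style SSE chunks.
    public static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/v1/chat/completions", exchange -> {
                exchange.getResponseHeaders().set("Content-Type", "text/event-stream");
                exchange.sendResponseHeaders(200, 0); // length 0 = chunked streaming
                try (OutputStream os = exchange.getResponseBody()) {
                    for (String tok : new String[] {"Hello", " world"}) {
                        String chunk = "data: {\"choices\":[{\"delta\":{\"content\":\""
                                + tok + "\"}}]}\n\n";
                        os.write(chunk.getBytes(StandardCharsets.UTF_8));
                        os.flush(); // push each token to the client immediately
                    }
                    os.write("data: [DONE]\n\n".getBytes(StandardCharsets.UTF_8));
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real implementation would run jlama's token generation inside the handler and write one chunk per generated token; the transport side is no more than the above.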

CodeLlama loading is broken?

This worked in the Oct 15 build of jlama:

$ ./run-cli.sh complete -p "def fib(" -t 0.2 -tc 24 -n 100 models/CodeLlama-7b-hf

Now it OOMs (note that I have already doubled the default Xmx, which was not necessary in October):

Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.github.tjake.jlama.cli.commands.CompleteCommand@32b260fa): java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at picocli.CommandLine.executeUserObject(CommandLine.java:2035)
	at picocli.CommandLine.access$1500(CommandLine.java:148)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
	at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2264)
	at picocli.CommandLine.parseWithHandlers(CommandLine.java:2664)
	at picocli.CommandLine.parseWithHandler(CommandLine.java:2599)
	at com.github.tjake.jlama.cli.JlamaCli.main(JlamaCli.java:30)
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:111)
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:66)
	at com.github.tjake.jlama.cli.commands.CompleteCommand.run(CompleteCommand.java:16)
	at picocli.CommandLine.executeUserObject(CommandLine.java:2026)
	... 8 more
Caused by: java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:74)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at com.github.tjake.jlama.model.ModelSupport.loadModel(ModelSupport.java:107)
	... 11 more
Caused by: java.lang.OutOfMemoryError
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:542)
	at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:567)
	at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:670)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfInt.evaluateParallel(ForEachOps.java:189)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
	at java.base/java.util.stream.IntPipeline.forEach(IntPipeline.java:463)
	at java.base/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:620)
	at com.github.tjake.jlama.model.llama.LlamaModel.loadTransformerBlockWeights(LlamaModel.java:56)
	at com.github.tjake.jlama.model.AbstractModel.<init>(AbstractModel.java:109)
	at com.github.tjake.jlama.model.llama.LlamaModel.<init>(LlamaModel.java:31)
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	... 14 more
Caused by: java.lang.OutOfMemoryError: Cannot reserve 180355136 bytes of direct buffer memory (allocated: 25708094948, limit: 25769803776)
	at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
	at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
	at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:360)
	at com.github.tjake.jlama.util.UnsafeDirectByteBuffer.allocateAlignedByteBuffer(UnsafeDirectByteBuffer.java:36)
	at com.github.tjake.jlama.tensor.FloatBufferTensor.<init>(FloatBufferTensor.java:73)
	at com.github.tjake.jlama.safetensors.Weights.load(Weights.java:112)
	at com.github.tjake.jlama.safetensors.WeightLoader.load(WeightLoader.java:16)
	at com.github.tjake.jlama.safetensors.SafeTensorIndex.load(SafeTensorIndex.java:172)
	at com.github.tjake.jlama.model.llama.LlamaModel.lambda$loadTransformerBlockWeights$1(LlamaModel.java:70)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfInt.accept(ForEachOps.java:205)
	at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
	at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:712)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
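The final cause in the trace is the key one: the allocation that fails is direct (off-heap) buffer memory, not heap. The JVM caps `ByteBuffer.allocateDirect` at `-XX:MaxDirectMemorySize`, which defaults to the `-Xmx` value, so doubling Xmx also doubles the direct limit but makes heap and weights compete for the same RAM; setting `-XX:MaxDirectMemorySize` explicitly alongside a small heap may be the better workaround. A minimal sketch of the allocation the tensor code is doing (the flags in the comment are standard HotSpot options; the sizes are illustrative):

```java
import java.nio.ByteBuffer;

public class DirectMemoryDemo {
    // Direct buffers live outside the heap and are counted against
    // -XX:MaxDirectMemorySize (which defaults to the -Xmx value).
    // Run with e.g.: java -Xmx2g -XX:MaxDirectMemorySize=32g DirectMemoryDemo
    static ByteBuffer allocate(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocate(1 << 20); // 1 MiB off-heap
        System.out.println("direct=" + buf.isDirect() + " capacity=" + buf.capacity());
    }
}
```

Whether the regression itself is a larger per-tensor allocation or shards now being loaded eagerly would need a diff against the Oct 15 code.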
