abetlen / llama-cpp-python

Python bindings for llama.cpp

Home Page: https://llama-cpp-python.readthedocs.io

License: MIT License

Python 97.69% CMake 0.75% Dockerfile 0.67% Shell 0.55% Makefile 0.33%

llama-cpp-python's Issues

make: *** No rule to make target `libllama.so'. Stop.

Configuring Project
        Working directory:
          /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/_skbuild/macosx-13.0-arm64-3.9/cmake-build
        Command:
          /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/cmake/data/bin/cmake /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006 -G Ninja -DCMAKE_INSTALL_PREFIX:PATH=/private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/_skbuild/macosx-13.0-arm64-3.9/cmake-install -DPYTHON_VERSION_STRING:STRING=3.9.6 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/Users/emmanuel/workspace/code/.venv/bin/python3 -DPYTHON_INCLUDE_DIR:PATH=/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/include/python3.9 -DPYTHON_LIBRARY:PATH=libpython3.9.a -DPython_EXECUTABLE:PATH=/Users/emmanuel/workspace/code/.venv/bin/python3 -DPython_ROOT_DIR:PATH=/Users/emmanuel/workspace/code/.venv -DPython_INCLUDE_DIR:PATH=/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/include/python3.9 -DPython_FIND_REGISTRY:STRING=NEVER -DPython3_EXECUTABLE:PATH=/Users/emmanuel/workspace/code/.venv/bin/python3 -DPython3_ROOT_DIR:PATH=/Users/emmanuel/workspace/code/.venv -DPython3_INCLUDE_DIR:PATH=/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/include/python3.9 -DPython3_FIND_REGISTRY:STRING=NEVER -DCMAKE_MAKE_PROGRAM:FILEPATH=/private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/ninja/data/bin/ninja -DCMAKE_BUILD_TYPE:STRING=Release -DCMAKE_OSX_DEPLOYMENT_TARGET:STRING=13.0 -DCMAKE_OSX_ARCHITECTURES:STRING=arm64
      
      -- The C compiler identification is AppleClang 14.0.0.14000029
      -- The CXX compiler identification is AppleClang 14.0.0.14000029
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Configuring done (0.3s)
      -- Generating done (0.0s)
      CMake Warning:
        Manually-specified variables were not used by the project:
      
          PYTHON_EXECUTABLE
          PYTHON_INCLUDE_DIR
          PYTHON_LIBRARY
          PYTHON_VERSION_STRING
          Python3_EXECUTABLE
          Python3_FIND_REGISTRY
          Python3_INCLUDE_DIR
          Python3_ROOT_DIR
          Python_EXECUTABLE
          Python_FIND_REGISTRY
          Python_INCLUDE_DIR
          Python_ROOT_DIR
          SKBUILD
      
      
      -- Build files have been written to: /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/_skbuild/macosx-13.0-arm64-3.9/cmake-build
      [1/2] Generating /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/vendor/llama.cpp/libllama.so
      FAILED: /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/vendor/llama.cpp/libllama.so
      cd /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/vendor/llama.cpp && make libllama.so
      make: *** No rule to make target `libllama.so'.  Stop.
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/skbuild/setuptools_wrap.py", line 642, in setup
          cmkr.make(make_args, install_target=cmake_install_target, env=env)
        File "/private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/skbuild/cmaker.py", line 679, in make
          self.make_impl(clargs=clargs, config=config, source_dir=source_dir, install_target=install_target, env=env)
        File "/private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/skbuild/cmaker.py", line 710, in make_impl
          raise SKBuildError(
      
      An error occurred while building with CMake.
        Command:
          /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-build-env-pdqr94ne/overlay/lib/python3.9/site-packages/cmake/data/bin/cmake --build . --target install --config Release --
        Install target:
          install
        Source directory:
          /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006
        Working directory:
          /private/var/folders/by/tgcbn2ys69n9xn0gjlnpt68m0000gn/T/pip-install-qw9aco_2/llama-cpp-python_b76ded6a17c94078b51d40aa49aeb006/_skbuild/macosx-13.0-arm64-3.9/cmake-build
      Please check the install target is valid and see CMake's output for more information.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python

can't seem to find the cpp file.

Any suggestions?

Long duration until generation starts with big context

With a short prompt like "Hello, who are you?", I get about 200 ms/token and it starts generating almost instantly.
On the other hand, when I paste a small text (e.g. search results from the DuckDuckGo API) and ask a question about it, I have to wait about 1 minute, and then it generates quite slowly. Is this normal behaviour?

My CPU is a Ryzen 7 6800H with 32 GB of DDR5 RAM. I'm running Vicuna 7B.
I paste the search result context from the python bindings.
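
A rough way to reason about the delay: the whole prompt has to be evaluated before the first new token appears, so the wait grows with prompt length. The sketch below is back-of-the-envelope arithmetic only. The 300-token prompt size is a made-up figure, and it assumes prompt tokens cost about as much as generated ones (batched prompt evaluation is often faster than that).

```python
def estimated_wait_seconds(prompt_tokens: int, ms_per_token: float) -> float:
    """Estimate time before generation starts, assuming a fixed per-token cost."""
    return prompt_tokens * ms_per_token / 1000.0


# Hypothetical: a pasted search result of about 300 tokens at the reported 200 ms/token.
print(estimated_wait_seconds(300, 200.0))
```

Under those assumptions the estimate is in the same range as the roughly one-minute wait described above, which suggests the delay is mostly prompt evaluation rather than a hang.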

AttributeError: 'Llama' object has no attribute 'embed'

I believe you forgot to change from 'embed' to 'create_embedding' in LlamaCppEmbeddings ('embed_documents' & 'embed_query')
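
Until the caller is updated, a small shim can paper over the rename. This is only a sketch: `embed_text` and `StubLlama` are hypothetical names, and the OpenAI-style response shape assumed for `create_embedding` should be checked against the installed version.

```python
from typing import Any, List


def embed_text(client: Any, text: str) -> List[float]:
    """Return an embedding vector using whichever method the client exposes."""
    if hasattr(client, "embed"):
        return list(client.embed(text))
    # Assumed OpenAI-style response: {"data": [{"embedding": [...]}]}
    response = client.create_embedding(text)
    return list(response["data"][0]["embedding"])


class StubLlama:
    """Stand-in for a Llama object that only has create_embedding."""

    def create_embedding(self, text: str) -> dict:
        return {"data": [{"embedding": [0.1, 0.2, 0.3], "index": 0}]}


print(embed_text(StubLlama(), "hello"))
```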

AttributeError occurred

Can you help me, please?

LangChain 0.0.136, llama-cpp-python 0.1.31
The AttributeError occurred in the code below.

It didn't happen yesterday.

Code:
from langchain.llms import LlamaCpp
LlmLlama = LlamaCpp(model_path="./ggml-vicuna-13b-4bit.bin")

Error:
AttributeError: 'Llama' object has no attribute 'ctx'

Incredibly slow response time

Hello.
I am still new to llama-cpp and I was wondering if it is normal that it takes an incredibly long time to respond to my prompt.

FYI, I am assuming it runs on my CPU. Here are my specs:

I have 16.0 GB of RAM
I am using an AMD Ryzen 7 1700X Eight-Core Processor rated at 3.40 GHz
Just in case, my GPU is an NVIDIA GeForce RTX 2070 SUPER.

Everything else seems to work fine, and the model loads correctly (or at least, it seems to).
I did a first test using the code showcased in the README.md

from llama_cpp import Llama
llm = Llama(model_path="models/7B/...")
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output)

which returned me this:

The output is what I expected (even though Uranus, Neptune and Pluto were missing), but the total time is extremely long (1124707.08 ms, about 18 minutes).

I wrote this second script to try to see what could be causing the insanely long response time, but I don't know what's going on.

from llama_cpp import Llama
import time

print("Model loading")
llm = Llama(model_path="./model/ggml-model-q4_0_new.bin")

while True:
    prompt = input("Prompt> ")
    prompt = f"Q: {prompt} A: "
    print("Your prompt:", prompt)

    # Time only the model call, using a monotonic clock.
    start_time = time.perf_counter()
    output = llm(prompt, max_tokens=1, stop=["Q:", "\n"], echo=True)
    elapsed = time.perf_counter() - start_time

    print("Output:", output)
    print("--- Prompt reply duration: %s seconds ---" % elapsed)

I may have done things wrong since I am still new to all of this, but do any of you have any idea on how I could speed up the process? I searched for solutions through google, github and different forums, but nothing seems to work.

PS: For those interested in the CLI output when it loads the model:

llama_model_load: loading model from './model/ggml-model-q4_0_new.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 2052.00 MB per state)
llama_model_load: loading tensors from './model/ggml-model-q4_0_new.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  512.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

I apologize in advance if my English doesn't make sense sometimes; it is not my native language.
Thanks in advance for the help, regards. 👋

Why is only the MSVC compiler required to build wheels?

main.py not running on M1 Mac due to llama_context_default_params symbol not found

Things were working fine until I closed my terminal window, opened a new one and started seeing issues (I don't remember the error). I went ahead and did a quick update (via the "development" steps in the README) and started getting this issue when running python3 -m llama_cpp.server

Traceback (most recent call last):
  File "/Users/kelden/opt/anaconda3/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Users/kelden/opt/anaconda3/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/Users/kelden/Documents/tiny-leaps/llama-cpp-python/llama_cpp/__init__.py", line 1, in <module>
    from .llama_cpp import *
  File "/Users/kelden/Documents/tiny-leaps/llama-cpp-python/llama_cpp/llama_cpp.py", line 99, in <module>
    _lib.llama_context_default_params.argtypes = []
  File "/Users/kelden/opt/anaconda3/lib/python3.9/ctypes/__init__.py", line 395, in __getattr__
    func = self.__getitem__(name)
  File "/Users/kelden/opt/anaconda3/lib/python3.9/ctypes/__init__.py", line 400, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: dlsym(0x308a36490, llama_context_default_params): symbol not found

I've gone in and run make in llama.cpp again, and run the develop script again and again, to no avail. I deleted the .so file and rebuilt it multiple times, and made sure the MODEL variable is set properly too :/ What am I doing wrong?

Interactive mode/Session

Hey there, sorry if this is obviously possible/easy, but is it possible right now to use llama.cpp's interactive mode, and if so, how?

[Question] Drop in replacement for OpenAI

I notice that you mentioned your goal of creating a drop in replacement for OpenAI. Awesome job! This is super helpful to have and especially with your demo using fastAPI.

I'm looking at langchain right now, and I see you have implemented most, if not all, of the OpenAI API, including streaming. Since it got official integration with langchain today, I'm getting ready to get the integration working with streaming as literally a drop-in for OpenAI in langchain. Do you already have this done? Just trying to see what your goals are in the near future for this package :)

[Question] How to use kv cache?

Hello!

I have been trying to test the new kv cache loading and ran into an issue: it seems to segfault when running llama_eval.
To save the current cache I do:

import llama_cpp
import pickle
from ctypes import cast
# Some work...
kv_tokens = llama_cpp.llama_get_kv_cache_token_count(ctx)
kv_len = llama_cpp.llama_get_kv_cache_size(ctx)
kv_cache = llama_cpp.llama_get_kv_cache(ctx) 
kv_cache = cast(kv_cache, llama_cpp.POINTER(llama_cpp.c_uint8 * kv_len))
kv_cache = bytearray(kv_cache)
with open("test.bin", "wb") as f:
    pickle.dump([kv_cache,kv_tokens], f)

Loading:

with open("test.bin", "rb") as f:
    kv_cache, kv_tokens = pickle.load(f)
    llama_cpp.llama_set_kv_cache(ctx, 
	    (llama_cpp.c_uint8 * len(kv_cache)).from_buffer(kv_cache),
	    len(kv_cache),
	    kv_tokens
    )

But running llama_cpp.llama_eval after will result in a segfault.

llama-cpp-python version: 0.1.16

How do i fix this?
Thanks

Issue with emoji decoding in streaming mode only

When the model wants to output an emoji, this error comes up:

Debugging middleware caught exception in streamed response at a point where response headers were already sent.
Traceback (most recent call last):
  File "C:\Users\zblac\AppData\Local\Programs\Python\Python310\lib\site-packages\werkzeug\wsgi.py", line 500, in __next__
    return self._next()
  File "C:\Users\zblac\AppData\Local\Programs\Python\Python310\lib\site-packages\werkzeug\wrappers\response.py", line 50, in _iter_encoded
    for item in iterable:
  File "C:\Users\zblac\llama.cpp\test\normal.py", line 37, in vicuna
    for line in response:
  File "C:\Users\zblac\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama.py", line 370, in _create_completion
    "text": text[start:].decode("utf-8"),
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 0: unexpected end of data
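
The byte 0xf0 is the first byte of a four-byte UTF-8 sequence, so the error is consistent with an emoji being split across streamed chunks. The sketch below is not the library's code. It shows one common way to handle this with the standard library's incremental decoder, which holds back incomplete sequences until the remaining bytes arrive; `decode_stream` is a hypothetical helper.

```python
import codecs
from typing import Iterable, Iterator


def decode_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield text from byte chunks, holding back incomplete UTF-8 sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush anything left over at the end of the stream.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


emoji = "😀".encode("utf-8")  # four bytes, starting with 0xf0
pieces = [b"Hi ", emoji[:1], emoji[1:3], emoji[3:], b"!"]
print("".join(decode_stream(pieces)))
```

Calling `bytes.decode("utf-8")` on each chunk separately would raise on `emoji[:1]`, which is the failure shown in the traceback.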

Add cli options to server

Unexpected output

EDIT: I'm running this on an M1 Macbook. Using the model directly works as expected, but running it through Python gives me this output. The .dylib binary is built from source too.

Do you know what could be giving me this output? Using the model without the bindings works as expected...

{
  "id": "cmpl-f49883d5-e368-4fa0-a4fa-bf758daa1831",
  "object": "text_completion",
  "created": 1680203705,
  "model": "ggml-model-q4_0-new.bin",
  "choices": [
    {
      "text": "Question: What are the names of the planets in the solar system? Answer: \u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c\u001c",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 48,
    "total_tokens": 67
  }
}

Investigate using `make` instead of `cmake` to build shared library

It's been pointed out that make may be better supported by llama.cpp (on some platforms). We're currently using scikit-build to build the shared library on installation with cmake but it also supports make.

Additional note as pointed out in #32 we should support passing environment variables in both settings.

Add support for get embeddings

Add performance optimization example

I've had some success using scikit-optimize to tune the parameters for the Llama class; it can improve token eval performance by around 50% over the default parameters. I'm planning to turn this into a script, which could also be of some use for upstream llama.cpp users.

Add Verbose Logging Support to Diagnose Performance Issues

Sorry, this might be totally wrong place to open the issue. Feel free to close.

Anyway, I'm working with a 3rd party project* that uses your awesome wrapper and I'm having problems there, which brings me back here. Everything seems to be working, but not with the speed I expect after using plain llama.cpp. With some prompts it seems to even completely freeze, never completing the task. Could I somehow raise this wrapper's logging level to make it more verbose, so I could see in real-time as it works?

* https://github.com/hwchase17/langchain

[Investigate] Custom `llama.dll` Dependency Resolution Issues on Windows

This is a note for using a custom llama.dll build on Windows. I ran into dependency resolution issues with loading my own llama.dll compiled with BLAS support and some extra hardware specific optimization flags. No matter what I do, it can't seem to locate all of its dependencies, even though I've tried placing them in system paths and even same dir.

My current workaround is using the default llama.dll that llama-cpp-python builds, but it doesn't have the hardware optimizations and BLAS compatibility that I enabled in my custom build. So, I'm still trying to figure out what my issue is. Maybe something python specific that i'm missing...

I'm dropping this issue here just in case anyone else runs into something similar. If you have any ideas or workarounds, let me know. I'll keep trying to figure it out until I get it resolved haha :)
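
One Python-specific detail that may be relevant: since Python 3.8 on Windows, dependencies of DLLs loaded through ctypes are no longer resolved via PATH or the current working directory, and extra directories have to be registered with `os.add_dll_directory`. Whether that is the cause here is unconfirmed. The sketch below assumes you know which folders hold the BLAS and other dependent DLLs; `register_dll_dirs` is a hypothetical helper.

```python
import os
from typing import Iterable, List


def register_dll_dirs(directories: Iterable[str]) -> List[str]:
    """Add existing directories to the Windows DLL search path.

    Returns the directories that were registered. On platforms without
    os.add_dll_directory this does nothing and returns an empty list.
    """
    registered = []
    for directory in directories:
        path = os.path.abspath(directory)
        if os.path.isdir(path) and hasattr(os, "add_dll_directory"):
            os.add_dll_directory(path)
            registered.append(path)
    return registered
```

The call would need to happen before `llama_cpp` is imported, so that the directories are already registered when the custom llama.dll is loaded.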

[Question] Purpose of completion ID field

I have a question about the id field in the data returned from the completions endpoint. I see that there's a unique ID that identifies what completion a message is part of, and I'm wondering if this is only data for the client, or whether it has additional functionality.

Eventually I'm hoping to have a couple of different models running on my server, and I'm trying to figure out if a mechanism exists for a sort of chat functionality with unique contexts. Llama.cpp recently gained the ability to run multiple instances at once without much overhead, so I'm looking for a way to keep a unique context between a couple of conversation 'threads'.

Is there any mechanism, or is there a plan for one? Just want to make sure I'm not missing something if it's built already xD

{
  "id": "cmpl-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "object": "text_completion",
  "created": 1679561337,
  "model": "models/7B/...",
  "choices": [
    {
      "text": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.",
      "index": 0,
      "logprobs": None,
      "finish_reason": "stop"
    }
  ],
...
}
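
Nothing in the question establishes that the id field carries server-side state. Until such a mechanism exists, one possible workaround is to keep the history per thread on the client and resend it with each request. The sketch below illustrates that idea only; `ConversationStore`, its methods and the prompt layout are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class ConversationStore:
    """Keep a separate message history per conversation thread (client-side)."""

    def __init__(self) -> None:
        self._threads: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, thread_id: str, role: str, text: str) -> None:
        """Append one message to the given thread."""
        self._threads[thread_id].append((role, text))

    def prompt(self, thread_id: str) -> str:
        """Render the history as a single prompt ending with an open reply."""
        lines = [f"{role}: {text}" for role, text in self._threads[thread_id]]
        return "\n".join(lines + ["Assistant:"])


store = ConversationStore()
store.add("thread-a", "User", "Name the planets in the solar system.")
store.add("thread-b", "User", "Tell a story about a cat.")
print(store.prompt("thread-a"))
```

The cost of this approach is that the full history is re-evaluated on every request, so long threads get slower over time.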

Add tests

Fix unicode decoding error

Certain tokens in the vocabulary cannot be decoded to valid utf-8, I'm actually not sure if this is because they represent partial utf codepoints, but in any case they cause generation to fail.

Interactive mode with Llama class

Similar to interactive mode in llama.cpp.
Changes should not affect __call__ and the create_* method behaviour.
Should support max_tokens / infinite generation, eos / ignore_eos, and a reverse prompt.
Should support streaming

Text generation stops prematurely

Hi!

I have stumbled upon a problem with low_level_api_chat_cpp.py. When asking for generation of longer texts (for example "tell a story about a cat"), the generated text is cut off prematurely after only a couple of sentences, in the middle of a sentence. Using the same prompt in llama.cpp there is no such problem; the text is generated in its entirety. I am using this command for llama.cpp:

./main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.2 --instruct --mlock -m models/ggml-vicuna-13b-4bit.bin

and this one in llama-cpp-python:

Is this a bug?

Implement `stream` option in high-level api

Allow tokens to be generated one at a time while still terminating before a stop sequence is emitted (may need to keep tokens buffered in the generator).
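
The buffering idea can be sketched as follows. This is an illustration of the technique, not the eventual implementation: `stream_until_stop` is a hypothetical helper that works on decoded text pieces and assumes every stop sequence is a non-empty string.

```python
from typing import Iterable, Iterator, List


def stream_until_stop(pieces: Iterable[str], stop: List[str]) -> Iterator[str]:
    """Yield text incrementally, ending before any stop sequence is emitted."""
    buffer = ""
    for piece in pieces:
        buffer += piece
        # If a full stop sequence is present, emit what precedes it and finish.
        hits = [buffer.find(s) for s in stop if s in buffer]
        if hits:
            head = buffer[: min(hits)]
            if head:
                yield head
            return
        # Hold back the longest suffix that could still grow into a stop sequence.
        hold = 0
        for s in stop:
            for k in range(min(len(s) - 1, len(buffer)), 0, -1):
                if buffer.endswith(s[:k]):
                    hold = max(hold, k)
                    break
        emit = buffer[: len(buffer) - hold]
        if emit:
            yield emit
        buffer = buffer[len(buffer) - hold:]
    # The stream ended without a stop sequence: flush what was held back.
    if buffer:
        yield buffer


tokens = ["Hello Q", "uiet", " Q", ": next question"]
print(list(stream_until_stop(tokens, stop=["Q:", "\n"])))
```

In the usage line, the lone "Q" at the end of the first piece is held back until the next piece shows it belongs to "Quiet", while the later "Q" followed by ":" ends the stream without being emitted.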

simple python api example?

I have the server running and everything, but I really fail to understand the documentation at http://localhost:8000/docs.
Is there a simple code example of how I would interact with this from python (flask)?

For example, my code for querying OpenAI (for which this should be a "drop-in" replacement) is the following. What would be the equivalent when using llama-cpp-python?

  def get_text_gpt(prompt_persona, prompt, temp=0.8, freqpen=0.0, stop=None, biasdict=None, maxtok=512):
      # make sure mutable default arguments are reset
      biasdict = {} if biasdict is None else biasdict
      stop = "" if stop is None else stop
  
      try:
          response = openai.ChatCompletion.create(
              model="gpt-3.5-turbo",
              temperature=temp,
              max_tokens=maxtok,
              frequency_penalty=freqpen,
              stop=stop,
              logit_bias=biasdict,
              messages=[
                  {"role": "system", "content": prompt_persona},
                  {"role": "user", "content": prompt}]
          )
          message_content = response['choices'][0]['message']['content']
          return (message_content)
  
      except Exception as e:
          error = f"GPT API error: {e}"
          return error
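
A possible equivalent against the local server, using only the standard library, might look like the sketch below. It assumes the server exposes an OpenAI-style chat route at /v1/chat/completions and returns the same response shape as OpenAI; both should be confirmed on the /docs page. `build_chat_payload` and `get_text_local` are hypothetical names.

```python
import json
import urllib.request
from typing import List, Optional

BASE_URL = "http://localhost:8000"  # address from the question above


def build_chat_payload(persona: str, prompt: str, temp: float = 0.8,
                       maxtok: int = 512,
                       stop: Optional[List[str]] = None) -> dict:
    """Build an OpenAI-style chat request body."""
    payload = {
        "messages": [
            {"role": "system", "content": persona},
            {"role": "user", "content": prompt},
        ],
        "temperature": temp,
        "max_tokens": maxtok,
    }
    if stop:
        payload["stop"] = stop
    return payload


def get_text_local(persona: str, prompt: str, **kwargs) -> str:
    """POST the chat request to the local server and return the reply text."""
    data = json.dumps(build_chat_payload(persona, prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        BASE_URL + "/v1/chat/completions",  # assumed path; confirm at /docs
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `get_text_local("You are a helpful assistant.", "Name the planets.")` would return the reply text, provided the assumptions above hold.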

On m1 mac, after install, running and navigating to local URL I get error "{"detail":"Not Found"}"

Everything worked without errors but the web page says:
{"detail":"Not Found"}

Console shows the following:
INFO: Started server process [4194]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
INFO: ::1:58413 - "GET / HTTP/1.1" 404 Not Found

The only thing I didn't understand about the install process was what was supposed to go in the brackets for [server] in the command "pip install llama-cpp-python[server]". Could that have caused a problem?

llama.cpp works perfectly standalone BTW.

Wrong size of embeddings

While playing around, I noticed the embeddings are only 512 floats rather than the 4096 you get when using the standalone application.

So I went digging and I found the culprit which was a copy-paste residue in the function llama_n_embd

llama-cpp-python/llama_cpp/llama_cpp.py

Lines 220 to 221 in 6d1bda4

    
def llama_n_embd(ctx: llama_context_p) -> c_int:
    return _lib.llama_n_ctx(ctx)

It's calling llama_n_ctx rather than llama_n_embd.

I don't think this warrants a pull request as it is a very easy issue to fix, so I made a simple issue instead.

Keep up the good work :)

Investigate model aliasing

Allow the user to alias their local models to OpenAI model names as many tools have those hard-coded.

This may cause unexpected issues with tokenization mismatches.
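
A minimal sketch of what such aliasing could look like is a lookup table consulted before the model is chosen. The table contents, the file paths and `resolve_model` are all hypothetical, and the tokenization caveat above still applies: the alias only changes the name, not the tokenizer the client assumes.

```python
from typing import Dict

# Hypothetical alias table: OpenAI-style names mapped to local model files.
MODEL_ALIASES: Dict[str, str] = {
    "gpt-3.5-turbo": "./models/ggml-vicuna-13b-4bit.bin",
    "text-davinci-003": "./models/ggml-model-q4_0.bin",
}


def resolve_model(requested: str, default: str) -> str:
    """Map a requested model name to a local path, falling back to a default."""
    return MODEL_ALIASES.get(requested, default)


print(resolve_model("gpt-3.5-turbo", default="./models/ggml-model-q4_0.bin"))
```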

[Windows] "Failed building wheel for llama-cpp-python"

Edit: For now I've installed the wheel from "https://github.com/Loufe/llama-cpp-python/blob/main/wheels/llama_cpp_python-0.1.26-cp310-cp310-win_amd64.whl". The installation of the wheel works, so everything is fine for me. I also got things working in WSL with no issue.
I would still be happy to build the wheel myself, first as a learning experience, to understand what I did wrong, and secondly because, if I understood "#40" correctly, it might lead to better performance if I compile it myself: "The issue is that the binaries will likely not be built with the correct optimizations for the users particular CPU which will likely result in much worse performance than the user expects."
Though maybe I did not understand correctly and it doesn't matter.
I'm leaving the issue open in case it might be useful to someone, or in case someone wants to try to help me build the wheel for fun.

Hi !
I've been trying to install this package for a while, but I can't get it working on windows.

When I run "pip install llama-cpp-python", I get the following errors :

(short version, i'll put the full output at the end of the message)

ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

It seems that it is trying to find a C compiler and then build the wheel for the library (as far as I understand it).
At some point it seems to find one :

-- Trying 'Visual Studio 16 2019 x64 v142' generator - success

But then it seems to fail :


CMake Error at C:/Users/Antoine/AppData/Local/Temp/pip-build-env-m7g4zo_5/overlay/Lib/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeTestCCompiler.cmake:67 (message):
        The C compiler
          "C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe"
        is not able to compile a simple test program.

And then it gives me a very long output full of directory paths to explain why it failed.

Close to the end I can see an error that might be relevant :

An error occurred while configuring with CMake.
 Command: 'C:\Users\Antoine\AppData\Local\Temp\pip-build-env-m7g4zo_5\overlay\Lib\site-packages\cmake\data\bin/cmake.exe' 'C:\Users\Antoine\AppData\Local\Temp\pip-install-plc32gz9\llama-cpp-python_ca1a0aba562945a18534f1636d884ac7' -G 'Visual Studio 16 2019' '-DCMAKE_INSTALL_PREFIX:PATH=C:\Users\Antoine\AppData\Local\Temp\pip-install-plc32gz9\llama-cpp-python_ca1a0aba562945a18534f1636d884ac7\_skbuild\win-amd64-3.10\cmake-install' -DPYTHON_VERSION_STRING:STRING=3.10.6 -DSKBUILD:INTERNAL=TRUE '-DCMAKE_MODULE_PATH:PATH=C:\Users\Antoine\AppData\Local\Temp\pip-build-env-m7g4zo_5\overlay\Lib\site-packages\skbuild\resources\cmake' '-DPYTHON_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPYTHON_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' '-DPYTHON_LIBRARY:PATH=D:\Anaconda\envs\textgen\libs\python310.lib' '-DPython_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPython_ROOT_DIR:PATH=D:\Anaconda\envs\textgen' '-DPython_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' -DPython_FIND_REGISTRY:STRING=NEVER '-DPython3_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPython3_ROOT_DIR:PATH=D:\Anaconda\envs\textgen' '-DPython3_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' -DPython3_FIND_REGISTRY:STRING=NEVER -T v142 -A x64 -DCMAKE_BUILD_TYPE:STRING=Release
        Source directory:
          C:\Users\Antoine\AppData\Local\Temp\pip-install-plc32gz9\llama-cpp-python_ca1a0aba562945a18534f1636d884ac7
        Working directory:
          C:\Users\Antoine\AppData\Local\Temp\pip-install-plc32gz9\llama-cpp-python_ca1a0aba562945a18534f1636d884ac7\_skbuild\win-amd64-3.10\cmake-build
      Please see CMake's output for more information.

But I don't really know what to make of it.
I would really love to understand how to make it work on Windows, but I lack knowledge of building wheels.
I've tried:
- Upgrading pip and setuptools
- Installing Visual Studio AND the build tools for C++ (therefore I have cmake on my computer, but I don't know if it's even used when trying to build the wheel, considering the previous output...)
- Finding "CMake's output for more information", but I have no idea where to find it and Google didn't help me on that one.
- Downloading the repo and trying to build it using cmake, but maybe I did it wrong:

(textgen) PS F:\ChatBots\text-generation-webui\repositories\GPTQ-for-LLaMa\cmaketentative\llama-cpp-python> cmake ./ -B./build
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19045.
CMake Error at CMakeLists.txt:21 (add_subdirectory):
  The source directory

    F:/ChatBots/text-generation-webui/repositories/GPTQ-for-LLaMa/cmaketentative/llama-cpp-python/vendor/llama.cpp

  does not contain a CMakeLists.txt file.


CMake Error at CMakeLists.txt:22 (install):
  install TARGETS given target "llama" which does not exist.


-- Configuring incomplete, errors occurred!

- Doing the same thing with Cygwin after installing it

I've been trying since yesterday to make it work but I can't figure it out. Is there someone who could help me get this working on my Windows machine?

Thank you very much in advance.

Additional info :
I'm trying to install it in a conda environment named 'textgen', but I'm not sure it is relevant.

Full error output :


(textgen) PS F:\ChatBots\text-generation-webui\repositories\GPTQ-for-LLaMa> pip install llama-cpp-python
Collecting llama-cpp-python
  Using cached llama_cpp_python-0.1.27.tar.gz (529 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in d:\anaconda\envs\textgen\lib\site-packages (from llama-cpp-python) (4.5.0)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [263 lines of output]


      --------------------------------------------------------------------------------
      -- Trying 'Ninja (Visual Studio 17 2022 x64 v143)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      -- The C compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
        No CMAKE_C_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
        the compiler, or to the compiler name if it is in the PATH.


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Ninja (Visual Studio 17 2022 x64 v143)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Visual Studio 17 2022 x64 v143' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Generator

          Visual Studio 17 2022

        could not find any instance of Visual Studio.



      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Visual Studio 17 2022 x64 v143' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Ninja (Visual Studio 16 2019 x64 v142)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      -- The C compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
        No CMAKE_C_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
        the compiler, or to the compiler name if it is in the PATH.


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Ninja (Visual Studio 16 2019 x64 v142)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Visual Studio 16 2019 x64 v142' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      -- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19045.
      -- The C compiler identification is MSVC 19.29.30148.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- The CXX compiler identification is MSVC 19.29.30148.0
      CMake Warning (dev) at C:/Users/Antoine/AppData/Local/Temp/pip-build-env-yc585726/overlay/Lib/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCXXCompiler.cmake:168 (if):
        Policy CMP0054 is not set: Only interpret if() arguments as variables or
        keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
        details.  Use the cmake_policy command to set the policy and suppress this
        warning.

        Quoted variables like "MSVC" will no longer be dereferenced when the policy
        is set to NEW.  Since the policy is not set the OLD behavior will be used.
      Call Stack (most recent call first):
        CMakeLists.txt:4 (ENABLE_LANGUAGE)
      This warning is for project developers.  Use -Wno-dev to suppress it.

      CMake Warning (dev) at C:/Users/Antoine/AppData/Local/Temp/pip-build-env-yc585726/overlay/Lib/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeDetermineCXXCompiler.cmake:189 (elseif):
        Policy CMP0054 is not set: Only interpret if() arguments as variables or
        keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
        details.  Use the cmake_policy command to set the policy and suppress this
        warning.

        Quoted variables like "MSVC" will no longer be dereferenced when the policy
        is set to NEW.  Since the policy is not set the OLD behavior will be used.
      Call Stack (most recent call first):
        CMakeLists.txt:4 (ENABLE_LANGUAGE)
      This warning is for project developers.  Use -Wno-dev to suppress it.

      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Configuring done (11.1s)
      -- Generating done (0.0s)
      -- Build files have been written to: C:/Users/Antoine/AppData/Local/Temp/pip-install-kab8dxp_/llama-cpp-python_6aab703992964fd9953365ad8cceacea/_cmake_test_compile/build
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Visual Studio 16 2019 x64 v142' generator - success
      --------------------------------------------------------------------------------

      Configuring Project
        Working directory:
          C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build
        Command:
          'C:\Users\Antoine\AppData\Local\Temp\pip-build-env-yc585726\overlay\Lib\site-packages\cmake\data\bin/cmake.exe' 'C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea' -G 'Visual Studio 16 2019' '-DCMAKE_INSTALL_PREFIX:PATH=C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-install' -DPYTHON_VERSION_STRING:STRING=3.10.6 -DSKBUILD:INTERNAL=TRUE '-DCMAKE_MODULE_PATH:PATH=C:\Users\Antoine\AppData\Local\Temp\pip-build-env-yc585726\overlay\Lib\site-packages\skbuild\resources\cmake' '-DPYTHON_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPYTHON_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' '-DPYTHON_LIBRARY:PATH=D:\Anaconda\envs\textgen\libs\python310.lib' '-DPython_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPython_ROOT_DIR:PATH=D:\Anaconda\envs\textgen' '-DPython_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' -DPython_FIND_REGISTRY:STRING=NEVER '-DPython3_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPython3_ROOT_DIR:PATH=D:\Anaconda\envs\textgen' '-DPython3_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' -DPython3_FIND_REGISTRY:STRING=NEVER -T v142 -A x64 -DCMAKE_BUILD_TYPE:STRING=Release

      -- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19045.
      -- The C compiler identification is MSVC 19.29.30148.0
      -- The CXX compiler identification is MSVC 19.29.30148.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - failed
      -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
      -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - broken
      CMake Error at C:/Users/Antoine/AppData/Local/Temp/pip-build-env-yc585726/overlay/Lib/site-packages/cmake/data/share/cmake-3.26/Modules/CMakeTestCCompiler.cmake:67 (message):
        The C compiler

          "C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe"

        is not able to compile a simple test program.

        It fails with the following output:

          Change Dir: C:/Users/Antoine/AppData/Local/Temp/pip-install-kab8dxp_/llama-cpp-python_6aab703992964fd9953365ad8cceacea/_skbuild/win-amd64-3.10/cmake-build/CMakeFiles/CMakeScratch/TryCompile-gyt569

          Run Build Command(s):C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/MSBuild/Current/Bin/MSBuild.exe cmTC_903a4.vcxproj /p:Configuration=Debug /p:Platform=x64 /p:VisualStudioVersion=16.0 /v:n && Microsoft (R) Build Engine version 16.11.2+f32259642 pour .NET Framework
          Copyright (C) Microsoft Corporation. Tous droits réservés.

          La génération a démarré 09/04/2023 16:06:07.
          Projet "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj" sur le nœud 1 (cibles par défaut).
          PrepareForBuild:
            Création du répertoire "cmTC_903a4.dir\Debug\".
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppBuild.targets(517,5): warning MSB8029: Le répertoire intermédiaire ou le répertoire de sortie ne peut pas se trouver sous le répertoire temporaire car cela risque de créer des problèmes avec la génération incrémentielle. [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
            Création du répertoire "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\Debug\".
            Création du répertoire "cmTC_903a4.dir\Debug\cmTC_903a4.tlog\".
          InitializeBuildStatus:
            Création de "cmTC_903a4.dir\Debug\cmTC_903a4.tlog\unsuccessfulbuild", car "AlwaysCreate" a été spécifié.
          ClCompile:
            C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64\CL.exe /c /Zi /W1 /WX- /diagnostics:column /Od /Ob0 /D _MBCS /D WIN32 /D _WINDOWS /D "CMAKE_INTDIR=\"Debug\"" /Gm- /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /Fo"cmTC_903a4.dir\Debug\\" /Fd"cmTC_903a4.dir\Debug\vc142.pdb" /external:W1 /Gd /TC /errorReport:queue "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\testCCompiler.c"
            Compilateur d'optimisation Microsoft (R) C/C++ version 19.29.30148 pour x64
            testCCompiler.c
            Copyright (C) Microsoft Corporation. Tous droits réservés.
            cl /c /Zi /W1 /WX- /diagnostics:column /Od /Ob0 /D _MBCS /D WIN32 /D _WINDOWS /D "CMAKE_INTDIR=\"Debug\"" /Gm- /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /Fo"cmTC_903a4.dir\Debug\\" /Fd"cmTC_903a4.dir\Debug\vc142.pdb" /external:W1 /Gd /TC /errorReport:queue "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\testCCompiler.c"
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003: Impossible d'exécuter la tâche exécutable spécifiée "CL.exe". System.IO.DirectoryNotFoundException: Impossible de trouver une partie du chemin d'accès 'C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.dir\Debug\cmTC_903a4.tlog'. [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à System.IO.FileSystemEnumerableIterator`1.CommonInit() [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à System.IO.FileSystemEnumerableIterator`1..ctor(String path, String originalUserPath, String searchPattern, SearchOption searchOption, SearchResultHandler`1 resultHandler, Boolean checkHost) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à System.IO.Directory.GetFiles(String path, String searchPattern) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.Utilities.TrackedDependencies.ExpandWildcards(ITaskItem[] expand) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.Utilities.CanonicalTrackedOutputFiles.InternalConstruct(ITask ownerTask, ITaskItem[] tlogFiles, Boolean constructOutputsFromTLogs) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.Utilities.CanonicalTrackedOutputFiles..ctor(ITaskItem[] tlogFiles) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.CPPTasks.CL.PostExecuteTool(Int32 exitCode) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.CPPTasks.TrackedVCToolTask.ExecuteTool(String pathToTool, String responseFileCommands, String commandLineCommands) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.Utilities.ToolTask.Execute() [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          Génération du projet "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj" terminée (cibles par défaut) -- ÉCHEC.

          ÉCHEC de la build.

          "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj" (cible par défaut) (1) ->
          (PrepareForBuild cible) ->
            C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppBuild.targets(517,5): warning MSB8029: Le répertoire intermédiaire ou le répertoire de sortie ne peut pas se trouver sous le répertoire temporaire car cela risque de créer des problèmes avec la génération incrémentielle. [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]


          "C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj" (cible par défaut) (1) ->
          (ClCompile cible) ->
            C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003: Impossible d'exécuter la tâche exécutable spécifiée "CL.exe". System.IO.DirectoryNotFoundException: Impossible de trouver une partie du chemin d'accès 'C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.dir\Debug\cmTC_903a4.tlog'. [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à System.IO.FileSystemEnumerableIterator`1.CommonInit() [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à System.IO.FileSystemEnumerableIterator`1..ctor(String path, String originalUserPath, String searchPattern, SearchOption searchOption, SearchResultHandler`1 resultHandler, Boolean checkHost) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à System.IO.Directory.GetFiles(String path, String searchPattern) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.Utilities.TrackedDependencies.ExpandWildcards(ITaskItem[] expand) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.Utilities.CanonicalTrackedOutputFiles.InternalConstruct(ITask ownerTask, ITaskItem[] tlogFiles, Boolean constructOutputsFromTLogs) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.Utilities.CanonicalTrackedOutputFiles..ctor(ITaskItem[] tlogFiles) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.CPPTasks.CL.PostExecuteTool(Int32 exitCode) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.CPPTasks.TrackedVCToolTask.ExecuteTool(String pathToTool, String responseFileCommands, String commandLineCommands) [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]
          C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Microsoft\VC\v160\Microsoft.CppCommon.targets(687,5): error MSB6003:    à Microsoft.Build.Utilities.ToolTask.Execute() [C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build\CMakeFiles\CMakeScratch\TryCompile-gyt569\cmTC_903a4.vcxproj]

              1 Avertissement(s)
              1 Erreur(s)

          Temps écoulé 00:00:00.82





        CMake will not be able to correctly generate this project.
      Call Stack (most recent call first):
        CMakeLists.txt:3 (project)


      -- Configuring incomplete, errors occurred!
      Traceback (most recent call last):
        File "C:\Users\Antoine\AppData\Local\Temp\pip-build-env-yc585726\overlay\Lib\site-packages\skbuild\setuptools_wrap.py", line 634, in setup
          env = cmkr.configure(
        File "C:\Users\Antoine\AppData\Local\Temp\pip-build-env-yc585726\overlay\Lib\site-packages\skbuild\cmaker.py", line 332, in configure
          raise SKBuildError(

      An error occurred while configuring with CMake.
        Command:
          'C:\Users\Antoine\AppData\Local\Temp\pip-build-env-yc585726\overlay\Lib\site-packages\cmake\data\bin/cmake.exe' 'C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea' -G 'Visual Studio 16 2019' '-DCMAKE_INSTALL_PREFIX:PATH=C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-install' -DPYTHON_VERSION_STRING:STRING=3.10.6 -DSKBUILD:INTERNAL=TRUE '-DCMAKE_MODULE_PATH:PATH=C:\Users\Antoine\AppData\Local\Temp\pip-build-env-yc585726\overlay\Lib\site-packages\skbuild\resources\cmake' '-DPYTHON_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPYTHON_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' '-DPYTHON_LIBRARY:PATH=D:\Anaconda\envs\textgen\libs\python310.lib' '-DPython_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPython_ROOT_DIR:PATH=D:\Anaconda\envs\textgen' '-DPython_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' -DPython_FIND_REGISTRY:STRING=NEVER '-DPython3_EXECUTABLE:PATH=D:\Anaconda\envs\textgen\python.exe' '-DPython3_ROOT_DIR:PATH=D:\Anaconda\envs\textgen' '-DPython3_INCLUDE_DIR:PATH=D:\Anaconda\envs\textgen\Include' -DPython3_FIND_REGISTRY:STRING=NEVER -T v142 -A x64 -DCMAKE_BUILD_TYPE:STRING=Release
        Source directory:
          C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea
        Working directory:
          C:\Users\Antoine\AppData\Local\Temp\pip-install-kab8dxp_\llama-cpp-python_6aab703992964fd9953365ad8cceacea\_skbuild\win-amd64-3.10\cmake-build
      Please see CMake's output for more information.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

Implement caching for evaluated prompts

The goal of this feature is to reduce latency for repeated calls to the chat_completion API by saving the kv_cache keyed by the prompt tokens.

The basic version of this is to simply save the kv_state after the prompt has been evaluated.

Additionally, we should investigate whether it's possible to save and restore the kv_state after the completion has been generated as well.
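
A minimal sketch of the lookup side of such a cache, assuming the saved state is an opaque Python object. The class name `PromptStateCache` and its methods are hypothetical and are not part of llama-cpp-python; the sketch only shows how keying by prompt tokens and matching the longest cached prefix could work.

```python
from typing import Any, Dict, Optional, Sequence, Tuple


class PromptStateCache:
    """Hypothetical cache mapping evaluated prompt tokens to a saved state."""

    def __init__(self) -> None:
        self._states: Dict[Tuple[int, ...], Any] = {}

    def save(self, tokens: Sequence[int], state: Any) -> None:
        # Key the saved state by the exact token sequence that produced it.
        self._states[tuple(tokens)] = state

    def longest_prefix(
        self, tokens: Sequence[int]
    ) -> Optional[Tuple[Tuple[int, ...], Any]]:
        # Return the longest cached key that is a prefix of `tokens`, if any.
        best: Optional[Tuple[Tuple[int, ...], Any]] = None
        for key, state in self._states.items():
            if len(key) <= len(tokens) and tuple(tokens[: len(key)]) == key:
                if best is None or len(key) > len(best[0]):
                    best = (key, state)
        return best


cache = PromptStateCache()
cache.save([1, 2, 3], "state-after-3-tokens")
hit = cache.longest_prefix([1, 2, 3, 4, 5])
```

On a hit, only the tokens after the matched prefix would still need to be evaluated. The linear scan is fine for a handful of entries; a trie would suit a larger cache better.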

Add `/models/{model}` endpoint

Possible Typo in fastapi_server.py: n_ctx vs n_batch

When running the server in fastapi_server.py I noticed a possible typo in the configuration of the llama_cpp.Llama instance.

Here is the relevant code:

llama = llama_cpp.Llama(
    settings.model,
    f16_kv=True,
    use_mlock=True,
    embedding=True,
    n_threads=6,
    n_batch=2048,  # <--- should be n_ctx=2048
)

It appears that n_batch is set to 2048, but I believe it might be intended to set n_ctx to 2048 instead. When I tried to run the code as is, I encountered an exception due to the assert for ctx being None during generation. Changing n_batch to n_ctx resolved the issue.

Also, default batch size is 8, so 2048 seems a bit high :)
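
A small sketch of the corrected configuration, assuming the fix is what the report suggests: 2048 as the context size and a small batch size. The helper `llama_kwargs` is hypothetical, and the check that `n_batch` does not exceed `n_ctx` is an added assumption, not something the library is known to enforce.

```python
def llama_kwargs(model_path: str, n_ctx: int = 2048, n_batch: int = 8) -> dict:
    """Collect keyword arguments for llama_cpp.Llama (hypothetical helper)."""
    # Assumption: a batch larger than the context window is never useful.
    if n_batch > n_ctx:
        raise ValueError("n_batch should not exceed n_ctx")
    return {
        "model_path": model_path,
        "f16_kv": True,
        "use_mlock": True,
        "embedding": True,
        "n_threads": 6,
        "n_ctx": n_ctx,      # context window, the value the report says was intended
        "n_batch": n_batch,  # prompt tokens processed per evaluation step
    }


kwargs = llama_kwargs("model.bin")
# With the package installed: llama = llama_cpp.Llama(**kwargs)
```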

Awesome job!!!

https://github.com/abetlen/llama-cpp-python/blob/b9a4513363267dcc1f4b77d709ac3333fc889c6e/examples/fastapi_server.py#LL36C5-L36C5

hi, I need help:

Hello, I need help. I am a complete beginner at programming and understand almost nothing. Could someone explain to me, step by step, how to use and run the AI? I don't understand how to do it.

Shared library with base name 'llama' not found, windows

Can someone advise me on this issue on Windows?

from .llama_cpp import *
File "C:\Users\usr1\Anaconda3\envs\chatgpt1\lib\site-packages\llama_cpp\llama_cpp.py", line 46, in
_lib = _load_shared_library(_lib_base_name)
File "C:\Users\moham\Anaconda3\envs\chatgpt1\lib\site-packages\llama_cpp\llama_cpp.py", line 40, in _load_shared_library
raise FileNotFoundError(f"Shared library with base name '{lib_base_name}' not found")
FileNotFoundError: Shared library with base name 'llama' not found

Kudos on a great job! Need a little help with BLAS

Let me first congratulate everyone working on this for:

Python bindings for llama.cpp
Making them compatible with OpenAI's API
Superb documentation!

Was wondering if anyone can help me get this working with BLAS? Right now when the model loads, I see BLAS=0.
I've been using kobold.cpp, and they have a BLAS flag at compile time which enables BLAS. It cuts down the prompt loading time by 3-4X. This is a major factor in handling longer prompts and chat-style messages.

P.S - Was also wondering what the difference is between create_embedding(input) and embed(input)?

Running server gives error when using huggingface model

850 return issubclass(cls.__origin__, self.__origin__)
851 if not isinstance(cls, _GenericAlias):
--> 852 return issubclass(cls, self.__origin__)
853 return super().__subclasscheck__(cls)

TypeError: issubclass() arg 1 must be a class

The manual package works with the model not huggingface

https://huggingface.co/eachadea/ggml-vicuna-13b-4bit/resolve/main/ggml-vicuna-13b-4bit-rev1.bin

Add support for tokenize / detokenize

Add additional tests for `stop` sequences

The stop sequence implementation is currently a little complicated because it needs to support streaming. Its behavior is also ill-defined.
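
To illustrate why streaming complicates this, here is a minimal sketch, not the library's implementation, of one way to define the behavior: text is emitted as soon as it can no longer be part of a stop sequence, and output ends just before the first stop sequence. The function name `stream_until_stop` is hypothetical.

```python
from typing import Iterable, Iterator, List


def stream_until_stop(chunks: Iterable[str], stop: List[str]) -> Iterator[str]:
    """Yield text from `chunks`, ending just before the first stop sequence."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # If a full stop sequence is present, emit the text before the
        # earliest one and finish.
        hits = [buffer.find(s) for s in stop if s and s in buffer]
        if hits:
            if min(hits) > 0:
                yield buffer[: min(hits)]
            return
        # Hold back the longest tail that could still grow into a stop sequence.
        hold = 0
        for s in stop:
            for k in range(min(len(s) - 1, len(buffer)), 0, -1):
                if buffer.endswith(s[:k]):
                    hold = max(hold, k)
                    break
        safe = buffer[: len(buffer) - hold]
        if safe:
            yield safe
        buffer = buffer[len(buffer) - hold:]
    # The stream ended without a stop sequence: flush whatever was held back.
    if buffer:
        yield buffer
```

Cases like the ones below (a stop sequence split across chunks, a partial match that never completes) are the kind of behavior additional tests could pin down.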

Error running on M1 Mac

Hi!

I am having issues using it on an M1 Mac:

from llama_cpp import Llama
produces this error:

zsh: illegal hardware instruction

Best,
Benjamin

Chat does not remember initial prompt as well as llama.cpp

When feeding llama.cpp main app with an initial prompt from a file (--file parameter) like this:

./main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.2 --instruct --file 'prompt.txt' --keep -1 --n_predict -1 --mlock -m models/ggml-vicuna-13b-4bit.bin

... it remembers the character name that is in that file for quite a long time. But it seems that when using low_level_api_chat_cpp.py like this:

python3 low_level_api_chat_cpp.py --mlock --color --interactive-first --interactive-start -r "### Human:" -ins -c 2048 -i --repeat_penalty 1.2 --temp 0 --n_parts -1 --ignore-eos --keep -1 --n_predict -1 --file '/Users/admin/scripts/llama-cpp-python/examples/low_level_api/prompt.txt' -m ../../llama.cpp/models/ggml-vicuna-13b-4bit.bin

it forgets the name more quickly, even if I use the --keep -1 parameter. It seems to be worse at remembering context overall.

Any way to make it remember context for longer as in llama.cpp main app?
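
For context, a simplified sketch of the idea behind a "keep" setting: when the token history outgrows the context window, the first `n_keep` tokens (for example the initial prompt) are retained and only the most recent part of the rest is carried over. This is an illustration under that assumption, not the code of either program, and `swap_context` is a hypothetical name. If one implementation skipped the kept prefix during such a swap, it would forget the initial prompt sooner.

```python
from typing import List


def swap_context(tokens: List[int], n_ctx: int, n_keep: int) -> List[int]:
    """Shrink `tokens` to fit `n_ctx`, always retaining the first `n_keep`."""
    if len(tokens) <= n_ctx:
        return list(tokens)
    n_keep = min(n_keep, n_ctx)
    # Of the tokens after the kept prefix, retain only the most recent half,
    # capped so that the result still fits in the context window.
    n_left = len(tokens) - n_keep
    n_tail = min(n_left // 2, n_ctx - n_keep)
    tail = tokens[len(tokens) - n_tail:] if n_tail > 0 else []
    return list(tokens[:n_keep]) + list(tail)
```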

Support pickling the `Llama` instance

As pointed out here, the Llama class cannot currently be pickled because it holds pointers to C memory addresses. To implement this we'll need to write custom __getstate__ and / or __reduce__ methods for pickling, as well as a __setstate__ method for unpickling.
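
A minimal sketch of the pattern, using a stand-in class rather than the real `Llama`: only the constructor arguments are serialized, and the native handle is re-created on unpickling. `NativeBackedModel` and `_open_handle` are hypothetical names, and the lambda merely stands in for a C pointer because it is likewise unpicklable.

```python
class NativeBackedModel:
    """Stand-in for a class that wraps an unpicklable native handle."""

    def __init__(self, model_path: str, n_ctx: int = 512) -> None:
        self.model_path = model_path
        self.n_ctx = n_ctx
        self._handle = self._open_handle()

    def _open_handle(self):
        # Placeholder for loading the model through the C API.
        return lambda: (self.model_path, self.n_ctx)

    def __getstate__(self) -> dict:
        # Serialize only the constructor arguments, never the native handle.
        return {"model_path": self.model_path, "n_ctx": self.n_ctx}

    def __setstate__(self, state: dict) -> None:
        # Restore the plain attributes, then re-create the handle from them.
        self.model_path = state["model_path"]
        self.n_ctx = state["n_ctx"]
        self._handle = self._open_handle()
```

With these two methods, `pickle.loads(pickle.dumps(model))` yields a fresh object with its own handle. For a real model this means reloading the weights on unpickle, and any evaluated context is lost unless it is saved as part of the state too.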

Test other operating systems in Github workflow

Installation on Windows failed because Visual Studio is not installed

Trying to install with

pip install llama-cpp-python==0.1.23

on Windows in a micromamba environment resulted in the following error. It seems like the package is looking for Visual Studio, which is not installed on my system.

Is it possible to make it such that the package can be installed without the need for Visual Studio?

(C:\Users\me\Downloads\oobabooga-windows\oobabooga-windows\installer_files\env) C:\Users\me\Downloads\oobabooga-windows\oobabooga-windows\text-generation-webui>pip install llama-cpp-python==0.1.23
Collecting llama-cpp-python==0.1.23
  Downloading llama_cpp_python-0.1.23.tar.gz (530 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 530.0/530.0 kB 504.2 kB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in c:\users\me\downloads\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages (from llama-cpp-python==0.1.23) (4.5.0)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [308 lines of output]


      --------------------------------------------------------------------------------
      -- Trying 'Ninja (Visual Studio 17 2022 x64 v143)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      -- The C compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
        No CMAKE_C_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
        the compiler, or to the compiler name if it is in the PATH.


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Ninja (Visual Studio 17 2022 x64 v143)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Visual Studio 17 2022 x64 v143' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Generator

          Visual Studio 17 2022

        could not find any instance of Visual Studio.



      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Visual Studio 17 2022 x64 v143' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Ninja (Visual Studio 16 2019 x64 v142)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      -- The C compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
        No CMAKE_C_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
        the compiler, or to the compiler name if it is in the PATH.


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Ninja (Visual Studio 16 2019 x64 v142)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Visual Studio 16 2019 x64 v142' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Generator

          Visual Studio 16 2019

        could not find any instance of Visual Studio.



      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Visual Studio 16 2019 x64 v142' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Ninja (Visual Studio 15 2017 x64 v141)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      -- The C compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
        No CMAKE_C_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
        the compiler, or to the compiler name if it is in the PATH.


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Ninja (Visual Studio 15 2017 x64 v141)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'Visual Studio 15 2017 x64 v141' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Generator

          Visual Studio 15 2017

        could not find any instance of Visual Studio.



      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'Visual Studio 15 2017 x64 v141' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'NMake Makefiles (Visual Studio 17 2022 x64 v143)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Running

         'nmake' '-?'

        failed with:

         The system cannot find the file specified


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'NMake Makefiles (Visual Studio 17 2022 x64 v143)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'NMake Makefiles (Visual Studio 16 2019 x64 v142)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Running

         'nmake' '-?'

        failed with:

         The system cannot find the file specified


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'NMake Makefiles (Visual Studio 16 2019 x64 v142)' generator - failure
      --------------------------------------------------------------------------------



      --------------------------------------------------------------------------------
      -- Trying 'NMake Makefiles (Visual Studio 15 2017 x64 v141)' generator
      --------------------------------
      ---------------------------
      ----------------------
      -----------------
      ------------
      -------
      --
      Not searching for unused variables given on the command line.
      CMake Error at CMakeLists.txt:2 (PROJECT):
        Running

         'nmake' '-?'

        failed with:

         The system cannot find the file specified


      -- Configuring incomplete, errors occurred!
      --
      -------
      ------------
      -----------------
      ----------------------
      ---------------------------
      --------------------------------
      -- Trying 'NMake Makefiles (Visual Studio 15 2017 x64 v141)' generator - failure
      --------------------------------------------------------------------------------

      ********************************************************************************
      scikit-build could not get a working generator for your system. Aborting build.

      Building windows wheels for Python 3.10 requires Microsoft Visual Studio 2022.
      Get it with "Visual Studio 2017":

        https://visualstudio.microsoft.com/vs/

      Or with "Visual Studio 2019":

          https://visualstudio.microsoft.com/vs/

      Or with "Visual Studio 2022":

          https://visualstudio.microsoft.com/vs/

      ********************************************************************************
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

Implement `logprobs`

The logprobs return format should match the OpenAI API. Currently, calling a Llama instance with logprobs enabled just returns a list of floats.

Example of the correct format:

"logprobs": {
    "text_offset": [
        11,
        12,
        13,
        14,
        15,
        17,
        18,
        20,
        21,
        23,
        24,
        26,
        27,
        29,
        30,
        32
    ],
    "token_logprobs": [
        -0.028534053,
        -0.0013638621,
        -0.0001191709,
        -0.037809037,
        -0.008346983,
        -1.3900239e-05,
        -6.0395385e-05,
        -2.462996e-05,
        -5.4432137e-05,
        -4.3108244e-05,
        -6.0395385e-05,
        -4.382537e-05,
        -4.489638e-05,
        -4.751897e-05,
        -0.00017937786,
        -7.314978e-05
    ],
    "tokens": [
        "\n",
        "\n",
        "1",
        ",",
        " 2",
        ",",
        " 3",
        ",",
        " 4",
        ",",
        " 5",
        ",",
        " 6",
        ",",
        " 7",
        ","
    ],
    "top_logprobs": [
        {
            "\n": -0.028534053,
            "\n\n": -5.3414392,
            " (": -6.8118296,
            " in": -4.9322805,
            ":": -5.6061873
        },
        {
            "\n": -0.0013638621,
            " \u00a7\u00a7": -8.594428,
            "//": -9.296644,
            "1": -9.727121,
            "Count": -9.291412
        },
        {
            " 1": -10.996209,
            "\"": -12.673454,
            "#": -12.253096,
            "1": -0.0001191709,
            "One": -9.39247
        },
        {
            " -": -6.4947214,
            " 2": -7.7675867,
            ")": -8.327954,
            ",": -0.037809037,
            ".": -3.3655276
        },
        {
            "\n": -14.826643,
            " ": -10.675518,
            " 2": -0.008346983,
            " two": -16.126537,
            "2": -4.792885
        },
        {
            " ,": -11.469002,
            " 3": -12.7872095,
            ",": -1.3900239e-05,
            ".": -14.724538,
            "<|endoftext|>": -15.308233
        },
        {
            " ": -12.118958,
            " 3": -6.0395385e-05,
            " three": -17.906118,
            "3": -9.814757,
            "<|endoftext|>": -15.049129
        },
        {
            " ,": -10.729593,
            " 4": -14.016008,
            ",": -2.462996e-05,
            ".": -14.297305,
            "<|endoftext|>": -13.67176
        },
        {
            " ": -11.351273,
            " 4": -5.4432137e-05,
            "4": -10.086686,
            "<|endoftext|>": -13.919009,
            "\u00a0": -16.80569
        },
        {
            " ,": -10.206355,
            " 5": -12.87644,
            ",": -4.3108244e-05,
            ".": -13.588498,
            "<|endoftext|>": -13.03574
        },
        {
            " ": -11.478045,
            " 5": -6.0395385e-05,
            "5": -9.931537,
            "<|endoftext|>": -13.568035,
            "\u00a0": -16.266188
        },
        {
            " ,": -10.160495,
            " 6": -12.964705,
            ",": -4.382537e-05,
            ".": -14.101328,
            "<|endoftext|>": -13.08568
        },
        {
            " ": -11.344849,
            " 6": -4.489638e-05,
            "6": -10.329956,
            "<|endoftext|>": -14.879237,
            "\u00a0": -16.98358
        },
        {
            " ,": -10.096309,
            " 7": -12.389179,
            ",": -4.751897e-05,
            ".": -13.817777,
            "<|endoftext|>": -13.860558
        },
        {
            " ": -11.630913,
            " 7": -0.00017937786,
            " seven": -16.613815,
            "7": -8.680304,
            "<|endoftext|>": -14.859097
        },
        {
            " ,": -9.754253,
            " 8": -11.516983,
            ",": -7.314978e-05,
            ".": -13.250221,
            "<|endoftext|>": -12.703088
        }
    ]
}
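As a rough sketch of how such an object could be assembled (the function name and its arguments are hypothetical, not part of the library), the helper below takes the generated token strings, the log probability of each chosen token, and the top candidates per position. It assumes text_offset is a character offset that advances by the length of each token, which is consistent with the example above.

```python
from typing import Dict, List


def build_logprobs(
    tokens: List[str],
    token_logprobs: List[float],
    top_logprobs: List[Dict[str, float]],
    start_offset: int = 0,
) -> dict:
    """Assemble a logprobs object shaped like the example above."""
    if not (len(tokens) == len(token_logprobs) == len(top_logprobs)):
        raise ValueError("all inputs must have one entry per token")

    text_offset = []
    offset = start_offset
    for token in tokens:
        # Each token starts where the previous one ended.
        text_offset.append(offset)
        offset += len(token)

    return {
        "text_offset": text_offset,
        "token_logprobs": list(token_logprobs),
        "tokens": list(tokens),
        "top_logprobs": [dict(candidates) for candidates in top_logprobs],
    }
```

Passing the first five tokens from the example with start_offset=11 reproduces the offsets 11, 12, 13, 14, 15.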

Standalone Server

Since the server is one of the goals / highlights of this project, I'm planning to move it into a subpackage, e.g. llama-cpp-python[server] or something like that.

Work that needs to be done first:

Ensure compatibility with OpenAI
- Response objects match
- Request objects match
- Loaded model appears under /v1/models endpoint
- ~~Test OpenAI client libraries~~
- Unsupported parameters should be silently ignored
Ease-of-use
- Integrate server as a subpackage
- CLI tool to run the server

Future work

Prompt caching to improve latency
Support multiple models in the same server
Add tokenization endpoints to make it easier for small clients to calculate context window sizes

Test other package managers in Github workflow

Installation on Windows failed while building wheel: UnicodeDecodeError

Running pip install llama-cpp-python==0.1.23

Collecting llama-cpp-python==0.1.23
  Using cached llama_cpp_python-0.1.23.tar.gz (530 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Traceback (most recent call last):
        File "I:\Python\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
          main()
        File "I:\Python\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "I:\Python\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "C:\Users\31415\AppData\Local\Temp\pip-build-env-2r2v_z25\overlay\Lib\site-packages\setuptools\build_meta.py", line 338, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "C:\Users\31415\AppData\Local\Temp\pip-build-env-2r2v_z25\overlay\Lib\site-packages\setuptools\build_meta.py", line 320, in _get_build_requires
          self.run_setup()
        File "C:\Users\31415\AppData\Local\Temp\pip-build-env-2r2v_z25\overlay\Lib\site-packages\setuptools\build_meta.py", line 335, in run_setup
          exec(code, locals())
        File "<string>", line 6, in <module>
        File "I:\Python\lib\pathlib.py", line 1135, in read_text
          return f.read()
      UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 4: illegal multibyte sequence
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
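The traceback ends in pathlib's read_text, called from the setup script without an explicit encoding, so Python falls back to the locale encoding (gbk on this machine). A likely fix, sketched below under that assumption, is to pass encoding="utf-8" when reading the file. The helper name and the sample heading are illustrative; the heading only shows that a UTF-8 encoded llama emoji puts byte 0xa6 at position 4, matching the error message.

```python
import tempfile
from pathlib import Path


def read_text_utf8(path) -> str:
    """Read a text file as UTF-8 regardless of the system locale."""
    return Path(path).read_text(encoding="utf-8")


def demo() -> str:
    """Write a UTF-8 file containing an emoji and read it back safely."""
    with tempfile.TemporaryDirectory() as tmp:
        readme = Path(tmp) / "README.md"
        readme.write_text("# 🦙 Example heading\n", encoding="utf-8")
        return read_text_utf8(readme)
```

Without the encoding argument, the same read depends on the machine's locale, which is why the build can succeed on one system and fail on another.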

[Feature] Dynamic Model Loading and Model Endpoint in FastAPI

I'd like to propose a future feature that I think would add useful flexibility for users of the completions/embeddings API: the ability to dynamically load models based on calls to the FastAPI endpoint.

The concept is as follows:

Have a predefined location for model files (e.g., a models folder within the project) and allow users to specify an additional model folder if needed.
When the API starts, it checks the designated model folders and populates the available models dynamically.
Users can query the available models through a GET request to the /v1/engines endpoint, which would return a list of models and their statuses.
Users can then specify the desired model when making inference requests.
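The discovery step could look roughly like the sketch below. It is only an illustration of the proposal: the function names, the ".bin" suffix, and the "status" field are assumptions, and the payload shape is modeled loosely on an OpenAI-style model list rather than taken from this project.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional


def discover_models(folders: Iterable, suffix: str = ".bin") -> Dict[str, Path]:
    """Map model names (file stems) to paths found in the given folders."""
    models: Dict[str, Path] = {}
    for folder in folders:
        folder = Path(folder)
        if not folder.is_dir():
            continue  # skip folders that do not exist
        for path in sorted(folder.glob(f"*{suffix}")):
            # The first folder that provides a name wins.
            models.setdefault(path.stem, path)
    return models


def list_models_payload(models: Dict[str, Path], loaded: Optional[str] = None) -> dict:
    """Build a response body listing each model and a hypothetical status."""
    return {
        "object": "list",
        "data": [
            {
                "id": name,
                "object": "model",
                "status": "loaded" if name == loaded else "available",
            }
            for name in sorted(models)
        ],
    }
```

A server could call discover_models once at startup and again on demand, then look up the requested model name in the resulting mapping before running inference.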

This dynamic model loading feature would align with the behavior of the OpenAI spec for models and model status. It would offer users the flexibility to easily choose and use different models without having to make manual changes to the project or configs.

This is a suggestion for later, but I wanted to suggest it now so we can plan if we do decide to implement it.

Let me know your thoughts :)

Thread bug in server code

The line at https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/server/__main__.py#L202
blocks https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/server/__main__.py#L226 from sending the heartbeat signal. Although async is used, the main thread is blocked while executing create_chat_completion. If the model is large and the first message takes a long time to send, the network connection will drop.
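The effect can be reproduced without the server. The sketch below uses stand-ins (slow_completion and heartbeat are hypothetical, not the server's code) to compare calling a blocking function directly inside a coroutine with offloading it to a thread via run_in_executor, which is one possible remedy and not necessarily the fix the project adopts.

```python
import asyncio
import time


def slow_completion(seconds: float) -> str:
    """Stand-in for a long, blocking model call."""
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks: list, interval: float) -> None:
    """Record a tick every `interval` seconds, like a keep-alive ping."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def serve(offload: bool, work_seconds: float = 0.2) -> int:
    """Run the slow call and return how many heartbeats fired meanwhile."""
    ticks: list = []
    task = asyncio.create_task(heartbeat(ticks, 0.01))
    await asyncio.sleep(0)  # give the heartbeat task a chance to start
    if offload:
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, slow_completion, work_seconds)
    else:
        slow_completion(work_seconds)  # blocks the event loop
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return len(ticks)


def compare() -> tuple:
    """Return (heartbeats while blocking, heartbeats while offloaded)."""
    return asyncio.run(serve(offload=False)), asyncio.run(serve(offload=True))
```

When the call is made directly, the heartbeat cannot run until the call returns; when it is offloaded, the event loop stays free to keep sending heartbeats.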

[WinError 193] When trying to run the high level API example with vicuna

I ran pip install llama-cpp-python and the installation was a success, then I created a python file and copied over the example text in the readme.
The only change I made was the model path to the vicuna model I am using and when I try to run the script I end up getting this error:

  File "C:\Users\Chula\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\llama_cpp\llama_cpp.py", line 36, in _load_shared_library
    return ctypes.CDLL(str(_lib_path))
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 193] %1 is not a valid Win32 application

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Chula\Desktop\code_projects\Vicuna_1\test.py", line 1, in <module>
    from llama_cpp import Llama
  File "C:\Users\Chula\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\llama_cpp\__init__.py", line 1, in <module>
    from .llama_cpp import *
  File "C:\Users\Chula\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\llama_cpp\llama_cpp.py", line 46, in <module>
    _lib = _load_shared_library(_lib_base_name)
  File "C:\Users\Chula\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\llama_cpp\llama_cpp.py", line 38, in _load_shared_library
    raise RuntimeError(f"Failed to load shared library '{_lib_path}': {e}")
RuntimeError: Failed to load shared library 'C:\Users\Chula\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\llama_cpp\llama.dll': [WinError 193] %1 is not a valid Win32 application
Press any key to continue . . .

I'm fairly new to all this, so I may have done something wrong, but I can't seem to find a fix anywhere.
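One common cause of WinError 193 is an architecture mismatch between the Python interpreter and the DLL it tries to load, although that is only a guess for this report. The diagnostic sketch below (the helper names are made up for illustration) reads the machine field from a DLL's PE header and reports the interpreter's pointer size so the two can be compared.

```python
import struct

# Machine codes from the PE/COFF header.
PE_MACHINES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)", 0xAA64: "ARM64"}


def pe_machine(path) -> str:
    """Return the target architecture recorded in a PE file (DLL or EXE)."""
    with open(path, "rb") as f:
        header = f.read(64)
        if len(header) < 64 or header[:2] != b"MZ":
            raise ValueError("not a PE file (missing MZ signature)")
        # Offset 0x3C holds the file offset of the PE signature.
        (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
        f.seek(pe_offset)
        signature = f.read(6)
    if len(signature) < 6 or signature[:4] != b"PE\0\0":
        raise ValueError("not a PE file (missing PE signature)")
    (machine,) = struct.unpack_from("<H", signature, 4)
    return PE_MACHINES.get(machine, hex(machine))


def interpreter_bits() -> int:
    """Return the pointer size of the running interpreter in bits."""
    return struct.calcsize("P") * 8
```

Running pe_machine on the llama.dll path from the error message and comparing it with interpreter_bits() would show whether the two architectures agree.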

def llama_n_embd(ctx: llama_context_p) -> c_int:
    return _lib.llama_n_ctx(ctx)