You can see this in my PR where I tried to add the 'alpca' chat template: <p dir="

Possible (very serious) bug in chat templates that use '<s>' token having a space added after it about llama.cpp HOT 2 CLOSED

jukofyork commented on June 30, 2024 1

Possible (very serious) bug in chat templates that use '' token having a space added after it
from llama.cpp.

Comments (2)

jukofyork commented on June 30, 2024 1

It was actually the extra </s>' that caused phind-codellama` to go wrong, so closing this/
from llama.cpp.

jukofyork commented on June 30, 2024

I'm not even sure if it is the space that is causing it now, as this is the tokenization when using the deepseek chat template:

{"tid":"140097807843328","timestamp":1716135105,"level":"VERB","function":"update_slots","line":1955,"msg":"prompt tokenized","id_slot":0,"id_task":0,"n_ctx":16384,"n_keep":0,"n_prompt_tokens":43,"prompt_tokens":"<s> ### Instruction:\nCan you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values\n### Response:\n"}

that produces "sane" output from phind-codellama.
from llama.cpp.

Related Issues (20)

Feature Request: It would be convenient and faster if users could specify that the model data used for a RPC-server instance is already available by some fast(er) means (file system GGUF, whatever). HOT 1

Bug: Crash with GGML CUDA error when inferencing on llama-server HOT 9

Bug: convert-hf-to-gguf.py - AttributeError: 'LlamaTokenizerFast' object has no attribute 'added_tokens_decoder' HOT 1

Bug: llama3 8b gradient unsupported? HOT 2

Bug: Missing required key: general.description

Bug: After running for a while, the llama-server exhibits extremely high CPU usage, resulting in timeouts for all requests.

Bug: converting model from HF to GGUF gives error HOT 1

Bug: infill reference crashed HOT 6

Bug: Cannot quantize a model to BF16 due to an overflow in gguf/quants.py HOT 5

Bug: llama-cli templating does buf.resize(-1) if the model's template is not supported, causing crash HOT 2

Request: Add support for Qwen2 Embedding model: Alibaba-NLP/gte-Qwen2-7B-instructFeature

Bug: error loading model: llama_model_loader: failed to load model HOT 1

Bug: `cmake -B build -DLLAMA_CUDA=ON -DLLAMA_NATIVE=ON` gives "deprecated" warning but then compiled CPU-only verrsion HOT 4

Bug: llama.cpp binaries are compiled dynamically and the library is missing! HOT 4

Bug: on AMD gpu, it offloads all the work to the CPU unless you specify --n-gpu-layers on the llama-cli command line HOT 18

Bug: GGML can no longer be statically linked to llama.cpp due to the source code reorganization HOT 9

Feature Request: Why is there no pre-compiled Windows version of AMD ROCm? HOT 2

Feature Request: Why is there no pre-compiled Windows version of AMD ROCm? HOT 1

Bug: Cannot load DeepSeek-Coder-V2-Instruct HOT 11

Bug: llama-infill segmentation fault if missing --in-suffix

Possible (very serious) bug in chat templates that use '<s>' token having a space added after it about llama.cpp HOT 2 CLOSED

Comments (2)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent