Comments (3)
@StrangeBytesDev You can pass an array of tokens to the generateCompletion function instead of a string - this way you can tokenize the input however you want.
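The suggestion above might look like this. This is a rough sketch: the comment only confirms that generateCompletion accepts a token array, so the surrounding setup (createContext, getSequence, the LlamaCompletion constructor) is my assumption based on the version 3 beta API and may differ, and the model path is just an example.

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaCompletion} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "functionary-small-v2.2.q4_0.gguf")
});
const context = await model.createContext();
const completion = new LlamaCompletion({
    contextSequence: context.getSequence()
});

// Tokenize the prompt yourself (here with special tokens enabled),
// then pass the token array directly instead of a string
const tokens = model.tokenize("<|from|>user\n<|content|>Hello", true);
const response = await completion.generateCompletion(tokens);
console.log(response);
```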
from node-llama-cpp.
@StrangeBytesDev This issue was already fixed in the version 3 beta.
Using the version 3 beta, to tokenize an input with special tokens, you should enable the specialTokens parameter:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "functionary-small-v2.2.q4_0.gguf")
});

const text = "<|from|>user\n<|content|>Hello";
console.log("With special tokens:", model.tokenize(text, true));
console.log("Without special tokens:", model.tokenize(text));
```
Oh awesome, I totally missed that. I like that it's available optionally. I don't think I've seen any other library or API that offers it as an option, and I can see some use cases where it would be useful to have both.

I'm having a bit of a hard time getting my head around how tokenization in the generateCompletion function is handled. I'm under the impression that there currently isn't a way to enable the specialTokens param from a completion. Is that the case?