Comments (8)
Yes, it's confusing. Should improve this - PRs welcome
When temperature == 0.0f we don't compute probabilities. You can set temperature < 0.0f and it should work as expected.
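For reference, a minimal sketch of how to observe this from the client side, assuming a local llama.cpp server listening on localhost:8080; the /completion path, the prompt, and the exact response layout are assumptions here and may differ between versions. The n_probs field requests per-token probabilities, which come back in completion_probabilities:

```python
import json
import urllib.request

# Hypothetical local endpoint; adjust host/port to your llama-server instance.
URL = "http://localhost:8080/completion"

def complete(temperature: float, n_probs: int = 5) -> dict:
    """Request a short completion plus top-n per-token probabilities."""
    payload = {
        "prompt": "The capital of France is",
        "n_predict": 4,
        "temperature": temperature,  # 0.0 triggers greedy sampling
        "n_probs": n_probs,          # ask for top-n token probabilities
    }
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    for temp in (0.0, -1.0, 0.8):
        result = complete(temp)
        print(f"temperature={temp}")
        print(json.dumps(result.get("completion_probabilities"), indent=2))
```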
Thank you, @ggerganov. I just saw that the change happened in af0a5b6.
It is, however, a bit strange that the tokens for temperature == 0.0f are the first n tokens of the vocab, with the first token getting a prob of 1.0. Maybe it would be more intuitive to return an error, or to leave the probs in the completion_probabilities empty in this case?
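Purely for illustration (the token strings and field names below are invented, not actual output), the symptom described above versus the suggested alternative would look roughly like:

```python
# Hypothetical illustration of the symptom: with temperature == 0.0 the reported
# top-n entries were simply the first n vocabulary entries, with the first one
# carrying probability 1.0.
observed_entry = {
    "content": " Paris",
    "probs": [
        {"tok_str": "<unk>", "prob": 1.0},  # vocab id 0
        {"tok_str": "<s>",   "prob": 0.0},  # vocab id 1
        {"tok_str": "</s>",  "prob": 0.0},  # vocab id 2
    ],
}

# One of the alternatives suggested above: leave the probability list empty.
empty_probs_entry = {"content": " Paris", "probs": []}
```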
Sorry, I must have made a mistake while testing.
> It is, however, a bit strange that the tokens for temperature == 0.0f are the first n tokens of the vocab, with the first token getting a prob of 1.0. Maybe it would be more intuitive to return an error, or to leave the probs in the completion_probabilities empty in this case?
The intended behavior is that with temperature 0 the top tokens are still being returned but that all tokens other than the top one have 0 probability. Essentially you should be getting the same thing as with a temperature of 0.001.
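A small numerical sketch (with invented logits) of why this is the natural convention: as the temperature approaches 0, the softmax over logits divided by temperature collapses onto the highest-logit token, so temperature 0 and a tiny positive temperature yield essentially the same distribution:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5, -1.0])  # invented example logits

def softmax_with_temperature(z: np.ndarray, temp: float) -> np.ndarray:
    """Softmax over logits scaled by 1/temperature, computed stably."""
    scaled = z / temp
    scaled -= scaled.max()  # numerical stability
    e = np.exp(scaled)
    return e / e.sum()

for temp in (1.0, 0.1, 0.001):
    print(temp, softmax_with_temperature(logits, temp).round(4))

# As temp -> 0 the distribution approaches [1, 0, 0, 0]: reporting 1.0 for the
# top token and 0.0 for the rest is exactly this limit.
```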
@JohannesGaessler, I see.
I just opened #7202, but that is obsolete then, right?
I don't understand why the top token should be assigned 100% probability and the others 0%. After all, their appearance and position among the top n tokens are determined by their respective logits, so I would expect the server response to reflect the actual model output.
Is the reason for your approach that you save the softmax calculation?
> I don't understand why the top token should be assigned 100% probability and the others 0%.
If you sample with 0 temperature, those are simply the probabilities with which the tokens are sampled. It is the correct way to continue the probabilities as the temperature goes towards 0; you would get discontinuities otherwise. Internally llama.cpp does not calculate token probabilities at all with temperature == 0.0f, hence the need to set the values manually.
For temperatures < 0.0f the tokens are also sampled greedily but the backend still calculates token probabilities as you would get them with 1.0 temperature and no other samplers.
> Is the reason for your approach that you save the softmax calculation?
It saves you not just the softmax but all other samplers as well.
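A rough sketch, not llama.cpp's actual code, of the two paths being contrasted: with temperature 0 only an argmax over the raw logits is needed, so the softmax and every other sampler can be skipped, and any reported probabilities have to be filled in by hand as 1.0 for the chosen token and 0.0 for the rest:

```python
import numpy as np

def sample_full_pipeline(logits: np.ndarray, temp: float, rng: np.random.Generator):
    """Temperature > 0: scale, softmax, then draw from the distribution."""
    scaled = logits / temp
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    token = int(rng.choice(len(logits), p=probs))
    return token, probs

def sample_greedy(logits: np.ndarray):
    """Temperature == 0: argmax only; no softmax or other samplers are run,
    so the probabilities are assigned manually (1.0 for the pick, 0.0 otherwise)."""
    token = int(np.argmax(logits))
    probs = np.zeros_like(logits)
    probs[token] = 1.0
    return token, probs

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])  # invented example logits
print(sample_full_pipeline(logits, 1.0, rng))
print(sample_greedy(logits))
```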
> I just opened #7202, but that is obsolete then, right?
It was a five-minute fix, so I also opened a PR to yield the intended behavior: #7203. Varying the number of returned tokens is, I think, not a good solution because it leads to weird behavior for temperatures that are almost but not quite 0. With temperature 0 and a temperature just above 0 you should get essentially the same response.