Comments (8)
Yes, it's confusing. Should improve this - PRs welcome
When temperature == 0.0f we don't compute probabilities. You can set temperature < 0.0f and it should work as expected.
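For reference, a minimal sketch of how to observe this from the client side, assuming a local llama.cpp server listening on localhost:8080; the /completion path, the prompt, and the exact response layout are assumptions here and may differ between versions. The n_probs field requests per-token probabilities, which come back in completion_probabilities:

```python
import json
import urllib.request

# Hypothetical local endpoint; adjust host/port to your llama-server instance.
URL = "http://localhost:8080/completion"

def complete(temperature: float, n_probs: int = 5) -> dict:
    """Request a short completion plus top-n per-token probabilities."""
    payload = {
        "prompt": "The capital of France is",
        "n_predict": 4,
        "temperature": temperature,  # 0.0 triggers greedy sampling
        "n_probs": n_probs,          # ask for top-n token probabilities
    }
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    for temp in (0.0, -1.0, 0.8):
        result = complete(temp)
        print(f"temperature={temp}")
        print(json.dumps(result.get("completion_probabilities"), indent=2))
```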
Thank you, @ggerganov. I just saw that the change happened in af0a5b6.
It is, however, a bit strange that the tokens for temperature == 0.0f are the first n tokens of the vocab, with the first token getting a prob of 1.0. Maybe it would be more intuitive to return an error, or to leave the probs in the completion_probabilities empty in this case?
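Purely for illustration (the token strings and field names below are invented, not actual output), the symptom described above versus the suggested alternative would look roughly like:

```python
# Hypothetical illustration of the symptom: with temperature == 0.0 the reported
# top-n entries were simply the first n vocabulary entries, with the first one
# carrying probability 1.0.
observed_entry = {
    "content": " Paris",
    "probs": [
        {"tok_str": "<unk>", "prob": 1.0},  # vocab id 0
        {"tok_str": "<s>",   "prob": 0.0},  # vocab id 1
        {"tok_str": "</s>",  "prob": 0.0},  # vocab id 2
    ],
}

# One of the alternatives suggested above: leave the probability list empty.
empty_probs_entry = {"content": " Paris", "probs": []}
```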
Sorry, I must have made a mistake while testing.
> It is, however, a bit strange that the tokens for temperature == 0.0f are the first n tokens of the vocab, with the first token getting a prob of 1.0. Maybe it would be more intuitive to return an error, or to leave the probs in the completion_probabilities empty in this case?
The intended behavior is that with temperature 0 the top tokens are still being returned but that all tokens other than the top one have 0 probability. Essentially you should be getting the same thing as with a temperature of 0.001.
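A small numerical sketch (with invented logits) of why this is the natural convention: as the temperature approaches 0, the softmax over logits divided by temperature collapses onto the highest-logit token, so temperature 0 and a tiny positive temperature yield essentially the same distribution:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5, -1.0])  # invented example logits

def softmax_with_temperature(z: np.ndarray, temp: float) -> np.ndarray:
    """Softmax over logits scaled by 1/temperature, computed stably."""
    scaled = z / temp
    scaled -= scaled.max()  # numerical stability
    e = np.exp(scaled)
    return e / e.sum()

for temp in (1.0, 0.1, 0.001):
    print(temp, softmax_with_temperature(logits, temp).round(4))

# As temp -> 0 the distribution approaches [1, 0, 0, 0]: reporting 1.0 for the
# top token and 0.0 for the rest is exactly this limit.
```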
@JohannesGaessler, I see.
I just opened #7202, but that is obsolete then, right?
I don't understand why the top token should be assigned 100% probability and the others 0%. After all, their appearance and position among the top n tokens are determined by their respective logits, so I would expect the server response to reflect the actual model output.
Is the reason for your approach that you save the softmax calculation?
> I don't understand why the top token should be assigned 100% probability and the others 0%.
If you sample with 0 temperature, those are simply the probabilities with which the tokens are sampled. It is the correct way to continue the probabilities as the temperature goes towards 0; you would get discontinuities otherwise. Internally llama.cpp does not calculate token probabilities at all with temperature == 0.0f, hence the need to set the values manually.
For temperatures < 0.0f the tokens are also sampled greedily but the backend still calculates token probabilities as you would get them with 1.0 temperature and no other samplers.
> Is the reason for your approach that you save the softmax calculation?
It saves you not just the softmax but all other samplers as well.
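A rough sketch, not llama.cpp's actual code, of the two paths being contrasted: with temperature 0 only an argmax over the raw logits is needed, so the softmax and every other sampler can be skipped, and any reported probabilities have to be filled in by hand as 1.0 for the chosen token and 0.0 for the rest:

```python
import numpy as np

def sample_full_pipeline(logits: np.ndarray, temp: float, rng: np.random.Generator):
    """Temperature > 0: scale, softmax, then draw from the distribution."""
    scaled = logits / temp
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    token = int(rng.choice(len(logits), p=probs))
    return token, probs

def sample_greedy(logits: np.ndarray):
    """Temperature == 0: argmax only; no softmax or other samplers are run,
    so the probabilities are assigned manually (1.0 for the pick, 0.0 otherwise)."""
    token = int(np.argmax(logits))
    probs = np.zeros_like(logits)
    probs[token] = 1.0
    return token, probs

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])  # invented example logits
print(sample_full_pipeline(logits, 1.0, rng))
print(sample_greedy(logits))
```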
> I just opened #7202, but that is obsolete then, right?
It was a five-minute fix, so I also opened a PR to yield the intended behavior: #7203. Varying the number of returned tokens is, I think, not a good solution because it leads to weird behavior for temperatures that are almost but not quite 0. With temperature 0 and a temperature just above 0 you should get essentially the same response.