
Comments (7)

aarnphm avatar aarnphm commented on May 30, 2024 1

I will take a look into incremental detokenization for the PyTorch backend.


aarnphm avatar aarnphm commented on May 30, 2024

Hi there, thanks for creating the issue.

Do you have vllm available locally?


jeffwang0516 avatar jeffwang0516 commented on May 30, 2024

Hi

I'm still not able to run this model with the vLLM backend due to insufficient GPU memory (a T4 with 16 GB doesn't seem to be enough).

After some research, I think the root cause might be that a single complete Chinese character can be decoded from multiple output tokens, so decoding to text on every generation iteration is not feasible for Chinese.
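A minimal sketch of what I mean (the checkpoint name is only an example of a byte-level BPE tokenizer; any similar tokenizer shows the same effect):

```python
# Illustration: a multi-byte UTF-8 character can be split across several tokens,
# so decoding token-by-token produces U+FFFD replacement characters.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example checkpoint only

text = "你好"  # two Chinese characters
ids = tokenizer.encode(text)

# Each id decoded on its own may cover only part of a character's bytes.
for tid in ids:
    print(tid, repr(tokenizer.decode([tid])))  # often prints '�' fragments

# Decoding the whole sequence at once recovers the original text.
print(repr(tokenizer.decode(ids)))
```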


aarnphm avatar aarnphm commented on May 30, 2024

Sounds like an issue orthogonal to OpenLLM?


jeffwang0516 avatar jeffwang0516 commented on May 30, 2024

For the PyTorch backend, this is related to OpenLLM through the implementation of PyTorchRunnable. It might need some way to detect an incomplete character on each generation step, probably something like what the text-generation-inference server does here,
OR what transformers' TextStreamer does here.

If the vLLM backend already handles this, then OpenLLM should be fine there, but I'm not able to verify that at the moment.
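To make the idea concrete, here is a rough sketch of such a check inside a generation loop (hypothetical helper, not OpenLLM's actual PyTorchRunnable code; `tokenizer` and `generated_ids` are assumed to come from the surrounding runnable):

```python
# Hedged sketch: hold back streamed output while the decoded text still ends
# mid-way through a multi-byte character.
def emit_new_text(tokenizer, generated_ids, already_emitted: str) -> str:
    """Return only newly completed text, withholding partial characters."""
    text = tokenizer.decode(generated_ids, skip_special_tokens=True)
    # A trailing U+FFFD means the last token(s) stopped in the middle of a
    # multi-byte UTF-8 character, so wait for more tokens before emitting.
    if text.endswith("\ufffd"):
        return ""
    return text[len(already_emitted):]
```

The runnable would call this once per generation step, appending the returned delta to `already_emitted` and yielding it to the stream.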


jeffwang0516 avatar jeffwang0516 commented on May 30, 2024

I tried to fix the problem with the text-generation-inference server approach (related issue: huggingface/text-generation-inference#333).
Please have a look, thanks!


jeffwang0516 avatar jeffwang0516 commented on May 30, 2024


FYI, I found that vLLM has also fixed this issue with the text-generation-inference approach in this PR: vllm-project/vllm#984
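For reference, a rough paraphrase of that approach (illustrative names, not vLLM's exact API): two offsets track the already-decoded prefix, and new text is only emitted once the tail no longer ends in a replacement character.

```python
# Paraphrased sketch of the TGI-style incremental detokenization adopted in
# vllm-project/vllm#984; variable names are illustrative.
def detokenize_incrementally(tokenizer, all_ids, prefix_offset, read_offset):
    prefix_text = tokenizer.decode(all_ids[prefix_offset:read_offset],
                                   skip_special_tokens=True)
    full_text = tokenizer.decode(all_ids[prefix_offset:],
                                 skip_special_tokens=True)
    if len(full_text) > len(prefix_text) and not full_text.endswith("\ufffd"):
        # The newest tokens completed at least one full character:
        # emit the delta and slide the offsets forward.
        new_text = full_text[len(prefix_text):]
        return new_text, read_offset, len(all_ids)
    # Tail is still an incomplete character: emit nothing yet.
    return "", prefix_offset, read_offset
```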

