Comments (7)
I will look into incremental detokenization for the PyTorch backend.
from openllm.
Hi there, thanks for creating the issue.
Do you have vllm available locally?
Hi
I'm still not able to run this model with the vllm backend due to insufficient GPU memory (a 16 GB T4 does not seem to be enough).
After some research, I think the root cause might be that a single complete Chinese character can be spread across multiple output tokens. So decoding to text on every generation iteration is not feasible for Chinese.
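To illustrate the point above, here is a minimal sketch that uses raw UTF-8 bytes as a stand-in for byte-level tokens. The 2-byte chunking is an assumption made purely for illustration and is not how any particular tokenizer splits text; it only shows what happens when a token boundary falls inside a character.

```python
# Illustration only: 2-byte chunks stand in for tokens (an assumption, not a real tokenizer).
text = "你好"
data = text.encode("utf-8")  # each of these characters takes 3 bytes in UTF-8

# Split so that chunk boundaries fall inside the characters.
chunks = [data[i:i + 2] for i in range(0, len(data), 2)]

# Decoding every chunk on its own yields replacement characters (U+FFFD).
naive = "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)

# Decoding only after the bytes are accumulated recovers the original text.
joined = b"".join(chunks).decode("utf-8")

print(repr(naive))
print(repr(joined))
```

The per-chunk result contains only replacement characters, while the accumulated result matches the input, which is why text cannot simply be emitted after every token.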
Sounds like an issue orthogonal to OpenLLM?
For the PyTorch backend, it is related to OpenLLM, specifically the implementation of PyTorchRunnable. It probably needs some way to detect an incomplete character on each generation step, something like what the text-generation-inference server does here
or what the transformers TextStreamer does here.
If the vllm backend already handles this, then OpenLLM will be fine there. But I'm not able to verify that at the moment.
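A rough sketch of that kind of check, modeled loosely on the offset-based idea discussed for text-generation-inference: decode a sliding window of token ids and hold back output while the decoded text still ends in the replacement character. The names `decode_step`, `toy_decode`, and `stream` are hypothetical, and the toy byte-level decoder is an assumption standing in for a real tokenizer's decode function. This is not the actual PyTorchRunnable or text-generation-inference code.

```python
from typing import Callable, List, Tuple

REPLACEMENT = "\ufffd"  # what a decoder emits for an incomplete byte sequence


def decode_step(
    decode: Callable[[List[int]], str],
    all_ids: List[int],
    prefix_offset: int,
    read_offset: int,
) -> Tuple[str, int, int]:
    """Return (new_text, prefix_offset, read_offset) for the latest token."""
    prefix_text = decode(all_ids[prefix_offset:read_offset])
    new_text = decode(all_ids[prefix_offset:])
    # Emit only when the text grew and does not end in a partial character.
    if len(new_text) > len(prefix_text) and not new_text.endswith(REPLACEMENT):
        return new_text[len(prefix_text):], read_offset, len(all_ids)
    # Otherwise hold the output back until more tokens arrive.
    return "", prefix_offset, read_offset


def toy_decode(ids: List[int]) -> str:
    # Hypothetical stand-in for tokenizer.decode: one "token" per UTF-8 byte.
    return bytes(ids).decode("utf-8", errors="replace")


def stream(text: str) -> List[str]:
    """Feed the text one byte-token at a time and collect the emitted pieces."""
    ids: List[int] = []
    prefix_offset = read_offset = 0
    pieces: List[str] = []
    for byte in text.encode("utf-8"):
        ids.append(byte)
        piece, prefix_offset, read_offset = decode_step(
            toy_decode, ids, prefix_offset, read_offset
        )
        if piece:
            pieces.append(piece)
    return pieces


print(stream("你好"))
```

With this check, nothing is emitted for the first two bytes of each character, and the complete character is emitted once its last byte arrives. A real integration would pass the tokenizer's own decode function in place of `toy_decode`.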
I tried to fix the problem using the text-generation-inference server approach (related issue: huggingface/text-generation-inference#333).
Please have a look, thanks!
FYI, I found that vllm has also fixed this issue using the text-generation-inference approach, in this PR: vllm-project/vllm#984