Comments (5)
Did you mount the GPU to the container?
from openllm.
OK, when I ran it from PowerShell with the --gpus all
option (instead of the GUI), that error went away, but here's another:
ValueError: No available memory for the cache blocks. Try increasing gpu_memory_utilization when initializing the engine.
Is it possible that this model will not work with this GPU?
from openllm.
You can set --gpu-memory-utilization to 0.5
from openllm.
I tried it with this:
ENTRYPOINT openllm start HuggingFaceH4/zephyr-7b-alpha --backend vllm --gpu-memory-utilization 0.5
But it causes the following error:
openllm_core.exceptions.OpenLLMException: Failed to initialise vLLMEngine due to the following error:
CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
Then:
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 109, in __init__
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Process 481 has 17179869184.00 GiB memory in use. Of the allocated memory 12.74 GiB is allocated by PyTorch, and 13.52 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
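For anyone reading later, the numbers in this traceback can be plugged into a rough budget to see why changing the utilization fraction alone may not help. This is only a simplified model of the accounting, not vLLM's actual code: it assumes the cache budget is roughly (total GPU memory × gpu_memory_utilization) minus what the model already uses, and it uses the 12.74 GiB that PyTorch reports as allocated as a stand-in for that usage. The function name cache_budget_gib is made up for illustration, and 0.9 is assumed to be the default fraction.

```python
def cache_budget_gib(total_gib: float, utilization: float, used_gib: float) -> float:
    """Memory left for KV-cache blocks under a simplified budget model.

    Assumption: budget = total * utilization - memory already used by the model.
    A value <= 0 means there is no room for any cache blocks.
    """
    return total_gib * utilization - used_gib


TOTAL_GIB = 8.00       # "total capacty of 8.00 GiB" in the traceback
ALLOCATED_GIB = 12.74  # "12.74 GiB is allocated by PyTorch" in the traceback

# 0.9 is assumed to be the default; 0.5 is the value suggested in this thread.
for utilization in (0.9, 0.5):
    budget = cache_budget_gib(TOTAL_GIB, utilization, ALLOCATED_GIB)
    status = "ok" if budget > 0 else "no room for cache blocks"
    print(f"utilization={utilization}: budget={budget:.2f} GiB ({status})")
```

Under these assumptions the budget is negative at both settings, and lowering the fraction makes it smaller, which matches the first error's hint to increase gpu_memory_utilization rather than decrease it.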
from openllm.
I've connected another RTX 3070 to my PC, and the memory issue has been resolved. However, a new problem has emerged, for which I am opening a new issue. I am closing this one since the original problem has been solved.
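A back-of-the-envelope estimate of why a second 8 GiB card might have made the difference is sketched below. It rests on assumptions not confirmed in this thread: the model has roughly 7 billion parameters (taken from the "7b" in its name), weights are stored in 16-bit precision (2 bytes each), and they are split evenly across the GPUs. It ignores activations and the KV cache, so the real requirement is higher. The helper weights_per_gpu_gib is hypothetical.

```python
GIB = 2**30  # bytes per GiB


def weights_per_gpu_gib(n_params: float, bytes_per_param: int, n_gpus: int) -> float:
    """Approximate weight memory per GPU, assuming an even split across GPUs."""
    return n_params * bytes_per_param / n_gpus / GIB


N_PARAMS = 7e9    # assumption: nominal size implied by "7b" in the model name
BYTES_FP16 = 2    # assumption: 16-bit weights
GPU_GIB = 8.0     # per-card memory reported in the traceback above

for n_gpus in (1, 2):
    share = weights_per_gpu_gib(N_PARAMS, BYTES_FP16, n_gpus)
    verdict = "fits" if share < GPU_GIB else "does not fit"
    print(f"{n_gpus} GPU(s): ~{share:.2f} GiB of weights per GPU ({verdict} in {GPU_GIB} GiB)")
```

With one card the weights alone exceed 8 GiB under these assumptions, while with two cards each share drops below it, leaving some headroom for the cache.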
from openllm.