Comments (5)
Falcon requires a lot of resources to run, even for inference.
This is because the model has to compute all of the matrices in the attention layers.
On 4x A10G, the average latency I'm seeing is around 140s.
from openllm.
Hey,
Yes, I know, but in my case I do not think it is a resource problem. It is not about the response time; it is not responding at all.
On Paperspace I created a dedicated A100 GPU instance
with 12 CPUs and 90GB of memory, and no additional services running on it.
That's why I think the problem is the one shown in the logs:
The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
from openllm.
Got it, I will take a look.
from openllm.
I was only able to run Falcon on a g5.24xlarge, which has 96GB of GPU memory and 384GB of RAM :)
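
As a rough sanity check on the memory figures in this thread, here is a minimal sketch that estimates how much GPU memory the model weights alone would need. It assumes a 40-billion-parameter Falcon variant loaded in 16-bit precision; the thread does not say which variant or dtype was used, and `weight_memory_gb` is a hypothetical helper, not part of OpenLLM. Activations and the KV cache need additional memory on top of this.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory for model weights only, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: ~40 billion parameters in fp16/bf16 (2 bytes each).
ASSUMED_PARAMS = 40_000_000_000
ASSUMED_BYTES_PER_PARAM = 2

needed = weight_memory_gb(ASSUMED_PARAMS, ASSUMED_BYTES_PER_PARAM)
available = 96  # total GPU memory quoted above for g5.24xlarge

print(f"weights: ~{needed:.0f} GB, GPU memory: {available} GB")
print("weights fit" if needed < available else "weights do not fit")
```

Under these assumptions the weights alone take most of the 96GB, which would be consistent with smaller setups struggling, but it does not by itself explain a server that never responds.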
from openllm.
Wow
Okay, I will give it a try
Thanks!
from openllm.