Feature Request: Nemotron-4-340B-Instruct Support about llama.cpp HOT 2 OPEN

rankaiyx commented on August 16, 2024

Feature Request: Nemotron-4-340B-Instruct Support

from llama.cpp.

Comments (2)

rankaiyx commented on August 16, 2024

Given that a Q4-quantized 34B model requires about 20 GB of RAM,
a Q4-quantized 340B model (10x the parameters) should need roughly 200 GB, so it should be able to run on a computer with 256 GB of RAM.
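
To make the arithmetic above explicit, here is a rough sketch of the estimate. It assumes RAM use scales linearly with parameter count at the same quantization level and ignores KV cache, context length, and runtime overhead, so treat the result as a ballpark figure only. The function name `scaled_ram_gb` is made up for illustration and is not part of llama.cpp.

```python
def scaled_ram_gb(ref_params_b, ref_ram_gb, target_params_b):
    """Scale a reference RAM figure linearly with parameter count.

    Assumption: same quantization for both models, and memory dominated
    by the weights (no KV cache or runtime overhead included).
    """
    return ref_ram_gb * target_params_b / ref_params_b


# Reference point from the comment: a Q4 34B model needs about 20 GB.
estimate_gb = scaled_ram_gb(ref_params_b=34, ref_ram_gb=20, target_params_b=340)
fits = estimate_gb <= 256  # does it fit on a 256 GB machine?

print(f"Estimated RAM for Q4 340B: {estimate_gb:.0f} GB, fits in 256 GB: {fits}")
```

The remaining headroom on a 256 GB machine would have to cover the KV cache and everything else running on the system.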


Yorizuka commented on August 16, 2024

Well, even if it's not something that most people can run at home, it would still be really useful for people who can deploy it. Big GPUs can be rented in the cloud. This model feels to me like it's going to be a game changer!

llama.cpp is simply the least headache-inducing way of running any LLM. Renting hardware for this model is going to be expensive, and not having to fiddle with jank is nice. I also wonder how well the AMD MI300X would perform.

