Later this week I will be receiving my test configuration for CPU only mode. I will be

CPU Only Optimization (willow-inference-server, closed issue)

toverainc commented on June 8, 2024

CPU Only Optimization

from willow-inference-server.

Comments (5)

hamishcunningham commented on June 8, 2024

What machine spec did you choose for the test rig? (One of my students wants to demo Willow at a conference session in a couple of weeks and is wondering whether to use CPU or GPU...)

kristiankielhofner commented on June 8, 2024

That's awesome!

GPU - hands down.

Even with something like a Tesla P4 (lowest cost, lowest power, single slot, passive cooling) or a GTX 1070, it can process most voice-command-length speech segments at least 5x faster than realtime. An RTX 4090 (nice!) manages 45x! CPU is... not that.
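
To make those speed-ups concrete: a realtime factor of N means the audio is processed N times faster than it was spoken, so compute time is simply clip length divided by N. A minimal sketch, assuming a hypothetical 3-second voice command (the clip length is illustrative, not from the thread):

```python
def processing_time(audio_seconds: float, realtime_factor: float) -> float:
    """Seconds of compute needed to process `audio_seconds` of speech
    at a given speed-up over realtime (e.g. 5.0 means 5x realtime)."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# Hypothetical 3-second voice command at the speed-ups quoted above.
for label, factor in [("Tesla P4 / GTX 1070 (at least 5x)", 5.0), ("RTX 4090 (45x)", 45.0)]:
    print(f"{label}: {processing_time(3.0, factor):.3f} s")
```

Under that assumption, 5x realtime works out to about 0.6 s for the clip and 45x to under 0.07 s.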

As long as the CPU isn't terrible, it matters much less performance-wise when using a GPU. Of course there is some variation, but by far the most complex and performance-intensive tasks in WIS are offloaded to the GPU.

hamishcunningham commented on June 8, 2024

Thanks, I've got a GTX 1070 rig running well, but I wondered what config you were planning to work with for CPU. I'd like to experiment with Vicuna too, but I guess I currently need more than 8 GB of VRAM?

kristiankielhofner commented on June 8, 2024

Our CPUs are all over the place: 6-7 year old Intel i[something]s, ten-year-old Xeons, AMD Ryzens, AMD Threadrippers, etc. I'm hesitant to recommend specific CPUs because there's so much variety, and it starts to get into things like RAM type; there's far more variation in potential system hardware outside of the GPU. My general take: recent-ish AMDs (Ryzen something, etc.) are much better on power and have excellent performance; otherwise, anything will work if you're using a GPU, even REALLY old CPUs that don't have AVX. When it comes to the CPU, WIS really isn't any different from any other application, so I would take what you already know from experience with CPUs and apply it to WIS: older CPUs are slower, consume more power, etc. However, if the CPU is especially low-performance, the performance advantages of the GPU diminish significantly.
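
If you want to check whether an older x86 machine reports AVX at all, a minimal sketch for Linux reads the `flags` line of `/proc/cpuinfo`. This is a generic OS-level check, not something WIS provides, and it assumes an x86 kernel (ARM kernels list features under a different key):

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of
    /proc/cpuinfo-style text (x86 Linux format assumed)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx(cpuinfo_text: str) -> bool:
    """True if the 'avx' flag is present."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux-only location
    if cpuinfo.exists():
        flags = cpu_flags(cpuinfo.read_text())
        print("AVX:", "avx" in flags, "| AVX2:", "avx2" in flags)
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```

Per the comment above, a missing `avx` flag is not a blocker when a GPU handles the heavy work.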

Vicuna/LLMs are a completely different animal. An RTX 3090 is essentially the minimum to have the required VRAM and performance for a reasonable experience. We quantize Vicuna down to 4-bit, and that's the only thing that makes it fit in even that amount of VRAM.
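
A rough way to see why quantization matters is to estimate weight memory as parameters × bits per weight ÷ 8. A minimal sketch, assuming a 13B-parameter model for illustration (the thread does not say which Vicuna size WIS uses). It counts weights only, leaving out the KV cache, activations, framework overhead, and anything else sharing the GPU, so real usage is higher, and the RTX 3090 recommendation above also reflects speed, not just memory:

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GiB.

    Ignores the KV cache, activations, and runtime overhead, so actual
    VRAM usage will be higher than this figure.
    """
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: a 13B-parameter model; the size is illustrative only.
params = 13e9
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gib(params, bits):.1f} GiB of weights")
```

Under these assumptions, 16-bit weights alone come to just over 24 GiB, while 4-bit brings the weights down to about a quarter of that; treat the 4-bit figure as a lower bound, not a prediction of actual usage.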

hamishcunningham commented on June 8, 2024

gr8, tnx!