
Comments (9)

nirnachmani commented on September 2, 2024

Reading through the README, I get conflicting messages about running WIS without a GPU:

  • Primarily targeting CUDA with support for low-end (cheap) devices such as the Tesla P4, GTX 1060, and up. Don't worry - it screams on an RTX 4090 too! (See benchmarks). Can also run CPU-only.

  • We'll make our best effort to support CPU wherever possible for current and future functionality

  • We are very interested in working with the community to optimize WIS for CPU. We haven't focused on it because we consider medium beam 1 to be the minimum configuration for intended tasks and CPUs cannot currently meet our latency targets.

So, is it possible to run without a GPU? Is the error I am getting (in the first post) related to not having a GPU? Or is it something else? How do I go about setting up WIS without a GPU?


kristiankielhofner commented on September 2, 2024

Sorry for the slow response - I somehow missed your first message!

We will soon be releasing WIS 1.0 with drastically improved support for CPU-only configurations. I suggest trying the pre-release:

git checkout wisng

Follow the guide in the README and WIS should start without issue on your configuration.
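
For reference, the full sequence looks roughly like this, assuming a fresh clone:

git clone https://github.com/toverainc/willow-inference-server.git
cd willow-inference-server
git checkout wisng
./utils.sh install
./utils.sh download-models
./utils.sh run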


nirnachmani commented on September 2, 2024

Thanks.

I ran git checkout wisng and then followed the rest of the instructions. However, when I run ./utils.sh install I get a similar error:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.

Then, if I try to run ./utils.sh run I get:

Using configuration overrides from .env file
Models not found. You need to run ./utils.sh download-models - exiting

And if I try to run ./utils.sh download-models I get the same message:

Using configuration overrides from .env file
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.

Is this related to my previous attempt to run the older version? I first deleted the old directory and re-cloned before checking out wisng.


nirnachmani commented on September 2, 2024

It's working now - under "Detect GPU support" in utils.sh I forced:

DOCKER_GPUS=""
DOCKER_COMPOSE_FILE="docker-compose-cpu.yml"
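
For anyone else hitting the nvml "driver not loaded" error above, a quick way to confirm the driver state before forcing CPU mode (standard Linux tools, nothing WIS-specific):

nvidia-smi          # fails with a driver error when no NVIDIA driver is loaded
lsmod | grep nvidia # no output means the NVIDIA kernel modules aren't loaded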


nirnachmani commented on September 2, 2024

Well, it is working. I am probably not telling you something you don't know, but it is significantly slower than the Tovera-hosted best-effort WIS that you provide. My WIS is running on an Intel NUC with an i7-6770HQ and it takes a good 14 seconds to respond to a simple command (turn off a light), compared to 2 seconds with the hosted WIS...

The issue is, I can't put a GPU in the NUC, and an eGPU case makes the whole thing much more expensive.

Anyway, great work and thank you very much for working on the project.


kristiankielhofner commented on September 2, 2024

A couple of things:

  1. If you could test it, that would be helpful: the wisng branch has a dramatically improved version of WIS with better CPU support (auto-detection, no hacks necessary). This is the version for the upcoming Willow/WIS/WAS 1.0 release.

  2. We talk about it a bit in the README: GPUs are fundamentally better (from a physical-architecture standpoint) at tasks like speech recognition. A $100, six-year-old Nvidia GPU will handily beat the fastest CPUs on the market at a fraction of the cost and power. We use the same fundamental implementation as faster-whisper (ours is slightly faster in my testing), and there is another CPU-optimized implementation called whisper.cpp. Even WIS, faster-whisper, and whisper.cpp (the fastest Whisper implementations in the world) cannot come remotely close to a GPU when running on CPU. Alexa uses GPUs. Google Home uses GPUs (possibly TPUs). Siri uses the Apple Neural Engine (with dedicated silicon support) on device and almost certainly GPUs in the cloud.

You simply cannot match the speech recognition quality and speed of these commercial devices while running on CPU. I understand a GPU is infeasible with a NUC, but "it is what it is".

That said, you at least have a decent CPU (there are projects trying to do voice assistants on a Raspberry Pi, which in my opinion is ridiculous and a non-starter). With wisng you can try different models that trade accuracy for speed.

In your Willow Inference Server URL configuration you can append the model parameter:

http://your-willow-host:your-port?model=your-model-to-try

Where model can be (in order from highest quality/slowest to lowest quality/fastest): large, medium (our default), small, base, or tiny (the default for the Rhasspy/Home Assistant implementation). As you go down the list the speed improves dramatically, but the quality drops dramatically too.
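
For example, to try the small model (hypothetical host and port - substitute your own WIS endpoint):

http://192.168.1.50:19000?model=small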

We'd definitely be interested to hear your feedback: benchmarking CPUs is extremely hard because, compared to GPUs (a Tesla P4 is a Tesla P4, a GTX 1070 is more or less a GTX 1070), there are so many CPU variants, memory configurations, etc. that it's very difficult to predict CPU performance.


kristiankielhofner commented on September 2, 2024

Update: I added some CPU benchmarks to the benchmarks table in wisng. As you can see, the fastest CPU I have available (an AMD Threadripper Pro 5955WX) is 5x slower than a GTX 1070 with our default settings (model medium, beam 1). It's not until you get to the base model on this CPU that you can meet our latency goal of sub-one-second (local) processing times. You're likely seeing roughly two seconds on our hosted implementation because it carries other load and there is internet latency involved.

To give you an idea of the absurd performance you get from an absurd local GPU: in my home environment (with an RTX 3090) I see less than 300 ms (the current record is 212 ms) between end of speech and Home Assistant completing the action. Even a GTX 1070 can do it in less than 500 ms.


nirnachmani commented on September 2, 2024

Thank you for the information. I did read about the poor performance on CPU, so I had low expectations; however, I didn't expect 14 seconds. By the way, this was with the wisng branch, I believe - I ran git checkout wisng before going through the setup. I'll try the different models as you suggested to see what difference they make. Maybe eventually I'll invest in an eGPU case and a GTX 1070.


kristiankielhofner commented on September 2, 2024

From what I can tell you must have an older version - did you git pull as well?

Yes, for highly parallel tasks like speech recognition, CPU performance is fundamentally terrible by comparison. The GTX 1070 has 1,920 CUDA cores and 256 GB/s of memory bandwidth; the RTX 3090 has 10,496 cores and 936 GB/s.

It's not even close - this is why we emphasize "just use a GPU" so heavily.

