Comments (5)
In terms of compute it would probably be fine; just note the 6 GB of VRAM could get a little tight if you want to use the large model, for instance, as well as more advanced TTS models we will eventually integrate (Coqui, XTTS2). For reference, I am sitting at 4.788 GB running large-v2 and XTTS2. Note this is also on an Ada Lovelace GPU, which supports different quantization than the Turing architecture of the 1660, so it may take a little more VRAM in that case (on a GTX 1070 I saw my VRAM usage spike as high as 6.5 GB with these same models, but that is Pascal architecture).
from willow-inference-server.
Hmmmm. I'm trying to upgrade my circa-2018 SFF while staying in a ~$500 budget.
Using remote WIS I have near-instant results with HA... faster than even what my local HA Assist was doing. I don't want to lose this speed by going local. Maybe get the linked one and find a used 1070 to add to it? The SFF could never handle a powered GPU. I'd eventually like to run an LLM next to it so I can cut my reliance on the cloud and, more importantly, Google. I'm not worried about the speed of the LLM, as I've found smaller models with a 3-5 second response time on the SFF... I'm not terribly interested in speed with an AI. I do need the speed with WIS, however.
Any ideas or know of anywhere selling a refurbished desktop with a 1070 or better?
The system linked may do fine; like I said, in terms of compute it would outperform a 1070, it just comes down to VRAM. Given your goals, I am not sure getting a 1070 on top of this system would make much sense; you'd be better off getting another 1660 or something like that down the line, I think. If you plan to run local LLMs on top of WIS you'd definitely want more VRAM, as most 7B models will use something like 5-7 GB even with quantization, never mind the VRAM used by context and such. GPU speed and memory bandwidth just improve your tokens/second, so if you don't care about speed then the real focus is simply getting more VRAM.
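The 7B sizing above can be sketched as back-of-envelope arithmetic. The flat overhead allowance here is an assumption for illustration; real usage grows with context length, batch size, and runtime:

```python
def llm_vram_gib(n_params_b: float, bits_per_weight: int,
                 overhead_gib: float = 1.5) -> float:
    """Rough VRAM estimate: quantized weights plus a flat allowance
    for KV cache, activations, and runtime overhead (assumed figure)."""
    weights_gib = n_params_b * 1e9 * bits_per_weight / 8 / 2**30
    return weights_gib + overhead_gib

# A 7B model at 4-bit quantization:
print(round(llm_vram_gib(7, 4), 1))  # → 4.8 (≈3.3 GiB weights + 1.5 GiB assumed overhead)
# The same model at 8-bit lands around 8 GiB, past a 6 GB card.
```

Even the 4-bit estimate leaves little room next to WIS on a 6 GB card, which matches the "get more VRAM" advice.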
I'm not too familiar with sites that sell refurbished desktops; I tend to build my own systems 😁 But I have seen users get a used OptiPlex system, put a 1070 in it, and come in well below $500 in terms of cost.
I actually had an LLM running on the SFF with the K620. Sssssslowly, but it worked. It was a model I found a HA community user using on text-based-ui. They were able to get it to work on a Pi 4, but very slowly. It's not the best model, but it works for my purposes.
Have you tested WIS on the Pi 5? I wonder how it would do?
While I have you here: is there any way to use multinet with WAC? Or do you plan to integrate WAC into WAS at some point?
> Have you tested WIS on the Pi 5? I wonder how it would do?
WIS and faster-whisper use the same underlying engine (CTranslate2). WIS is slightly faster because of some other optimization work we've done, plus it's optimized for latency and short speech segments. I'm not aware of any projects that do Whisper with any kind of special acceleration on ARM platforms other than NEON, which is just vector instructions for ARM (CTranslate2 already uses it). CTranslate2 has other acceleration frameworks for x86_64 CPUs, but it doesn't even support AMD or Intel GPUs, and the maintainers have gone on record saying they haven't even considered it...
If you look at the comparison benchmarks and look around online, it seems the Pi 5 is roughly twice as fast as the Pi 4 with Whisper.
So with our minimum recommended model (small), the standard 3.8-second test speech segment still takes roughly 25 seconds on the Pi 5. A GTX 1070 does medium in 424 ms, and medium is significantly "slower" than small. It's just not even close and never will be, so we have no plans to support these ARM-based platforms with WIS.
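The gap can be expressed as a real-time factor, using the numbers from this thread:

```python
def rtf(inference_s: float, audio_s: float) -> float:
    """Real-time factor: processing time divided by audio duration.
    Below 1.0 means faster than real time."""
    return inference_s / audio_s

audio = 3.8  # the standard test segment from the thread, in seconds

print(f"Pi 5, small:      RTF {rtf(25.0, audio):.2f}")   # ~6.58
print(f"GTX 1070, medium: RTF {rtf(0.424, audio):.2f}")  # ~0.11
```

An RTF near 6.6 means the Pi 5 falls further behind the longer you speak, while the 1070 finishes a harder model in a fraction of the audio's duration.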
We're removing all support for multinet in upcoming versions. We've found it to be impractical for typical speech commands (proper nouns in entity names, etc.).
WAC is already integrated into WAS in a development branch, and it will be in an upcoming release candidate.
Related Issues (20)
- wis server unresponsive.
- Unable to start server (CUDNN_STATUS_NOT_INITIALIZED)
- LLM installation issue
- Merge default settings with custom settings
- WIS Server CPU Mode - Model returns "you." only
- WIS on WSL with GPU - Windows Firewall handling
- Updated Willow (though WAS) and also WIS on server.
- T2S - number conversion does not work
- T2S - uppercase shortcuts are not spelled out
- T2S - text longer than 600 characters causes "RuntimeError: The size of tensor a () must match the size of tensor b (600) at non-singleton dimension 1"
- reinstall issues with CUDA
- utils.sh install stage-0 1/8 - fails with "no space left on device"
- Adding Distil-Whisper
- Is native Windows support possible?
- Random question... about installation issue.
- CUDA error starting up WIS
- { "response": "Chatbot not installed or supported" }
- Running out of Memory on Install
- support_sv: bool = True causes WIS to crash on missing library