
Comments (5)

nikito avatar nikito commented on July 28, 2024

In terms of compute it would probably be fine; just note that 6GB of VRAM could get a little tight if you want to use the large model, for instance, as well as the more advanced TTS models that we will eventually integrate (Coqui, XTTS2). For reference, I am sitting at 4.788GB running large-v2 and XTTS2. Note this is also on an Ada Lovelace GPU, which supports different quantization than the Turing architecture of the 1660, so it may take a little more VRAM in that case (on a GTX 1070 I saw my VRAM usage spike as high as 6.5GB with these same models, but that is Pascal architecture).

from willow-inference-server.

guarddog13 avatar guarddog13 commented on July 28, 2024

> In terms of compute it would probably be fine; just note that 6GB of VRAM could get a little tight if you want to use the large model, for instance, as well as the more advanced TTS models that we will eventually integrate (Coqui, XTTS2). For reference, I am sitting at 4.788GB running large-v2 and XTTS2. Note this is also on an Ada Lovelace GPU, which supports different quantization than the Turing architecture of the 1660, so it may take a little more VRAM in that case (on a GTX 1070 I saw my VRAM usage spike as high as 6.5GB with these same models, but that is Pascal architecture).

Hmmmm. I'm trying to upgrade my circa 2018 SFF and staying in a ~$500 budget.

Using remote WIS I have near-instant results with HA... faster than even what my local HA Assist was doing. I don't want to lose this speed by going local. Maybe get the linked one and find a used 1070 to add to it? The SFF could never handle a powered GPU. I'd eventually like to run an LLM next to it so I can cut my reliance on the cloud and, more importantly, Google. I'm not worried about the speed of the LLM, as I've found smaller models that have a 3-5 second response time on the SFF... I'm not terribly interested in speed with an AI. I do need the speed with WIS, however.

Any ideas or know of anywhere selling a refurbished desktop with a 1070 or better?


nikito avatar nikito commented on July 28, 2024

The system linked may do fine, like I said in terms of compute it would outperform a 1070, it just comes down to VRAM. Given your goals I am not sure getting a 1070 on top of this system would make much sense, you'd be better off getting another 1660 or something like that down the line I think. If you plan to run local LLMs on top of WIS you'd definitely want more VRAM as most 7B models will use something like 5-7GB even with quantization, never mind the VRAM used by context and such. GPU speed and memory speed just improve your tokens/second, so if you don't care about the speed then the real focus is just getting more VRAM.
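The VRAM math above can be sketched with a back-of-the-envelope formula. This is a hypothetical helper (`weight_vram_gb` is not part of WIS or any library); it covers only the model weights, while real usage also adds KV cache for context, activations, and framework overhead, which is why a 4-bit 7B model still lands in the 5-7GB range mentioned above.

```python
# Rough VRAM estimate for LLM weights only (hypothetical helper).
# Real usage adds KV cache (context), activations, and runtime overhead.

def weight_vram_gb(n_params_billion: float, bits_per_param: float) -> float:
    """Approximate VRAM in GB needed just to hold the model weights."""
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# A 7B model at 4-bit quantization: ~3.5 GB for weights alone.
print(round(weight_vram_gb(7, 4), 1))
# The same model at 8-bit: ~7.0 GB before any context overhead.
print(round(weight_vram_gb(7, 8), 1))
```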

I'm not too familiar with sites that sell refurbished desktops, I tend to build my own systems 😁 But I have seen users get a used optiplex system and then put a 1070 in it and come well below $500 in terms of cost.


guarddog13 avatar guarddog13 commented on July 28, 2024

I actually had an LLM running on the SFF with the K620. Sssssslowly, but it worked. It was a model I found an HA community user using on text-based-ui. They were able to get it to work on a Pi 4, but very slowly. It's not the best model, but it works for my purposes.

Have you tested WIS on the Pi 5? I wonder how it would do.

While I have you here, is there any way to use multinet with WAC? Or do you plan to integrate WAC into WAS at some point?


kristiankielhofner avatar kristiankielhofner commented on July 28, 2024

> Have you tested WIS on the Pi 5? I wonder how it would do.

WIS and faster-whisper use the same underlying engine (CTranslate2). WIS is slightly faster because of some other optimization work we've done, plus it's optimized for latency and short speech segments. I'm not aware of any projects that do Whisper with any kind of special acceleration on ARM platforms other than NEON, which is just vector instructions for ARM (CTranslate2 already uses it). CTranslate2 has other acceleration frameworks for x86_64 CPUs, but it doesn't even support AMD or Intel GPUs, and the maintainers have gone on record saying they haven't even considered it...

If you look at the comparison benchmarks and look around online it seems the Pi 5 is roughly twice as fast as the Pi 4 with Whisper.

So for our standard 3.8 second test speech segment and the minimum recommended model (small), transcription still takes roughly 25 seconds on the Pi 5. A GTX 1070 does medium in 424ms, and medium is significantly "slower" (heavier) than small. It's just not even close and never will be, so we have no plans to support these ARM-based platforms with WIS.
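The gap above is easiest to see as a real-time factor (processing time divided by audio duration; below 1.0 means faster than realtime). The figures are taken directly from the comment; the `rtf` helper is just illustrative arithmetic.

```python
# Real-time factor (RTF): processing time / audio duration.
# Numbers from the comment above; lower is faster.

def rtf(processing_s: float, audio_s: float) -> float:
    """Processing time relative to audio length (1.0 = exactly realtime)."""
    return processing_s / audio_s

audio = 3.8                          # standard test speech segment, seconds
pi5_small = rtf(25.0, audio)         # Pi 5 running the small model
gtx1070_medium = rtf(0.424, audio)   # GTX 1070 running the heavier medium model

print(f"Pi 5 (small):      {pi5_small:.2f}x realtime")      # ~6.6x slower than realtime
print(f"GTX 1070 (medium): {gtx1070_medium:.3f}x realtime")  # well under realtime
print(f"Raw speedup:       {25.0 / 0.424:.0f}x")             # ~59x, on a bigger model
```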

We're removing all support for multinet in upcoming versions. We've found it to be impractical for typical speech commands (proper nouns of entity names, etc).

WAC is already integrated in WAS in a development branch and it will be in an upcoming release candidate.

