whisperer's People

Contributors

mgfx, tigros, vaiz

whisperer's Issues

Would you be against implementing MKV Language flags?

Many MKVs have audio tracks with language flags set. When the program takes in a file, could it also set the language from this flag, with the option to override it manually per video in a batch if the flag is wrong or set to UND (undetermined)? A secondary request: could it take all the audio tracks of an MKV and add each of them to the batch, if that is possible?

This program already does wonders for me when I add files to its watch folder before sorting them into my Jellyfin server.
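As a rough sketch of how the language flags could be read: ffprobe (part of FFmpeg) can list every audio stream with its language tag, for example `ffprobe -v error -select_streams a -show_entries stream=index:stream_tags=language -of json input.mkv`. The Python below parses that JSON and maps each audio track to a Whisper language code, falling back to auto-detect (`None`) for `und` or untagged tracks unless a manual per-track override is supplied. Whisperer's internals are not described here, so the function name, the tiny ISO 639-2 to ISO 639-1 table, and the fallback behaviour are all assumptions for illustration.

```python
import json

# Hypothetical, deliberately tiny ISO 639-2 (MKV) -> ISO 639-1 (Whisper) table.
MKV_TO_WHISPER = {"eng": "en", "fre": "fr", "fra": "fr", "ger": "de",
                  "deu": "de", "jpn": "ja", "spa": "es"}

def audio_track_languages(ffprobe_json: str, overrides=None) -> dict:
    """Map each audio stream index to a Whisper language code.

    Tracks with no language tag, an "und" tag, or an unknown code map to
    None (meaning: let Whisper auto-detect) unless a manual override exists.
    """
    overrides = overrides or {}
    result = {}
    for stream in json.loads(ffprobe_json).get("streams", []):
        idx = stream["index"]
        tag = stream.get("tags", {}).get("language", "und").lower()
        # A per-track manual override wins over the file's own flag.
        result[idx] = overrides.get(idx, MKV_TO_WHISPER.get(tag))
    return result
```

Returning every audio stream, rather than only the first, also covers the secondary request: each entry could become its own batch item.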

Can't find model.

Why can't it find the model? It looks like this:
[Screenshot]
I'm using a Windows10 PC with AMD Ryzen 9 4900H and NVIDIA GeForce RTX 2060, and Whisper Desktop works fine using the same model.

I'm not sure I understand what the README says about ffmpeg. Is this issue related to ffmpeg? Adding ffmpeg to the system environment variables didn't help either.

I really hope you can help me, because I urgently need to batch process my videos. Thanks!

How do I add gaps between subtitles?

Hello. There is no gap between consecutive sentences when using whisper.cpp models: when the speaker finishes a sentence, the subtitle is still shown. I want subtitles to be displayed only while the speaker is talking, but they are always on screen. What setting should I change?

Nothing after 9 hours using ggml-large-v3

I left it running overnight with a large queue of files and woke the next morning to find that none had been completed. The timer was still climbing, but the Cancel button had turned back into the Go button.

I reduced the concurrency from five to one and hit Go again, but after a while nothing had happened.

I opened Task Manager, and it was definitely using as much of the GPU as it could. When I closed the app, the processes using the GPU didn't close, and I had to end each one manually.

I'm running the most recent version, from the end of December.

It looks like it does not support large-v3

There is an issue with Whisperer when using large-v3: it never finishes and just runs forever. With v2 there is no problem.

I also want to thank you for making this project. Right now it's impossible to find a version that works with AMD GPUs, so this is a lifesaver.

New version of Whisperer?

Hi Tigros,

Is there a new version of Whisperer which works with the updated ggml-large.bin? The version I have been using does not.

Thanks,

Rick

"Unsupported Windows version, will now exit."

First, thank you very much for your efforts on this project. I admit I haven't yet taken full advantage of what it has to offer, but I love it! :-)

Unfortunately, it stopped working for me with version 3.0 (2023-12-xx). When I start it, I get the following error: "Unsupported Windows version, will now exit." Then the application terminates.

I am using fully updated Windows 10 x64 (2023-12-xx) with an NVIDIA GTX 780 Ti and up-to-date drivers.

If you need more information just ask! Thanks in advance!

DJuego

Is it possible to change or add an option for the output format?

Huge thanks for the effort you put into this amazing program!
Would it be possible to add an option for the output format, such as plain text or others? It would be super helpful.
Or is there already a way to change the output format somewhere?

Won't run on Windows 11

On Windows 11 it fails to start with the message: "A critical error occurred, check graphics card/drivers".
On the same Windows 11 machine, WhisperDesktop runs fine.

Unsuccessful use

Hello, I have a question. I have not been able to use it successfully on Windows, although it did not report an error. I first clicked "Pick files" and selected the model file, then dragged the audio file in and clicked "Go". Only a WAV audio file was generated, and there was no translation. I can't find where the recognition result is saved.

Audio file issue

Fantastic job, it's a very useful tool. 😊
In version 2.5, when I tried to extract subtitles from an audio file, it still tried to extract the audio from the file first. This issue was not present in version 2.3.
I don't really understand programming, but wouldn't this take more time?

[Screenshot: 2023-04-28_221457]

Versions

I'm confused about these different versions. If I just want the latest large model, do I simply pick the last one, ggml-large.bin?

[Screenshot]

Also, I hear there's a new Whisper out. What's the link to that? And how about Whisperer? Is there a new one?

Thanks

I tried v2.6, and there was no SRT file, only the output WMV

reg query hklm\SYSTEM\CurrentControlSet\Control\Class\{4d36e968-e325-11ce-bfc1-08002be10318}

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e968-e325-11ce-bfc1-08002be10318}
Class REG_SZ Display
ClassDesc REG_SZ @c_display.inf,%ClassDesc%;Display adapters
IconPath REG_MULTI_SZ %SystemRoot%\system32\setupapi.dll,-1
LowerLogoVersion REG_SZ 6.0

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e968-e325-11ce-bfc1-08002be10318}\0000
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e968-e325-11ce-bfc1-08002be10318}\Configuration
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e968-e325-11ce-bfc1-08002be10318}\Properties

Whisperer 3.0 - "Skip if output exists" maybe not working

Hi Tigros,

I have a folder in which I had completed transcribing about 1,292 of about 1,800 mp3 files using Whisperer 2.9. This morning when I saw v3.0, I cancelled the 2.9 run and started processing the folder with v3.0. But it didn't recognize that 1,292 had been done. It started from scratch. I couldn't tell whether it was overwriting existing vtt files or what. So, I cancelled that job and resumed it using 2.9, and it recognized that 1,292 had been completed and started processing the rest.

Maybe this feature would have worked if all those files had originally been processed with v3.0. I'll experiment with that once this batch has been completed using 2.9.

Thanks

"Virus detected" (2023-12-27)

Sorry, but I feel I must inform you.

In the last few days there have been several releases of Whisperer version 3.0. For the latest one (2023-12-27), Windows Defender flags the download as infected with a Trojan (Trojan:Win32/Wacatac.B!ml) and rates it a "severe" threat. :-O

DJuego

How to launch the graphical user interface of the Whisperer project?

Hello, I have recently been trying to use the Whisperer project but ran into some difficulties. I have configured all the necessary dependencies as described in README.md and successfully cloned the project to my local machine. However, I am unclear on how to launch the graphical user interface (GUI). I tried double-clicking main.exe, but only saw a command prompt flash and then disappear. I also tried opening whisperer.sln in Visual Studio, but did not see the expected graphical interface. Are there specific steps or commands to launch the GUI? I greatly appreciate your assistance!

Sometimes the app hangs and does nothing

Sometimes when I have a long queue it just hangs on the last file for hours without ending the processing.

I had an example today where it was stuck on the last file for hours. It was a 22 MB MP4 file, so it should take 10 to 15 minutes at most.

The app has not crashed; I can still cancel the processing.

I know this is not much to go on so how can I generate a proper error report for you next time it happens?

Not skipping if output exists

In earlier versions of Whisperer, the "Skip if output exists" feature worked. I could interrupt a batch of 1,000 audios halfway through, and when I resumed it, within seconds it showed that 500 had been completed, and it resumed where it left off. With 3.0 and 3.1, this feature is not working for me. If I interrupt a batch and then resume, it re-transcribes them all.
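A "skip if output exists" check can be sketched as filtering the queue by whether each input's expected output file is already on disk. The naming rule Whisperer actually uses is not stated in this report, so the `.vtt` default and the replace-the-extension rule below are assumptions, and the function names are hypothetical. Note that with a check like this, if the output naming rule changes between versions, older outputs no longer match and every file looks unprocessed.

```python
from pathlib import Path

def expected_output(src: Path, ext: str = ".vtt") -> Path:
    # Assumed naming rule: "talk.mp3" -> "talk.vtt" (last extension replaced).
    return src.with_suffix(ext)

def pending_inputs(inputs, ext: str = ".vtt") -> list:
    # Keep only inputs whose expected output is not already on disk,
    # so an interrupted batch resumes where it left off.
    return [p for p in map(Path, inputs) if not expected_output(p, ext).exists()]
```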

Initial Prompt

Might it be possible to allow the use of an initial prompt?

Right now Whisperer is my go-to due to its use of the GPU and ability to batch tasks. Thanks for putting it together.

From noScribe:
"Prompts: The whisper AI can be initialized with a short text-sequence called prompt (see here for more info). This will influence the style of the following transcription. I tried to force the AI to include filler words like "uhm" in the transcription by giving it a prompt containing them (like "Umm, let me think like, hmm..."). But this only worked on some occasions (whisper tends to 'forget' the prompt quite quickly). Prompts are language specific and will only be applied if you select a particular language (not 'auto'). You can change or add prompts for other languages in the file "prompt.yml" in the home directory of the app."

Translation question

I'm using Whisperer to transcribe hundreds of files that are mostly in English, but in which the speaker will often throw in a little Hindi or Sanskrit. I presume that I should leave "Language" on English. Should I check "Translate to English"? Will it confuse Whisperer if the language is English but it's told to translate to English?

Incidentally, I got a new Dell with an NVIDIA GeForce RTX 4090 24GB GDDR6X in it. My previous computer with a 12GB card strained to run two Whisperers at a time, with fans whirring loudly. Now I'm running 5 at a time and you can barely hear any fan noise.

Meaningless word repetition problem

When I load a video file longer than 10 minutes and transcribe it to SRT, TXT, etc., tens of thousands of lines of meaningless words are written.
For example, [end] [baby crying] [everyone] [fire sound], etc.
Words that are not in the video file are written over and over.
System specifications are as follows:
AMD 5900x
RTX3060 12GB
G.SKILL DDR4 64GB xmp 2.0
870evo 2TB SSD
Far too many meaningless words are written. Can you fix this?

Could you add initial_prompt to force punctuation?

Thanks for your great work!
Recently I transcribed some long recordings, and the output had no punctuation at all.
Could you add an initial_prompt option so I can supply a prompt that makes it add punctuation consistently?
Or is there another way to force it to always add punctuation?
Thanks again.

Good setting for "Max at Once"

I have a Radeon RX 6600 card. Not the fastest; it's a budget model.

What would be a good setting for "Max at Once" with a medium model? The card has 8 GB of VRAM.

I previously set it to 10, but I think that might be a bit optimistic.

Max at once not working

Hi,

I tried running 2 copies of the same input file with Whisperer. Even with 'Max at once'=2, the program still transcribes the files sequentially and finishes in the same time as with 'Max at once'=1.
My GPU is an RX 6600M with 8 GB of memory, so it should be enough for a small Whisper model.

Why?

[suggestion] Could you add an option to show a progress bar and a debug console?

After launching a batch transcription, I can only use the system monitor to 'guess' whether the process is working.

Because I sometimes transcribe many files at once, I need to make sure the settings are right.
If I could read the transcription in real time through a debug console, it would be great: I could glance at it for a few seconds and then leave the computer alone. (As you know, while it runs, the fans are loud and hot air comes out.)

AMD Performance

I wonder if anyone has performance numbers for AMD GPUs. I'm thinking of upgrading from a GTX 1660 Super to an RX 6600, and I wonder whether I would gain any performance for this task.

Whisper Desktop only provides benchmarks for NVIDIA cards, and its author says he is unsure about discrete AMD cards.

Model large-v2 not working?

Hello!
Thank you for creating this great tool.
I have a problem: I want to use the large-v2 model, but the software doesn't seem to work with it.
With the large model, everything works normally.
I hope you can look into it soon and perhaps release an update.
Thank you for your contributions!

Whisperer not working

I'm on a PC with a PowerColor Fighter AMD Radeon RX 6700 XT Gaming Graphics Card with 12GB GDDR6 Memory. 64-bit operating system, x64-based processor. Windows 11 Home.

I can't get Whisperer to work. Whisper Desktop works just fine: using the large model, it can transcribe a 2-hour audio file in about 20 minutes. But I'd really like to be able to do batch processing.

I don't get an error message. About 6 hours ago I started a batch of seventeen 10-minute MP3s. Whisperer immediately made WAV files out of them, and the Go button changed to Cancel, but nothing more has happened since.

I don’t hear my GPU’s fan whirring, which usually happens with Whisper. Any suggestions?

Thank you so much for this amazing tool!!!

Again, thanks a LOT for this amazing, lifesaving tool.
Question: Any plans to add speaker diarization or identification to this tool? That would be wonderful to have!

Immediate crash using Intel's Arc card - A380

Hello.

I was looking for a batch processing mode for Whisper (https://github.com/Const-me/Whisper), and I think this app, suggested by that project's developer, is the one.

Unfortunately, it crashes immediately with the attached pop-up message on my Intel Arc A380 6GB VRAM card (latest drivers, Windows 11 Pro with all updates):
[Screenshot: Intel Arc crash]

  1. Is it possible to fix it?

  2. I'm looking for batch processing one file at a time, not as many files as possible at the same time.

Does this app support this mode?
Is it possible to add it?

Thank you.

Remove file extension from the output name

The output is named fileName.wav.text instead of just fileName.txt.

Could you add an option to output just fileName.txt, and an option to output plain text without timestamps?

Thank you.
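Both requests can be sketched in a few lines: name the output by replacing the input's last extension instead of appending to it, and write plain text by dropping each segment's timestamps. This is not Whisperer's code; the function names and the `(start, end, text)` segment shape are assumptions for illustration.

```python
from pathlib import Path

def txt_output_name(src: str) -> str:
    # Replace the input's last extension instead of appending to it:
    # "fileName.wav" -> "fileName.txt" rather than "fileName.wav.txt".
    return str(Path(src).with_suffix(".txt"))

def plain_text(segments) -> str:
    # segments: assumed list of (start_seconds, end_seconds, text) tuples;
    # the timestamps are simply dropped, one segment per line.
    return "\n".join(text.strip() for _start, _end, text in segments)
```

`Path.with_suffix` changes only the last suffix, so a name like `talk.part1.wav` becomes `talk.part1.txt`.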

Watch Folders feature

Hi, thanks for this work; your app works well so far and is much appreciated! Could you add a watch-folders option, so that certain folders are remembered and transcription runs automatically in the background whenever new untranscribed files appear in them? With this, I think your app would be almost perfect. Many thanks in advance!
