Code Monkey home page Code Monkey logo

whisper-webui's Introduction

Whisper-WebUI

A Gradio-based browser interface for Whisper. You can use it as an Easy Subtitle Generator!

Whisper WebUI

Notebook

If you wish to try this on Colab, you can do it in here!

Feature

  • Generate subtitles from various sources, including :
    • Files
    • Youtube
    • Microphone
  • Currently supported subtitle formats :
    • SRT
    • WebVTT
    • txt ( only text file without timeline )
  • Speech to Text Translation
    • From other languages to English. ( This is Whisper's end-to-end speech-to-text translation feature )
  • Text to Text Translation
    • Translate subtitle files using Facebook NLLB models
    • Translate subtitle files using DeepL API

Installation and Running

Prerequisite

To run this WebUI, you need to have git, python version 3.8 ~ 3.10, CUDA version above 12.0 and FFmpeg.

Please follow the links below to install the necessary software:

After installing FFmpeg, make sure to add the FFmpeg/bin folder to your system PATH!

Automatic Installation

If you have satisfied the prerequisites listed above, you are now ready to start Whisper-WebUI.

  1. Run Install.bat from Windows Explorer as a regular, non-administrator user. ( Run install.sh if you are using Mac )
  2. After installation, run the start-webui.bat. ( Run start-webui.sh if you are using Mac )
  3. Open your web browser and go to http://localhost:7860

( If you're running another Web-UI, it will be hosted on a different port , such as localhost:7861, localhost:7862, and so on )

And you can also run the project with command line arguments if you like by running user-start-webui.bat, see wiki for a guide to arguments.

VRAM Usages

This project is integrated with faster-whisper by default for better VRAM usage and transcription speed.

According to faster-whisper, the efficiency of the optimized whisper model is as follows:

Implementation Precision Beam size Time Max. GPU memory Max. CPU memory
openai/whisper fp16 5 4m30s 11325MB 9439MB
faster-whisper fp16 5 54s 4755MB 3244MB

If you want to use the original Open AI whisper implementation instead of optimized whisper, you can set the command line argument DISABLE_FASTER_WHISPER to True. See the wiki for more information.

Available models

This is Whisper's original VRAM usage table for models.

Size Parameters English-only model Multilingual model Required VRAM Relative speed
tiny 39 M tiny.en tiny ~1 GB ~32x
base 74 M base.en base ~1 GB ~16x
small 244 M small.en small ~2 GB ~6x
medium 769 M medium.en medium ~5 GB ~2x
large 1550 M N/A large ~10 GB 1x

.en models are for English only, and the cool thing is that you can use the Translate to English option from the "large" models!

whisper-webui's People

Contributors

jhj0517 avatar jsboige avatar aierlma avatar damho1104 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.