Code Monkey home page Code Monkey logo

insanely-fast-whisper's Introduction

Insanely Fast Whisper

Powered by ๐Ÿค— Transformers & Optimum

TL;DR - Transcribe 300 minutes (5 hours) of audio in less than 10 minutes - with OpenAI's Whisper Large v2. Blazingly fast transcription is now a reality!โšก๏ธ

Basically all you need to do is this:

import torch
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition",
                "openai/whisper-large-v2",
                torch_dtype=torch.float16,
                device="cuda:0")

pipe.model = pipe.model.to_bettertransformer()

outputs = pipe("<FILE_NAME>",
               chunk_length_s=30,
               batch_size=24,
               return_timestamps=True)

outputs["text"]

Not convinced? Here are some benchmarks we ran on a free Google Colab T4 GPU! ๐Ÿ‘‡

Optimisation type Time to Transcribe (150 mins of Audio)
Transformers (fp32) ~31 (31 min 1 sec)
Transformers (fp32 + batching [8]) ~13 (13 min 19 sec)
Transformers (fp16 + batching [16]) ~6 (6 min 13 sec)
Transformers (fp16 + batching [16] + bettertransformer) ~5.42 (5 min 42 sec)
Transformers (fp16 + batching [24] + bettertransformer) ~5 (5 min 2 sec)
Faster Whisper (fp16 + beam_size [1]) ~9.23 (9 min 23 sec)
Faster Whisper (8-bit + beam_size [1]) ~8 (8 min 15 sec)

Let's go!!

Here-in, we'll dive into optimisations that can make Whisper faster for fun and profit! Our goal is to be able to transcribe a 2-3 hour long audio in the fastest amount of time possible. We'll start with the most basic usage and work our way up to make it fast!

The only fitting test audio to use for our benchmark would be Lex interviewing Sam Altman. We'll use the audio file corresponding to his podcast. I uploaded it on a wee dataset on the hub here.

Installation

pip install -q --upgrade torch torchvision torchaudio
pip install -q git+https://github.com/huggingface/transformers
pip install -q accelerate optimum
pip install -q ipython-autotime

Let's download the audio file corresponding to the podcast.

wget https://huggingface.co/datasets/reach-vb/random-audios/resolve/main/sam_altman_lex_podcast_367.flac

Base Case

import torch
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition",
                "openai/whisper-large-v2",
                device="cuda:0")
outputs = pipe("sam_altman_lex_podcast_367.flac", 
               chunk_length_s=30,
               return_timestamps=True)

outputs["text"][:200]

Sample output:

We have been a misunderstood and badly mocked org for a long time. When we started, we announced the org at the end of 2015 and said we were going to work on AGI, people thought we were batshit insan

Time to transcribe the entire podcast: 31min 1s

Batching

outputs = pipe("sam_altman_lex_podcast_367.flac", 
               chunk_length_s=30,
               batch_size=8,
               return_timestamps=True)

outputs["text"][:200]

Time to transcribe the entire podcast: 13min 19s

Half-Precision

pipe = pipeline("automatic-speech-recognition",
                "openai/whisper-large-v2",
                torch_dtype=torch.float16,
                device="cuda:0")                
outputs = pipe("sam_altman_lex_podcast_367.flac",
               chunk_length_s=30,
               batch_size=16,
               return_timestamps=True)

outputs["text"][:200]

Time to transcribe the entire podcast: 6min 13s

BetterTransformer w/ Optimum

pipe = pipeline("automatic-speech-recognition",
                "openai/whisper-large-v2",
                torch_dtype=torch.float16,
                device="cuda:0")

pipe.model = pipe.model.to_bettertransformer()
outputs = pipe("sam_altman_lex_podcast_367.flac",
               chunk_length_s=30,
               batch_size=24,
               return_timestamps=True)

outputs["text"][:200]

Time to transcribe the entire podcast: 5min 2s

Roadmap

  • Add benchmarks for Whisper.cpp
  • Add benchmarks for 4-bit inference
  • Add a light CLI script
  • Deployment script with Inference API

Community showcase

@ochen1 created a brilliant MVP for a CLI here: https://github.com/ochen1/insanely-fast-whisper-cli (Try it out now!)

insanely-fast-whisper's People

Contributors

vaibhavs10 avatar gsheni avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.