explaingit

vaibhavs10/insanely-fast-whisper

12,887 | Jupyter Notebook | Audience · developer | Complexity · 3/5 | Setup · moderate

TLDR

A command-line tool that transcribes audio files to text at extreme speed using OpenAI's Whisper model, processing 2.5 hours of audio in under 98 seconds on a high-end GPU.
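To put the headline figure in perspective, the short calculation below converts it into a real-time factor. It uses only the two numbers quoted above; because the claim is "under 98 seconds", the result is a lower bound, not a measured benchmark.

```python
# Real-time factor implied by the headline claim:
# 2.5 hours of audio transcribed in under 98 seconds.
audio_seconds = 2.5 * 60 * 60   # 9000 seconds of audio
wall_seconds = 98               # upper bound on processing time

real_time_factor = audio_seconds / wall_seconds
print(f"at least {real_time_factor:.1f}x faster than real time")
```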

Mindmap

mindmap
  root((Fast Whisper))
    What it does
      Audio to text
      Word timestamps
      Speaker labels
    Speed Features
      fp16 math
      Batch processing
      Flash Attention 2
    Hardware
      NVIDIA GPU
      Apple Silicon
    Output
      Transcript text
      JSON export
      Translation

Things people build with this

USE CASE 1

Transcribe long podcast or interview recordings to text in minutes on a GPU-equipped machine.

USE CASE 2

Generate word-level or chunk-level timestamps from audio for subtitles or searchable transcripts.

USE CASE 3

Identify different speakers in a conversation recording using the built-in diarization integration.

USE CASE 4

Integrate fast Whisper transcription into a Python data pipeline without calling an external API.
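As an illustration of the subtitle use case, here is a minimal sketch that turns timestamped chunks into SRT subtitle text. It assumes the transcript is available as a list of chunks shaped like `{"timestamp": [start, end], "text": ...}` with times in seconds, which is the layout Hugging Face speech recognition pipelines typically return. The helper names `srt_time` and `chunks_to_srt` and the sample data are hypothetical, so check the actual output file before relying on this layout.

```python
def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: list) -> str:
    """Turn timestamped chunks into SRT subtitle text."""
    entries = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        if start is None:
            continue  # skip chunks with no usable start time
        if end is None:
            end = start  # assumption: a missing end time collapses to the start
        index = len(entries) + 1
        text = chunk["text"].strip()
        entries.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(entries)


# Hypothetical sample data in the assumed chunk layout.
sample_chunks = [
    {"timestamp": [0.0, 2.5], "text": " Hello and welcome."},
    {"timestamp": [2.5, 61.04], "text": " Today we talk about speech recognition."},
]
srt_text = chunks_to_srt(sample_chunks)
print(srt_text)
```

Chunk-level timestamps usually give readable subtitle lines. Word-level timestamps would need grouping into phrases first, which this sketch does not attempt.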

Tech stack

Python · Hugging Face · PyTorch · CUDA · Flash Attention 2

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires a compatible NVIDIA GPU or an Apple Silicon Mac. Speaker diarization needs an additional setup step with a separate model.
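The sketch below shows how a first run might be assembled from Python for either hardware target. The flag names (`--file-name`, `--device-id`, `--flash`, `--batch-size`, `--task`, `--timestamp`, `--transcript-path`) are the ones listed in the project's README at the time of writing and should be confirmed with `insanely-fast-whisper --help`. The `build_command` helper and the file names are hypothetical. The function only builds the argument list and does not run anything.

```python
import shlex


def build_command(audio_path, *, device_id=None, flash=False, batch_size=None,
                  task=None, timestamp=None, transcript_path=None):
    """Assemble an insanely-fast-whisper command line as a list of arguments."""
    cmd = ["insanely-fast-whisper", "--file-name", str(audio_path)]
    if device_id is not None:
        # Assumed values: a GPU index such as "0" on NVIDIA, "mps" on Apple Silicon.
        cmd += ["--device-id", str(device_id)]
    if flash:
        cmd += ["--flash", "True"]          # opt in to Flash Attention 2
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    if task is not None:
        cmd += ["--task", task]             # "transcribe" or "translate"
    if timestamp is not None:
        cmd += ["--timestamp", timestamp]   # "chunk" or "word"
    if transcript_path is not None:
        cmd += ["--transcript-path", str(transcript_path)]
    return cmd


# Hypothetical example: an Apple Silicon Mac with a small batch size.
mac_cmd = build_command("interview.mp3", device_id="mps", batch_size=4)
print(shlex.join(mac_cmd))

# Hypothetical example: an NVIDIA GPU with Flash Attention 2 and word timestamps.
gpu_cmd = build_command("interview.mp3", flash=True, timestamp="word",
                        transcript_path="interview.json")
print(shlex.join(gpu_cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the tool is installed. Building a list instead of a shell string avoids quoting problems with file names that contain spaces.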

In plain English

Insanely Fast Whisper is a command-line tool that converts audio files into text transcripts using OpenAI's Whisper speech recognition model. The headline claim in the README is that it can transcribe two and a half hours of audio in under 98 seconds when run on a high-end GPU, which it achieves by combining several speed optimization techniques.

Whisper is an AI model that listens to audio and writes down what was said. It is quite accurate and supports many languages. The standard way to run Whisper is slow, especially for long recordings. This tool wraps Whisper with several acceleration techniques from the Hugging Face ecosystem, including half-precision math (fp16), batch processing, and an optional feature called Flash Attention 2, which makes the attention calculations inside the model faster. The README includes benchmark numbers showing how each combination of optimizations affects speed.

Using it is meant to be simple. You install it with a single command and then point it at an audio file. It runs entirely on your own machine, so your audio never leaves your computer. It works on NVIDIA graphics cards and on Macs with Apple Silicon chips.

The tool can transcribe audio or translate it into English from another language. It can also produce timestamps at the word level or by chunks, which is useful if you want to know exactly when each part of the transcript occurred.

For situations where you need to identify who is speaking in a conversation, the tool integrates with a separate speaker diarization model. Diarization means labeling each part of the transcript with which speaker said it. This requires an additional step to set up.

The project started as a benchmark demonstration and grew into a practical tool based on community interest. It is not affiliated with OpenAI. The code can also be used as a Python snippet rather than through the CLI if you prefer to integrate it into your own scripts.
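For the script route, the acceleration techniques described above (fp16, batching, optional Flash Attention 2) map onto a Hugging Face `transformers` pipeline roughly as sketched here. This is a sketch under stated assumptions, not the project's exact code. The model name `openai/whisper-large-v3`, the batch size of 24, the 30-second chunk length, and the helper names `attention_kwargs` and `transcribe` are illustrative choices. Newer `transformers` releases may prefer `dtype` over `torch_dtype`.

```python
def attention_kwargs(flash_available: bool) -> dict:
    """Pick Flash Attention 2 when it is installed, otherwise PyTorch's SDPA."""
    implementation = "flash_attention_2" if flash_available else "sdpa"
    return {"attn_implementation": implementation}


def transcribe(audio_path: str, device: str = "cuda:0", batch_size: int = 24) -> dict:
    """Transcribe one audio file with fp16 weights and batched 30-second chunks."""
    # Heavy imports live inside the function so this module loads without a GPU stack.
    import torch
    from transformers import pipeline
    from transformers.utils import is_flash_attn_2_available

    pipe = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v3",   # assumed model choice
        torch_dtype=torch.float16,         # half-precision math (fp16)
        device=device,                     # e.g. "cuda:0" or "mps"
        model_kwargs=attention_kwargs(is_flash_attn_2_available()),
    )
    # Long audio is split into chunks that are decoded batch_size at a time.
    return pipe(
        audio_path,
        chunk_length_s=30,
        batch_size=batch_size,
        return_timestamps=True,
    )


# Example call (hypothetical file name; needs torch, transformers, and a GPU):
# result = transcribe("interview.mp3")
# print(result["text"])
```

Falling back to SDPA keeps the same code path usable on machines where Flash Attention 2 is not installed, at some cost in speed.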

Copy-paste prompts

Prompt 1
Using insanely-fast-whisper, write a Python snippet to transcribe a 1-hour audio file and save word-level timestamps to a JSON file.
Prompt 2
How do I enable speaker diarization in insanely-fast-whisper to label which person is speaking in each segment of a recording?
Prompt 3
I want to transcribe a batch of audio files with insanely-fast-whisper on my Mac with Apple Silicon. What command should I run?
Prompt 4
Show me how to run insanely-fast-whisper with Flash Attention 2 enabled for maximum speed on an NVIDIA GPU.
Prompt 5
How do I use insanely-fast-whisper to translate a French audio file into an English transcript from the command line?

Verify against the repo before relying on details.