explaingit

kouhxp/yapsnap

Analysis updated 2026-06-24

167PythonAudience · developerComplexity · 2/5LicenseSetup · easy

TLDR

Command line tool that transcribes any video URL or local audio file to text offline on CPU, using yt-dlp, ffmpeg, and a small streaming Zipformer2 ONNX model.

Mindmap

mindmap
  root((yapsnap))
    Inputs
      Video URL
      Local audio file
      Local video file
    Outputs
      Plain text transcript
      Optional timestamped lines
    Use Cases
      Transcribe YouTube videos
      Note taking from podcasts
      Multilingual transcription
    Tech Stack
      Python
      yt-dlp
      ffmpeg
      sherpa-onnx
      ONNX
Click or tap to explore — scroll the page freely

Code map

Detail Auto

An interactive map of this repo's files and how they connect — its source is parsed live in your browser. Click Visualize to build it.

filefunction / class

What do people build with it?

USE CASE 1

Generate plain text transcripts of YouTube, TikTok, or Instagram Reels videos from the command line

USE CASE 2

Batch transcribe local podcast or meeting recordings offline without sending audio to a cloud service

USE CASE 3

Produce timestamped subtitle drafts for editing in a video tool

What is it built with?

Pythonffmpegyt-dlpsherpa-onnxONNX

How does it compare?

kouhxp/yapsnaphelpmeeadice/bandori-pet-revhkust-c4g/domainshuttle
Stars167156156
LanguagePythonPythonPython
Setup difficultyeasymoderatehard
Complexity2/53/54/5
Audiencedevelopergeneralresearcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy Time to first run · 5min

Needs system ffmpeg installed separately and a one time 80 MB model download on first run.

Apache 2.0 permits commercial and personal use with attribution and a patent grant.

In plain English

yapsnap is a command-line tool that turns any video URL or local audio file into a plain text transcript. It runs entirely on your CPU with no GPU and no cloud calls. After the first run, when an 80 MB model is downloaded, everything works offline and your audio never leaves your machine. The basic usage is one line, for example yapsnap followed by a YouTube URL, which writes a .txt file with the transcription. Under the hood it chains three pieces. yt-dlp fetches audio from any URL it understands, which covers YouTube, YouTube Shorts, X (formerly Twitter), TikTok, Instagram Reels, and direct media links. ffmpeg decodes the audio to 16 kHz mono PCM and optionally speeds it up without changing pitch using an atempo filter. The default speed factor is 1.5x, which the author says cuts about a third off transcription time with little accuracy loss. Then a streaming Zipformer2 transducer from the Kroko ASR project, in INT8 ONNX format, processes the PCM in chunks via sherpa-onnx and produces text. Local files in common formats also work, including mp3, mp4, m4a, wav, webm, mov, mkv, aac, opus, ogg, and flac, since anything ffmpeg can decode is acceptable input. By default the output goes to a transcripts/ folder under the current directory, with a filename derived from the input or video ID. Passing -o sets a custom output path. Passing --timestamps switches the output from one paragraph to one sentence per line with [MM:SS] prefixes, and the timestamps stay in original-audio time even when the audio was sped up before transcription. Installation is pip install yapsnap from PyPI, plus a system ffmpeg via brew, apt, dnf, winget, or choco depending on the operating system. Two equivalent commands are installed: yapsnap and an alias called transcribe. The whole tool is a single Python module with three dependencies (sherpa-onnx, numpy, yt-dlp). Python 3.9 or newer is required and the license is Apache 2.0. The default model is English, but the same code can transcribe other languages by pointing --model at a different folder or setting the KROKO_MODEL environment variable. Kroko publishes streaming models for Dutch, French, German, Hebrew, Italian, Portuguese, Spanish, Swedish, Swiss German, and Turkish on Hugging Face, and any other sherpa-onnx streaming transducer with the standard encoder, decoder, joiner, and tokens.txt layout also works.

Copy-paste prompts

Prompt 1
Write a shell loop that runs yapsnap over every mp3 in a folder and writes timestamped transcripts to transcripts/
Prompt 2
Show me how to swap the default English Kroko model for the French one using the KROKO_MODEL environment variable
Prompt 3
Build a small wrapper script that calls yapsnap on a YouTube playlist URL and concatenates the transcripts into one markdown file
Prompt 4
Explain how to tune the ffmpeg atempo speed factor in yapsnap without losing accuracy
Prompt 5
Add a flag to yapsnap that also outputs an SRT subtitle file alongside the txt transcript

Frequently asked questions

What is yapsnap?

Command line tool that transcribes any video URL or local audio file to text offline on CPU, using yt-dlp, ffmpeg, and a small streaming Zipformer2 ONNX model.

What language is yapsnap written in?

Mainly Python. The stack also includes Python, ffmpeg, yt-dlp.

What license does yapsnap use?

Apache 2.0 permits commercial and personal use with attribution and a patent grant.

How hard is yapsnap to set up?

Setup difficulty is rated easy, with roughly 5min to a first successful run.

Who is yapsnap for?

Mainly developer.

Open on GitHub → Explain another repo

This repo across BitVibe Labs

Verify against the repo before relying on details.