explaingit

simbastack-hq/framedex

138 | Python | Audience · developer | Complexity · 4/5 | Active | Setup · hard

TLDR

CLI tool delivered as a Claude Code skill that indexes a video archive into searchable markdown sidecar files with transcripts, GPS, faces, and scene descriptions.

Mindmap

mindmap
  root((framedex))
    Inputs
      Video clips
      Hugging Face token
      Video context file
    Outputs
      Markdown sidecars
      Transcripts
      Face embeddings
      Keep cull ratings
    Use Cases
      Index home video archive
      Triage raw footage
      Search by transcript
    Tech Stack
      Python
      WhisperX
      ffmpeg
      insightface
      Claude Code

Things people build with this

USE CASE 1

Index a multi-drive archive of raw video clips so each one is searchable by transcript, place, and scene description.

USE CASE 2

Triage hours of unsorted footage with automatic keep, review, or cull ratings before editing.

USE CASE 3

Add speaker-labeled transcripts and translations to family videos in mixed languages.

USE CASE 4

Detect and group faces across a large clip library without uploading anything to the cloud.

Tech stack

Python, WhisperX, ffmpeg, insightface, Claude Code, Anthropic SDK

Getting it running

Difficulty · hard | Time to first run · 1h+

Needs a Hugging Face token, accepted pyannote model terms, Whisper and face-detection model downloads, plus ffmpeg and exiftool installed locally.

In plain English

Framedex is a command-line tool for taking a messy archive of video clips spread across several external drives and turning it into something searchable. For each clip, it writes a small markdown text file next to the original video. That sidecar file contains everything the tool was able to learn about the clip: duration and resolution from the file itself, GPS coordinates, the place name those coordinates correspond to, a transcript with speaker labels, an English translation when the speech is in another language, detected faces, and a written description of the scene with a keep, review, or cull rating. The original videos are never changed.

The tool is delivered as a Claude Code skill that installs a vidx command. After you clone the repo into the skills folder, running setup.py installs the Python dependencies and downloads the Whisper speech-recognition models and the face-detection models. You also need a Hugging Face token, and you must accept the terms on two pyannote model pages so that speaker diarization can run.

The per-clip pipeline is a chain of well-known tools. ffprobe reads file metadata, exiftool reads GPS data, Nominatim turns the coordinates into a place name (with requests politely rate-limited), ffmpeg extracts five evenly spaced JPEG frames and a mono 16 kHz WAV file, WhisperX runs transcription with word-level alignment and speaker labels, and insightface detects faces and computes 512-dimensional embeddings. Finally, a vision model produces a structured scene description and a keep, review, or cull rating, and the sidecar file is written.

Vision work can run in three modes. The cli backend uses a Claude Max subscription through the claude -p command, so it adds no marginal cost beyond the subscription. The api backend uses the Anthropic SDK with an API key and is the fastest option for huge archives. The local backend talks to LM Studio or any OpenAI-compatible local server so that nothing leaves the machine.

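The frame and audio extraction step of the pipeline can be sketched as follows. This is a minimal illustration and not framedex's actual code: the function names are hypothetical, and sampling at the midpoint of five equal segments is an assumption about what "evenly spaced" means. The ffmpeg options used (`-ss`, `-frames:v`, `-q:v`, `-vn`, `-ac`, `-ar`, `-c:a`) are standard ffmpeg flags.

```python
from __future__ import annotations

from pathlib import Path


def frame_timestamps(duration_s: float, count: int = 5) -> list[float]:
    """Return `count` timestamps spread evenly across a clip.

    Assumption: sample the midpoint of each of `count` equal segments,
    which avoids the very first and very last frame of the clip.
    """
    if duration_s <= 0 or count <= 0:
        return []
    segment = duration_s / count
    return [round(segment * (i + 0.5), 3) for i in range(count)]


def frame_command(video: Path, timestamp_s: float, out_jpg: Path) -> list[str]:
    """Build an ffmpeg argument list that writes one JPEG frame."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{timestamp_s:.3f}",  # seek on the input before decoding
        "-i", str(video),
        "-frames:v", "1",             # emit exactly one video frame
        "-q:v", "2",                  # high JPEG quality
        str(out_jpg),
    ]


def audio_command(video: Path, out_wav: Path) -> list[str]:
    """Build an ffmpeg argument list that writes a mono 16 kHz WAV file."""
    return [
        "ffmpeg", "-y",
        "-i", str(video),
        "-vn",                # drop the video stream
        "-ac", "1",           # one audio channel (mono)
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # uncompressed 16-bit PCM
        str(out_wav),
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed. Building the argument lists separately from running them keeps the logic easy to inspect, for example when previewing work without touching any files.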
The tool is built to be resumable: a sidecar that already exists means the clip is skipped on the next run. Useful flags include --dry-run, --max-files to test on a small subset, --force to re-index, --no-diarize to skip speaker labels, --no-faces to skip face detection, and --max-duration to cap clip length. A .video-context.md file at the root of a scan target gives the vision model a hint about the project and feeds proper nouns to Whisper for better transcription.
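The resume behavior described above amounts to a simple existence check. The sketch below is illustrative only: the sidecar naming (same folder and stem with a `.md` extension), the list of video extensions, and the function names are all assumptions, and applying the `max_files` cap to clips that still need work is a guess at how `--max-files` behaves.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

# Assumed set of video extensions; the real tool may recognize others.
VIDEO_SUFFIXES = {".mp4", ".mov", ".mkv", ".avi", ".mts"}


def sidecar_path(video: Path) -> Path:
    """Hypothetical naming: clip.mp4 -> clip.md in the same folder."""
    return video.with_suffix(".md")


def clips_to_process(
    root: Path, force: bool = False, max_files: Optional[int] = None
) -> list[Path]:
    """List clips under `root` that still need indexing.

    A clip whose sidecar already exists is skipped unless `force` is set,
    which is what makes an interrupted run cheap to restart.
    """
    todo: list[Path] = []
    for path in sorted(root.rglob("*")):
        if max_files is not None and len(todo) >= max_files:
            break
        if not path.is_file() or path.suffix.lower() not in VIDEO_SUFFIXES:
            continue
        if not force and sidecar_path(path).exists():
            continue  # already indexed on an earlier run
        todo.append(path)
    return todo
```

Because the sidecar itself is the record of completed work, no separate state file is needed, and deleting a sidecar is enough to have that one clip re-indexed on the next run.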

Copy-paste prompts

Prompt 1
Install framedex as a Claude Code skill, then run vidx --dry-run --max-files 10 on /Volumes/Drive1/Clips and show me what the first sidecar looks like.
Prompt 2
Configure framedex to use the local LM Studio backend so no video data or transcript leaves my machine, then index a 500-clip folder.
Prompt 3
Write a .video-context.md for a wedding video project that lists the proper nouns and venue names framedex should feed to Whisper.
Prompt 4
Resume a framedex run that was interrupted halfway through a 2 TB external drive, and skip diarization to speed it up.
Prompt 5
Parse the markdown sidecar files framedex produced and build a SQLite index I can query by GPS place name or speaker.

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.