explaingit

huangchihhungleo/claude-real-video

Analysis updated 2026-05-18

637 stars | Python | Audience · developer | Complexity · 2/5 | License · MIT | Setup · moderate

TLDR

A local Python tool that extracts scene-change keyframes and audio transcripts from a video so you can paste the results into Claude, ChatGPT, or any AI to ask questions about what the video shows.

Mindmap

mindmap
  root((claude-real-video))
    What it does
      Scene-change frame extraction
      Duplicate frame removal
      Audio transcription via Whisper
      Manifest for AI context
    Inputs
      YouTube and video URLs
      Local video files
      Cookie files for gated content
    Outputs
      Keyframe images
      Text transcript
      Optional audio track
    Use with
      Claude
      ChatGPT
      Gemini
      Claude Code skill

What do people build with it?

USE CASE 1

Extract keyframes from a product demo video and paste them into Claude to ask which features were demonstrated.

USE CASE 2

Transcribe a lecture video and save the text to a notes folder for later reference without sending the video to any cloud service.

USE CASE 3

Run the tool with `--why` on a competitor's announcement video so the output focuses on finding their pricing strategy.

USE CASE 4

Install it as a Claude Code skill so Claude can automatically process any video URL you paste into your coding session.

What is it built with?

Python · ffmpeg · yt-dlp · Whisper

How does it compare?

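| Repo | huangchihhungleo/claude-real-video | bytedance/lance | sapientinc/hrm-text |
| --- | --- | --- | --- |
| Stars | 637 | 637 | 617 |
| Language | Python | Python | Python |
| Setup difficulty | moderate | hard | hard |
| Complexity | 2/5 | 5/5 | 5/5 |
| Audience | developer | researcher | researcher |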

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate | Time to first run · 30 min

Requires ffmpeg installed separately via your system package manager before the pip package will work.
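
Before installing, it can help to confirm both prerequisites are in place. The snippet below is a minimal sketch of such a check, not part of claude-real-video itself; the function name `check_prerequisites` is hypothetical, and the Python 3.10 minimum is taken from the requirements stated on this page.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration only; it is not shipped with the tool.
    """
    problems = []

    # The page states Python 3.10 or later is required.
    if sys.version_info < min_python:
        needed = ".".join(str(part) for part in min_python)
        problems.append(f"Python {needed}+ required, found {sys.version.split()[0]}")

    # ffmpeg must be installed separately and discoverable on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH; install it with your system package manager")

    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("missing:", issue)
    if not issues:
        print("prerequisites look fine")
```

An empty list means both checks passed; anything else names what to install first.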

MIT license: use, modify, and distribute freely for any purpose, including commercial, as long as you keep the copyright notice.

In plain English

Claude Real Video is a Python command-line tool that extracts the meaningful frames from a video and transcribes its audio, so you can hand that material to an AI assistant and ask questions about what is actually in the video. The problem it solves is that most AI tools cannot genuinely watch a video. When you paste a YouTube link into a chatbot, it usually reads the transcript rather than seeing the images. This tool does the visual processing on your own computer and gives you files you can then share with whatever AI you choose.

The key difference from simpler approaches is how it selects frames. A naive method grabs one frame per second, which wastes context on repetitive shots from a static screencast and misses important moments in a fast-cut video. This tool detects scene changes instead, pulling a frame whenever the image meaningfully shifts. It also compares each candidate frame against the recent ones already kept and discards near-duplicates, so a shot that appears multiple times only gets included once. A 58-second clip that naive sampling would represent with 58 frames might reduce to 26 meaningfully distinct ones.

Beyond frames, it optionally runs Whisper, a speech recognition tool, to produce a text transcript of the audio. If the video already has subtitle files attached, it uses those instead, which is faster and more accurate. You can also save the full audio track so a model that can process audio directly gets the actual sound rather than just the words.

The output is a folder of image files, an optional transcript, and a summary file that an AI assistant can read to understand the material. You point a tool like Claude or ChatGPT at that folder and ask your questions from there. A `--why` flag lets you state your reason for watching, such as finding the pricing strategy in a product demo, so the summary focuses on what matters to you rather than producing a generic description.

Installation requires Python 3.10 or later plus ffmpeg installed separately. The tool works from a YouTube or other public video URL or from a local file. It runs on macOS, Windows, and Linux, and is MIT licensed.
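
The frame selection described above (scene-change detection followed by near-duplicate removal against recently kept frames) can be illustrated with a small pure-Python sketch. This is not the tool's actual implementation: the function names, the mean-brightness-difference scene test, the average-hash comparison, and every threshold below are assumptions chosen for illustration, and frames are represented as tiny grayscale grids rather than real images.

```python
from collections import deque

# Assumption: a frame is a tiny grayscale thumbnail, rows of 0-255 values.
Frame = list[list[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel brightness difference between two same-sized frames."""
    pa = [p for row in a for p in row]
    pb = [p for row in b for p in row]
    return sum(abs(x - y) for x, y in zip(pa, pb)) / len(pa)


def average_hash(frame: Frame) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def select_keyframes(
    frames: list[Frame],
    scene_threshold: float = 30.0,  # hypothetical scene-change sensitivity
    max_hash_distance: int = 1,     # hypothetical near-duplicate tolerance
    window: int = 8,                # how many recently kept frames to compare against
) -> list[int]:
    """Return indices of frames that start a new scene and are not near-duplicates."""
    kept: list[int] = []
    recent_hashes: deque[int] = deque(maxlen=window)

    for i, frame in enumerate(frames):
        # Step 1: a frame is a candidate only if the image shifted enough
        # since the previous frame (the first frame is always a candidate).
        if i > 0 and mean_abs_diff(frames[i - 1], frame) <= scene_threshold:
            continue

        # Step 2: drop the candidate if it closely matches a recently kept frame.
        h = average_hash(frame)
        if any(hamming(h, old) <= max_hash_distance for old in recent_hashes):
            continue

        kept.append(i)
        recent_hashes.append(h)

    return kept
```

With a sequence such as shot A, A, B, B, A, C, the repeated A at the fifth position counts as a scene change but is discarded as a near-duplicate of the first frame, which is the behavior the paragraph above describes for a shot that appears multiple times.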

Copy-paste prompts

Prompt 1
I have a YouTube video URL and I want to let Claude analyze what happens in it visually. Walk me through using claude-real-video to extract keyframes and transcribe the audio, then show me how to reference those files in a Claude conversation.
Prompt 2
I want to install claude-real-video as a Claude Code skill. Show me the install steps and what I do after to have Claude automatically watch a video I paste.
Prompt 3
How do I adjust the scene sensitivity and dedup threshold in claude-real-video to get fewer frames from a screencast with long static sections?
Prompt 4
I have a local MP4 lecture file with English audio. Show me the crv command to extract frames, transcribe in English, and save a summary note to my Obsidian vault.

Frequently asked questions

What is claude-real-video?

A local Python tool that extracts scene-change keyframes and audio transcripts from a video so you can paste the results into Claude, ChatGPT, or any AI to ask questions about what the video shows.

What language is claude-real-video written in?

Mainly Python. The stack also includes ffmpeg, yt-dlp, and Whisper.

What license does claude-real-video use?

MIT license: use, modify, and distribute freely for any purpose, including commercial, as long as you keep the copyright notice.

How hard is claude-real-video to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is claude-real-video for?

Mainly developers.


Verify against the repo before relying on details.