Analysis updated 2026-05-18
Turn a screen-recording of a software walkthrough into a written how-to guide with embedded screenshots.
Generate onboarding documentation from a recorded demo without manually writing each step.
Create a PDF tutorial from a training video to share with non-technical teammates.
| lxfater/video-to-doc-stepfun | adam-s/car-diagnosis | bongobongo2020/krea2-character-lora-trainer | |
|---|---|---|---|
| Stars | 8 | 8 | 8 |
| Language | Python | Python | Python |
| Setup difficulty | moderate | moderate | moderate |
| Complexity | 3/5 | 3/5 | 3/5 |
| Audience | developer | researcher | vibe coder |
Figures from each repo's GitHub metadata at analysis time.
Requires ffmpeg installed on the system and a StepFun platform API key.
This tool takes a screen-recording or tutorial video and automatically converts it into a step-by-step written guide with screenshots embedded. The output is a Markdown document and a PDF, both structured around the actions visible in the video. The default workflow uses subtitles to drive the process. A local Whisper model listens to the audio and produces a transcript with timestamps. An AI model from StepFun (step-3.7-flash) reads those subtitles, figures out which actions are happening and when, and picks the best frame to capture as a screenshot for each step. The tool then uses ffmpeg to pull those frames out of the video. For steps the AI is less sure about, it sends the screenshot back to the AI for a second look and a corrected description. You can also turn on a mode where the full video is uploaded to the AI and it watches the screen directly, rather than working from audio alone. That approach costs more in API calls but can catch steps that have no spoken narration. A web-search option is available to supplement the final document with additional context from the internet. The project includes a small web application so you can do everything through a browser. You upload a video, watch the processing stages in real time, then edit the draft document in a split-screen editor (Markdown on the left, preview on the right) before exporting the PDF. The backend is FastAPI, the frontend is React, and they communicate via a server-sent-events stream for live progress updates. Setup requires Python, ffmpeg installed on your system, and an API key from the StepFun platform. The configuration is minimal: copy the example environment file, paste in your key, and run the script. The README is in Chinese, but the code and configuration options are straightforward. All outputs land in a folder named after the original video file.
Converts tutorial or screen-recording videos into step-by-step guides with screenshots, Markdown, and PDF output using Whisper transcription and an AI model to identify actions.
Mainly Python. The stack also includes Python, FastAPI, React.
MIT license: use freely for any purpose, including commercial projects, as long as you keep the copyright notice.
Setup difficulty is rated moderate, with roughly 30min to a first successful run.
Mainly developer.
This repo across BitVibe Labs
Verify against the repo before relying on details.