Turn a recorded meeting or lecture video into a formatted summary document automatically.
Convert a YouTube-style video into a Xiaohongshu or WeChat article for social media publishing.
Generate a subtitle file from any audio or video file without installing desktop software.
Ask follow-up questions about a video's content using the built-in AI chat mode after transcription.
Requires Docker Compose and API credentials for an AI model service (e.g. OpenAI); no other local software is needed.
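The subtitle-file use case listed above comes down to turning timestamped transcript segments into a plain-text cue file. The sketch below renders segments in SRT, one widely used subtitle format. It assumes the transcription step yields segments with start and end times in seconds. The `Segment` class and the `format_timestamp` and `to_srt` functions are hypothetical names that do not come from the AI-Media2Doc codebase, and the project may emit a different subtitle format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span (hypothetical shape): start/end in seconds plus text."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT: numbered cues separated by a blank line."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # Each cue already ends in a newline, so joining on "\n" yields the blank separator line.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "Welcome to the lecture."),
        Segment(2.5, 6.0, "Today we cover transcription."),
    ]
    print(to_srt(demo))
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids the off-by-one drift that repeated float division can introduce.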
AI-Media2Doc is a web application that uses AI to convert video and audio files into text documents. You select a video or audio file, and the tool transcribes it and formats the result in the output style you choose.

The available output styles are a post formatted for Xiaohongshu (a popular Chinese social media platform), a WeChat public account article, a knowledge note, a mind map, a content summary, and a plain subtitle file. This makes the tool useful for content creators, students, and anyone who wants to turn recorded material into readable notes or social media posts.

No account is required, and all task records are stored locally on your machine. The frontend extracts audio directly in the browser using a browser-compatible build of FFmpeg. As a result, you do not need to install any local software beyond running the Docker containers.

The project is designed for self-hosting. Deployment uses Docker Compose: download the configuration file, set your API credentials for an AI model service, and start the containers with a single command. The frontend is built with Vue and the backend is written in Python, and each runs in its own container.

Additional features include:

- an AI chat mode for asking follow-up questions about a video's transcribed content;
- a smart screenshot function that captures relevant frames from the video and inserts them at the corresponding positions in the generated document;
- support for custom prompts that change how the output reads.

The project is open source under the MIT license.
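The smart screenshot feature described above needs a rule for deciding where each captured frame belongs in the document. One plausible approach is to tag each document section with the video time where its content begins. Each frame then goes after the latest section that starts at or before the frame's timestamp. This is a sketch under stated assumptions, not taken from the project's code. The `Section`, `Screenshot`, and `insert_screenshots` names, and the Markdown image syntax used in the output, are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Section:
    """A block of generated text and the video time (seconds) where its content starts (hypothetical)."""
    start: float
    text: str


@dataclass
class Screenshot:
    """A captured frame and the video time (seconds) it was taken at (hypothetical)."""
    timestamp: float
    path: str


def insert_screenshots(sections: list[Section], shots: list[Screenshot]) -> str:
    """Attach each frame after the latest section starting at or before its timestamp.

    Assumes `sections` is sorted by `start`. Frames earlier than the first
    section are attached to the first section.
    """
    if not sections:
        return ""
    starts = [s.start for s in sections]
    attached: list[list[Screenshot]] = [[] for _ in sections]
    for shot in sorted(shots, key=lambda s: s.timestamp):
        idx = max(bisect_right(starts, shot.timestamp) - 1, 0)
        attached[idx].append(shot)

    blocks = []
    for section, section_shots in zip(sections, attached):
        blocks.append(section.text)
        for shot in section_shots:
            # Markdown image reference; the output format is an assumption.
            blocks.append(f"![frame at {shot.timestamp:.0f}s]({shot.path})")
    return "\n\n".join(blocks)
```

Running `bisect_right` on the sorted start times makes each lookup logarithmic. A linear scan would work just as well for documents of typical length, so the choice here is mainly for clarity about the "latest start at or before" rule.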
Author: hanshuaikang (all repositories by this author are listed on gitmyhub).
Verify against the repo before relying on details.