Download a Douyin video and get a local subtitle file plus word-timed transcript for translation or repurposing.
Transcribe a Xiaohongshu video to text without installing Python, ffmpeg, or Whisper globally.
Pull metadata such as title, author, and engagement counts from a Chinese social media video to analyze content.
Only the uv Python package manager needs to be pre-installed; faster-whisper, yt-dlp, and ffmpeg are auto-installed on first run.
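One way uv can install a script's dependencies on first run is inline script metadata (PEP 723), a comment block at the top of the file that `uv run` reads before executing it. The sketch below assumes the skill's scripts use that mechanism. The file layout, the `requires-python` bound, and the `check_dependencies` helper are hypothetical illustrations, not code from the repo.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "yt-dlp",
#     "faster-whisper",
#     "imageio-ffmpeg",
# ]
# ///
"""Hypothetical script header: uv resolves the list above into a cached,
isolated environment, so nothing is installed globally."""
from importlib.util import find_spec

# PyPI distribution names differ from the importable module names.
MODULES = {
    "yt-dlp": "yt_dlp",
    "faster-whisper": "faster_whisper",
    "imageio-ffmpeg": "imageio_ffmpeg",  # ships its own ffmpeg binary
}


def check_dependencies(modules: dict[str, str] = MODULES) -> dict[str, bool]:
    """Report which packages are importable in the current environment."""
    return {pkg: find_spec(mod) is not None for pkg, mod in modules.items()}


if __name__ == "__main__":
    for pkg, available in check_dependencies().items():
        print(f"{pkg}: {'ok' if available else 'missing'}")
```

Saved under a name of your choosing (for example `check_env.py`) and started with `uv run check_env.py`, the script should report every package as available. Run with a plain `python` that lacks those packages, it reports them as missing.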
This is a skill for Codex and Claude Code that processes a single Douyin (the Chinese TikTok) or Xiaohongshu (RedNote) video link and saves it as local files. Given a video link, the skill downloads the video, extracts the audio, runs speech-to-text transcription locally, and produces a subtitle file plus a word-level timestamped transcript. It also saves platform metadata such as the title, description, author name, publication time, and engagement counts where the platform returns them.

The skill is not a bulk scraper. It handles one link at a time and does not support downloading a creator's full channel, searching for content, or accessing private or paid content.

Setup requires only the uv Python package manager to be installed first. All other dependencies, including yt-dlp for downloading, faster-whisper for transcription, imageio-ffmpeg for audio extraction, and Playwright for generating visitor cookies when needed, are declared inside the scripts themselves and installed automatically by uv on first run. No global Python, ffmpeg, or other tools need to be pre-installed. No GPU is required, though transcribing long videos on CPU will be slow.

For cookie handling, the skill first attempts a bare download. If the platform requires cookies, it generates a temporary visitor-state cookie by loading the public page in an isolated browser context. It does not read your browser's saved logins, and the temporary cookie file is deleted after the task. You can also supply your own cookie file explicitly as a fallback.

Output files land in a timestamped folder under outputs/. A video link produces the downloaded video, the extracted audio, the raw ASR transcript, a polished transcript generated by the AI assistant, a .srt subtitle file, and a metadata folder with manifest, report, and word-timing files. Xiaohongshu image posts are also supported and produce only the images and text description without audio processing.
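To make the relationship between the word-timing file and the .srt file concrete, here is a minimal sketch of how word-level timestamps can be grouped into subtitle cues. It is not the skill's actual code: the `Word` type, the `max_words` and `max_gap` thresholds, and the grouping rule are all assumptions chosen for illustration. Only the SRT cue layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) follows the standard format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp that SRT uses."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 8, max_gap: float = 0.6) -> str:
    """Group words into cues, starting a new cue after a long pause
    or once the current cue holds max_words words (assumed rule)."""
    cues: list[list[Word]] = []
    current: list[Word] = []
    for word in words:
        too_long = len(current) >= max_words
        paused = bool(current) and word.start - current[-1].end > max_gap
        if current and (too_long or paused):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        # Chinese text is joined without spaces between words.
        text = "".join(w.text for w in cue).strip()
        span = f"{srt_timestamp(cue[0].start)} --> {srt_timestamp(cue[-1].end)}"
        blocks.append(f"{index}\n{span}\n{text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [
        Word("大家", 0.0, 0.4),
        Word("好", 0.4, 0.7),
        Word("今天", 1.8, 2.2),
        Word("开始", 2.2, 2.6),
    ]
    print(words_to_srt(sample))
```

The pause-based split keeps each cue aligned with natural breaks in speech, while the word cap stops a single cue from growing too long to read. A real pipeline would likely also split on punctuation and enforce minimum display times.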
Verify against the repo before relying on details.