kentjuno/kjaudiobook-v1

33 · TypeScript · Audience · developer · Complexity · 4/5 · Setup · hard

TLDR

An experimental local studio that turns long-form text into a narrated audiobook or video, combining a React frontend, a FastAPI backend, local TTS, and a Chrome extension that bridges to Google Flow.

Mindmap

mindmap
  root((KJAudioBook))
    Inputs
      Markdown scripts
      Voice references
      Music and SFX
    Outputs
      Narrated audio
      Storyboarded video
      Mixed timeline
    Use Cases
      Self-narrate a novel
      Generate audiobook drafts
      Bridge Google Flow assets
    Tech Stack
      React
      FastAPI
      PyTorch
      FFmpeg

Things people build with this

USE CASE 1

Turn a long Markdown manuscript into a narrated audiobook locally

USE CASE 2

Mix narration, music, and SFX into a timeline with pydub and FFmpeg

USE CASE 3

Storyboard scenes from a script and tie generated visuals to audio clips

USE CASE 4

Bridge Google Flow video generation into a local audio-video pipeline
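Use case 1 depends on the script import step: cleaning Markdown and splitting a long manuscript into pieces a TTS model can handle in one call. The sketch below is a minimal illustration under stated assumptions, not the repo's code. The function names `clean_markdown` and `chunk_text` and the 400-character limit are hypothetical, and splitting sentences on `.`, `!` and `?` is a simplifying assumption.

```python
import re

# Hypothetical per-request limit; real TTS models differ, so tune this.
DEFAULT_MAX_CHARS = 400


def clean_markdown(text: str) -> str:
    """Strip common Markdown markup so it is not read aloud."""
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)  # heading markers
    text = re.sub(r"(\*\*|\*)(.+?)\1", r"\2", text)              # **bold** and *italic*
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)         # [text](url) -> text
    return text


def chunk_text(text: str, max_chars: int = DEFAULT_MAX_CHARS) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters."""
    text = " ".join(text.split())  # collapse newlines and runs of spaces
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the open chunk
            continue
        if current:
            chunks.append(current)  # close the open chunk
        # Hard-split a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    manuscript = clean_markdown("# Chapter 1\nIt was **late**. The rain kept falling! Was anyone awake?")
    for i, chunk in enumerate(chunk_text(manuscript, max_chars=40)):
        print(i, chunk)
```

Breaking at sentence ends rather than at a fixed character count avoids cutting a clause in half, which tends to make TTS prosody sound wrong. One limitation: a heading without end punctuation merges into the following sentence, so a fuller pipeline would probably treat headings as chunks of their own.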

Tech stack

React · Vite · FastAPI · PyTorch · FFmpeg · TypeScript · Python

Getting it running

Difficulty · hard · Time to first run · 1 day+

Needs Node 20.19+ or 22.12+, Python 3.10 or 3.11, and FFmpeg. A CUDA GPU is recommended for local TTS, and Gemini CLI is optional but needed by several helper endpoints.
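A quick preflight check for these requirements can save time before a long install. This is a hypothetical helper script, not part of the repo. It reads "20.19+ or 22.12+" as the range `^20.19.0 || >=22.12.0`, which is an assumption: Node 21 is rejected and anything from 22.12 upward is accepted.

```python
import re
import shutil
import subprocess
import sys


def node_version_ok(version: str) -> bool:
    """Check `node --version` output such as 'v20.19.0'.

    Assumption: accept 20.x from 20.19, or 22.12 and newer; reject 21.x.
    """
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if not match:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    if major == 20:
        return minor >= 19
    if major == 22:
        return minor >= 12
    return major > 22


def python_version_ok(info=sys.version_info) -> bool:
    """The README targets Python 3.10 or 3.11."""
    return (info[0], info[1]) in {(3, 10), (3, 11)}


def missing_tools(tools=("node", "ffmpeg")) -> list[str]:
    """Return the command-line tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("python ok:", python_version_ok())
    node = shutil.which("node")
    if node:
        out = subprocess.run([node, "--version"], capture_output=True, text=True).stdout
        print("node ok:", node_version_ok(out))
    print("missing tools:", missing_tools())
```

GPU availability is a separate check: once PyTorch is installed, `torch.cuda.is_available()` reports whether a CUDA device is usable for local TTS.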

In plain English

AudioBook KJ is an experimental studio for turning long-form text into a narrated audiobook or video project. The README states plainly that this is a public source snapshot rather than a finished product, and that anyone running it should expect to adjust the code locally. Generated media, local databases, virtual environments, node_modules, private voice references, and manuscript content are intentionally excluded from the repo.

The README lays out seven rough workflows:

1. Script import and cleanup: pulls text in, cleans up Markdown, and splits long content into chunks, with optional rewriting through Gemini CLI.
2. AI direction and metadata: extracts characters, scenes, and storyboard hints.
3. Text-to-speech: turns lines into audio clips through the Python backend and local model tooling.
4. Audio timeline: mixes narration, music, and sound effects using pydub and FFmpeg.
5. Visual assets: connects generated images or video to timeline clips.
6. FlowKit: a Chrome extension that acts as a bridge between Google Flow in the browser and the local backend.
7. Export: renders the assembled audio and video to a final file.

The frontend is React with Vite and Tailwind CSS, using TanStack Query, React Flow, Axios, and Lucide icons. The backend is FastAPI on Uvicorn, with PyTorch, Torchaudio, and Hugging Face Transformers handling the AI and audio side.

The project requires Node 20.19+ or 22.12+, Python 3.10 or 3.11, and FFmpeg for export. A CUDA-capable GPU is recommended for local TTS, since the project uses Torch and OmniVoice.

Gemini CLI is optional. Several helper endpoints call the gemini command directly for script cleanup, prompt enhancement, entity extraction, and storyboard generation. The README warns to install the official @google/gemini-cli npm package rather than look-alikes, and notes that some calls pass --skip-trust, which users should review before letting Gemini modify files. If Gemini CLI is missing, the main frontend still loads, but those endpoints fail.

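One way to act on the --skip-trust warning is to route every Gemini call through a single wrapper that refuses to forward that flag and fails clearly when the CLI is absent. This is a minimal sketch, not the repo's code: `build_gemini_command`, `run_gemini`, and `GeminiUnavailable` are hypothetical names, and it assumes the installed CLI accepts `-p` for a one-shot, non-interactive prompt (check `gemini --help` for your version).

```python
import shutil
import subprocess


class GeminiUnavailable(RuntimeError):
    """Raised when the gemini command is not on PATH."""


# Flags this wrapper refuses to forward; the README notes some calls pass --skip-trust.
BLOCKED_FLAGS = {"--skip-trust"}


def build_gemini_command(prompt: str, extra_args=()) -> list[str]:
    """Build argv for a one-shot gemini call, dropping blocked flags.

    Handles both '--flag' and '--flag=value' spellings.
    """
    safe_args = [arg for arg in extra_args if arg.split("=", 1)[0] not in BLOCKED_FLAGS]
    return ["gemini", *safe_args, "-p", prompt]


def run_gemini(prompt: str, extra_args=(), timeout: float = 120.0) -> str:
    """Run gemini and return its stdout.

    Raises GeminiUnavailable if the CLI is missing, subprocess.CalledProcessError
    on a non-zero exit, and subprocess.TimeoutExpired if it runs too long.
    """
    if shutil.which("gemini") is None:
        raise GeminiUnavailable("gemini CLI not found; install the official @google/gemini-cli package")
    result = subprocess.run(
        build_gemini_command(prompt, extra_args),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Passing the prompt as a list element rather than through a shell string means text from a manuscript cannot be interpreted as shell syntax. A FastAPI endpoint could catch `GeminiUnavailable` and return a clear error instead of a generic failure.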
The FlowKit Chrome extension lives at audiobook_builder/flowkit_extension and is loaded as an unpacked extension through Developer mode. It expects the local backend to be running, and the README is explicit that it requests broad browser permissions because it bridges local tooling with Google Flow URLs. The README advises reviewing manifest.json, background.js, and side_panel.js before pairing it with a personal Google account.
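Part of the manifest.json review can be automated before loading FlowKit. This is a minimal sketch assuming a standard Chrome extension manifest with `permissions`, `optional_permissions`, `host_permissions`, and `content_scripts[].matches` fields. The helper names and the set of permissions treated as sensitive are a judgment call for illustration, not an official list, and the script does not inspect background.js or side_panel.js, which still need a manual read.

```python
import json
from pathlib import Path

# Illustrative choices, not an official classification.
SENSITIVE_PERMISSIONS = {
    "tabs", "scripting", "debugger", "cookies", "webRequest",
    "history", "downloads", "nativeMessaging",
}
BROAD_HOST_PATTERNS = {"<all_urls>", "*://*/*", "http://*/*", "https://*/*"}


def audit_manifest(manifest: dict) -> list[str]:
    """Return human-readable findings about risky permissions in a manifest dict."""
    findings: list[str] = []
    # Manifest V2 puts host patterns in "permissions", so check both kinds there.
    perms = manifest.get("permissions", []) + manifest.get("optional_permissions", [])
    for perm in perms:
        if perm in SENSITIVE_PERMISSIONS:
            findings.append(f"sensitive permission: {perm}")
        if perm in BROAD_HOST_PATTERNS:
            findings.append(f"broad host access: {perm}")
    # Manifest V3 lists host access separately; report every entry.
    for pattern in manifest.get("host_permissions", []):
        if pattern in BROAD_HOST_PATTERNS:
            findings.append(f"broad host access: {pattern}")
        else:
            findings.append(f"host access: {pattern}")
    for script in manifest.get("content_scripts", []):
        for pattern in script.get("matches", []):
            if pattern in BROAD_HOST_PATTERNS:
                findings.append(f"content script on all sites: {pattern}")
    return findings


def audit_manifest_file(path) -> list[str]:
    """Load manifest.json from disk and audit it."""
    return audit_manifest(json.loads(Path(path).read_text(encoding="utf-8")))
```

An empty result does not mean the extension is safe; it only means none of these particular fields matched the illustrative lists.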

Copy-paste prompts

Prompt 1
Walk me through running KJAudioBook end to end on a Mac without a CUDA GPU and tell me which steps will fail.
Prompt 2
Audit the FlowKit Chrome extension manifest, background.js, and side_panel.js for risky permissions before I load it.
Prompt 3
Show me how to swap the OmniVoice TTS in KJAudioBook for a cloud TTS API call so I can skip the PyTorch setup.
Prompt 4
Help me wire the gemini CLI calls in KJAudioBook to a safer wrapper that does not pass --skip-trust.

Verify against the repo before relying on details.