kat3ri/comfyui-dramabox

Analysis updated 2026-06-24

★ 15 | Python | Audience · vibe coder | Complexity · 3/5 | Setup · hard

Mindmap

mindmap
  root((ComfyUI-DramaBox))
    Inputs
      Quoted dialogue text
      Stage directions
      Reference voice clip
    Outputs
      Spoken audio waveform
      Compatible with Save Audio node
    Use Cases
      Voice acting prototypes
      Scene narration
      Voice cloning
    Tech Stack
      Python
      ComfyUI
      CUDA
      HuggingFace
      PyTorch

What do people build with it?

USE CASE 1

Generate expressive dialogue audio inside a ComfyUI workflow from a written script

USE CASE 2

Clone a voice from a 10-second reference clip and use it in a ComfyUI scene

USE CASE 3

Add laughs, sighs, and emotion cues to TTS output for short animation or game prototypes

What is it built with?

Python, ComfyUI, CUDA, PyTorch, HuggingFace

How does it compare?

	kat3ri/comfyui-dramabox	13127905/deep-learning-based-air-gesture-text-recognition-	6xvl/paralives-plugins-index
Stars	15	15	15
Language	Python	Python	Python
Setup difficulty	hard	moderate	easy
Complexity	3/5	3/5	2/5
Audience	vibe coder	developer	general

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1h+

Needs an NVIDIA GPU with about 24 GB of VRAM, CUDA 12 or newer, and about 17 GB of disk space for model weights. The first run downloads the weights from HuggingFace.
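The requirements above can be turned into a quick pre-install check. The sketch below is only illustrative: `check_requirements` and `free_disk_gb` are hypothetical helper names, not part of the node, and the thresholds are the approximate figures quoted on this page. The caller has to supply the VRAM and CUDA values (for example, read off `nvidia-smi`).

```python
import shutil
from typing import List

# Approximate thresholds taken from the figures quoted above (assumptions, not hard limits).
MIN_VRAM_GB = 24
MIN_CUDA_MAJOR = 12
MIN_FREE_DISK_GB = 17


def free_disk_gb(path: str = ".") -> float:
    """Return free disk space at `path` in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def check_requirements(vram_gb: float, cuda_major: int, disk_gb: float) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if vram_gb < MIN_VRAM_GB:
        problems.append(f"GPU has {vram_gb} GB VRAM, about {MIN_VRAM_GB} GB is needed")
    if cuda_major < MIN_CUDA_MAJOR:
        problems.append(f"CUDA {cuda_major} found, CUDA {MIN_CUDA_MAJOR} or newer is needed")
    if disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"{disk_gb:.1f} GB free disk, about {MIN_FREE_DISK_GB} GB is needed")
    return problems


if __name__ == "__main__":
    # VRAM and CUDA version are placeholder values here; fill them in for your machine.
    for line in check_requirements(vram_gb=24, cuda_major=12, disk_gb=free_disk_gb()):
        print("WARNING:", line)
```

Running the check from the ComfyUI install folder measures free space on the drive where the weights are likely to land, though the actual download location depends on your HuggingFace cache settings.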

In plain English

ComfyUI-DramaBox is a small add-on for ComfyUI, the popular browser-based workflow tool used for image and audio generation. It wraps a text-to-speech model called DramaBox, made by ResembleAI, so that ComfyUI users can generate spoken audio from typed scene descriptions without leaving the app.

The DramaBox model itself is described as expressive: rather than reading text in a flat voice, it can produce laughs, sighs, pauses, voice cracks, and other dramatic moments based on cues in the prompt. It also supports voice cloning: if you upload a short reference clip of about ten seconds, the generated speech will try to match that speaker's voice. The output is standard ComfyUI audio that can be sent to the Preview Audio or Save Audio nodes already in ComfyUI.

The hardware bar is high. You need an NVIDIA GPU with about 24 GB of video memory, CUDA 12 or newer, and about 17 GB of free disk space for the model files.

Installation is either through the ComfyUI Manager, by searching for the node name, or by cloning the repository into the custom_nodes folder and running pip install. The first time you generate audio, the node automatically downloads the model weights from HuggingFace: a transformer file, an audio components file, and a 4-bit version of a Gemma 3 text encoder.

The prompt format is unusual. Anything inside quotes is what the model speaks aloud, including phonetic laughs like Hahaha. Anything outside quotes is treated as a stage direction that shapes how the next line is delivered, such as She sighs deeply or His voice rises with fury.

The README warns that the first generation takes several minutes while models load into memory, but later runs on an H100 GPU take only about two or three seconds.
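The quoted-versus-unquoted prompt convention described above can be illustrated with a few lines of Python. This is a minimal sketch for checking your own scripts before sending them to the node, not how DramaBox itself parses prompts. `split_prompt` is a hypothetical helper, and it assumes straight double quotes (`"`) with no nesting.

```python
import re
from typing import List, Tuple


def split_prompt(prompt: str) -> List[Tuple[str, str]]:
    """Split a prompt into ("direction", text) and ("speech", text) segments.

    Assumption: spoken lines are wrapped in straight double quotes and
    everything else is a stage direction.
    """
    segments = []
    # re.split with a capturing group puts the quoted text at odd indices.
    for index, piece in enumerate(re.split(r'"([^"]*)"', prompt)):
        text = piece.strip()
        if not text:
            continue  # skip empty gaps between segments
        kind = "speech" if index % 2 == 1 else "direction"
        segments.append((kind, text))
    return segments


if __name__ == "__main__":
    script = 'She sighs deeply. "I told you so." His voice rises with fury. "Hahaha!"'
    for kind, text in split_prompt(script):
        print(f"{kind:>9}: {text}")
```

A script whose segments alternate between direction and speech matches the pattern described above, where each direction shapes the delivery of the line that follows it.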

Copy-paste prompts

Prompt 1

Install ComfyUI-DramaBox on my ComfyUI setup and route its output to the Save Audio node

Prompt 2

Write a DramaBox prompt for a 3-line argument scene using stage directions and quoted dialogue

Prompt 3

Build a ComfyUI workflow that pairs ComfyUI-DramaBox voice output with a video generator on the same scene text

Prompt 4

Lower the VRAM use of ComfyUI-DramaBox below 24GB by enabling offloading or quantization where possible

Frequently asked questions

What is comfyui-dramabox?

A ComfyUI custom node that wraps ResembleAI's DramaBox, an expressive text-to-speech model that supports voice cloning, stage directions, laughs, and sighs.

What language is comfyui-dramabox written in?

Mainly Python. The stack also includes ComfyUI, CUDA, PyTorch, and HuggingFace.

How hard is comfyui-dramabox to set up?

Setup difficulty is rated hard, with roughly 1h+ to a first successful run.

Who is comfyui-dramabox for?

Mainly vibe coders.

Verify against the repo before relying on details.