ComfyUI-DramaBox adds LoRA fine-tuning for LTX text-to-speech voice cloning
A new ComfyUI custom node for DramaBox, an experimental text-to-speech model built on LTX, now supports LoRA fine-tuning, and a companion standalone dataset-prep tool cuts long audio files into training samples.
ComfyUI-DramaBox, a custom node for an experimental LTX-based text-to-speech model, now supports LoRA fine-tuning and is paired with a focused fork of Voice-Clone-Studio that automates dataset preparation from raw audio.
DramaBox is an LTX-derived TTS model released this week. Developer FranckyB built ComfyUI-DramaBox as a custom node shortly after, and the latest update added LoRA training capability. The node reads from models/dramabox in the standard ComfyUI directory structure, letting users share weights between the node and the standalone tool.
Voice-Clone-Studio-DramaBox is a stripped-down fork of the developer's existing Voice-Clone-Studio TTS toolkit, keeping only Qwen-TTS for voice design. Its Prep Sample tab splits a single long audio file into phrase-level clips and transcribes them automatically, which is useful for building training datasets. The developer reports better results with 10 clips of 5–10 seconds each than with larger 80-clip datasets. DramaBox is described as "very prone to hallucination," which is why it remains separate from the main Voice-Clone-Studio release; the tool is explicitly experimental rather than production-ready.
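The developer's reported sweet spot (about 10 clips of 5–10 seconds each) can be applied as a simple filter over a folder of already-split clips. The sketch below is illustrative only: it is not code from either tool, it assumes the clips were exported as PCM WAV files, and the helper names `clip_duration` and `pick_training_clips` are hypothetical.

```python
import wave
from pathlib import Path

MIN_SECONDS = 5.0   # lower bound of the clip length the developer reports
MAX_SECONDS = 10.0  # upper bound of the clip length the developer reports
TARGET_CLIPS = 10   # dataset size the developer reports working well


def clip_duration(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def pick_training_clips(clip_dir, limit=TARGET_CLIPS):
    """Return up to `limit` WAV clips, in name order, lasting 5 to 10 seconds."""
    selected = []
    for path in sorted(Path(clip_dir).glob("*.wav")):
        if len(selected) >= limit:
            break
        if MIN_SECONDS <= clip_duration(path) <= MAX_SECONDS:
            selected.append(path)
    return selected
```

Calling `pick_training_clips("clips/")` on a hypothetical `clips/` folder returns the first ten clips that fall inside the window. A real dataset would also need the matching transcripts, which this sketch leaves out.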
Both tools are available on GitHub. Users running the standalone app can point it at their ComfyUI model folder to avoid duplicating weights.
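As a rough illustration of the shared-folder convention, the sketch below resolves `models/dramabox` under a ComfyUI install and lists whatever files are there. How the standalone app is actually pointed at that folder is not described here, so this only shows the directory layout; the function names are hypothetical and no particular weight filenames are assumed.

```python
from pathlib import Path


def dramabox_model_dir(comfyui_root):
    """Return the models/dramabox folder under a ComfyUI install."""
    return Path(comfyui_root).expanduser() / "models" / "dramabox"


def list_shared_weights(comfyui_root):
    """List file names in the shared folder; empty if the folder is missing."""
    model_dir = dramabox_model_dir(comfyui_root)
    if not model_dir.is_dir():
        return []
    return sorted(p.name for p in model_dir.iterdir() if p.is_file())
```

An empty result from `list_shared_weights` means the folder is missing or holds no files, which is a quick way to check a path before configuring a second tool to use it.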