Langswap open-sources video dubbing pipeline combining Whisper, Gemma, and OmniVoice

Langswap released its full production video translation pipeline on GitHub, combining speech separation, Whisper ASR, Gemma-4-E2B translation, and OmniVoice synthesis to automate multilingual dubbing.

By Alex Sokoloff · June 11, 2026


Langswap released its full production video translation pipeline on GitHub this week, opening the code that powers its dubbing service at langswap.app. The system automates multilingual video translation by separating speech from background audio, translating dialogue with length-matching constraints, and synthesizing new speech that preserves the original speaker's characteristics.

The workflow splits incoming audio into speech and background sound, leaving music and ambient noise untouched. Whisper handles speech recognition, with voice activity detection refining segment boundaries and assigning speaker labels. Translation runs through Gemma-4-E2B with a vowel-count check: if the translated text diverges too far in length from the source, the model retries once or twice to tighten the match. This constraint keeps the dubbed audio from falling out of sync with lip movements.
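The length check described above can be illustrated with a minimal Python sketch. It is not taken from the Langswap repository: the function names, the `tolerance` threshold, the Latin-only vowel set, and the `translate` callback (a stand-in for the Gemma call) are all assumptions made for illustration.

```python
from typing import Callable

# Assumption: a Latin-alphabet vowel set; a real pipeline would need
# per-language vowel definitions.
VOWELS = set("aeiouyAEIOUY")


def count_vowels(text: str) -> int:
    """Count vowels as a rough proxy for spoken length."""
    return sum(1 for ch in text if ch in VOWELS)


def length_gap(source: str, translated: str) -> float:
    """Relative difference in vowel count between source and translation."""
    src = count_vowels(source)
    if src == 0:
        return 0.0  # nothing to compare against
    return abs(count_vowels(translated) / src - 1.0)


def translate_with_length_check(
    source: str,
    translate: Callable[[str, int], str],  # hypothetical: (text, attempt) -> translation
    tolerance: float = 0.5,                # hypothetical threshold
    max_retries: int = 2,
) -> str:
    """Retry translation until the vowel count is close enough to the source.

    Returns the closest candidate seen if no attempt meets the tolerance.
    """
    best, best_gap = "", float("inf")
    for attempt in range(max_retries + 1):
        candidate = translate(source, attempt)
        gap = length_gap(source, candidate)
        if gap < best_gap:
            best, best_gap = candidate, gap
        if gap <= tolerance:
            break
    return best
```

In this sketch the `attempt` index is passed to the translator so that a retry can ask for a shorter or longer rendering; how the actual pipeline phrases that retry is not described in the article.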

Pipeline components

OmniVoice generates the new speech, using the original audio segment as a reference for prosody and speaker characteristics. The system then swaps the audio track back into the video and stamps a watermark indicating Langswap translation. The codebase includes modules for speech-to-text management, ASR with VAD, translation via llama.cpp, OmniVoice TTS, and FFmpeg video assembly.
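As a rough illustration of the final assembly step, the sketch below builds an FFmpeg command that mixes a dubbed speech track with the preserved background track and swaps the result into the original video. The function name and file paths are hypothetical and the repository's actual invocation may differ. Watermarking is left out, since it would require re-encoding the video stream.

```python
import subprocess
from typing import List


def build_mux_command(video: str, dubbed: str, background: str, output: str) -> List[str]:
    """Build an ffmpeg command that replaces the video's audio track.

    Inputs: 0 = original video, 1 = dubbed speech, 2 = background audio.
    """
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-i", dubbed,
        "-i", background,
        # Mix the dubbed speech with the untouched background audio.
        "-filter_complex", "[1:a][2:a]amix=inputs=2:duration=longest[aout]",
        "-map", "0:v",       # keep the original video stream
        "-map", "[aout]",    # use the mixed audio as the only audio track
        "-c:v", "copy",      # no video re-encode
        "-c:a", "aac",
        output,
    ]


def mux(video: str, dubbed: str, background: str, output: str) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_mux_command(video, dubbed, background, output), check=True)
```

Keeping command construction separate from execution makes the arguments easy to inspect or log before FFmpeg is run.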

The maintainer notes that earlier iterations carried more proprietary logic around text-to-speech systems and length control, but the current release strips most of that complexity to simplify contributions and lower the barrier to forking or extending the pipeline. The repository is live at github.com/langswap-app/langswap.
