Voicebox

Open source voice cloning powered by Qwen3-TTS. Create natural-sounding speech from text with near-perfect voice replication.

macOS (ARM), macOS (Intel), Windows, Linux

What is Voicebox?

Voicebox is a local-first voice cloning studio with DAW-like features for professional voice synthesis. Think of it as a free, open-source alternative to ElevenLabs: you download models, clone voices, and generate speech entirely on your own machine.

Unlike cloud services that lock your voice data behind subscriptions, Voicebox keeps your data on your machine and gives you professional tools and native performance. Clone a voice from a few seconds of audio, then compose multi-voice projects with studio-grade editing tools.

Inference runs locally and is accelerated with Metal on Mac and with CUDA on Windows and Linux.
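
As a rough illustration of the platform split described above, the sketch below maps an operating system name to an acceleration backend. It is not Voicebox's actual code: the function name `pick_backend`, the `has_cuda` flag, and the CPU fallback are all assumptions made for the example.

```python
import platform


def pick_backend(system=None, has_cuda=False):
    """Hypothetical helper: return the name of an acceleration backend.

    `system` defaults to the current OS as reported by platform.system().
    `has_cuda` is an assumed flag saying whether a CUDA-capable GPU exists.
    """
    if system is None:
        system = platform.system()
    if system == "Darwin":
        # macOS: Metal acceleration, as stated in the text above.
        return "metal"
    if system in ("Windows", "Linux") and has_cuda:
        # Windows/Linux: CUDA acceleration when a suitable GPU is present.
        return "cuda"
    # Assumed fallback for machines without a supported GPU.
    return "cpu"


if __name__ == "__main__":
    print(pick_backend())
```

Passing the system name explicitly, for example `pick_backend("Linux", has_cuda=True)`, makes the choice easy to check without access to the target hardware.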

No Python install required.

Near-Perfect Voice Cloning

Powered by Alibaba's Qwen3-TTS model for exceptional voice quality and accuracy.

Stories Editor

Create multi-voice narratives with a timeline-based editor. Arrange tracks, trim clips, and mix conversations.

Multi-Sample Support

Combine multiple voice samples for higher quality and more natural-sounding results.

Local or Remote

Run GPU inference locally or connect to a remote machine, with one-click server setup.

Audio Transcription

Powered by Whisper for accurate speech-to-text. Extract reference text from voice samples automatically.

Cross-Platform

Available for macOS, Windows, and Linux. No Python installation required.