Voxtral TTS 4B
Mistral's open-weight text-to-speech model with 9-language support, voice cloning from 3s of audio, and 70ms latency
Model Specifications
Architecture: AUDIO
Parameters: 4B
Family: voxtral
VRAM (Q4): 8GB
Runs on a single GPU with >=16GB VRAM. Architecture: 3.4B transformer + 390M flow-matching + 300M codec
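The component sizes listed above can be checked against the 4B headline figure with a little arithmetic. The following is a minimal sketch, not the calculator's actual method: the bytes-per-parameter values are assumptions (Q4 is approximated as exactly 4 bits per weight, ignoring block scales), and the result is a weight-only lower bound. It will come out smaller than the VRAM figures listed above, which presumably also cover activations and runtime overhead.

```python
# Component sizes as listed in the model specification.
COMPONENTS = {
    "transformer": 3.4e9,
    "flow_matching": 390e6,
    "codec": 300e6,
}

# Assumed bytes per parameter (illustrative, not taken from the page).
# Q4 is approximated as 4 bits per weight with no block-scale overhead.
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}


def total_params() -> float:
    """Sum of all component parameter counts."""
    return sum(COMPONENTS.values())


def weight_gb(precision: str) -> float:
    """Weight-only footprint in GB (10^9 bytes) at the given precision."""
    return total_params() * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    print(f"total parameters: {total_params() / 1e9:.2f}B")
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_gb(precision):.2f} GB (weights only)")
```

The three components sum to roughly 4.09B parameters, which is consistent with the rounded 4B figure.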