Qwen3.5-Omni Light
Alibaba's compact open-weight omni-modal model. Handles text, image, audio, and video in a single inference pass. Weights are available on Hugging Face under Apache 2.0 for self-hosting.
Model Specifications
Architecture: AUDIO
Parameters: 7B
Family: qwen3.5
VRAM (Q4): 3.5GB
Only open-weight variant in the Qwen3.5-Omni family. vLLM recommended for inference.
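The 3.5GB Q4 figure listed above matches a simple weights-only estimate: 7 billion parameters at 4 bits each is 7 × 4 ÷ 8 = 3.5 billion bytes. The sketch below reproduces that arithmetic. It assumes decimal gigabytes (1 GB = 10^9 bytes) and exactly 4 bits per weight; the function name `weight_vram_gb` is illustrative and not part of any library. Real quantization formats usually store slightly more than 4 bits per weight, and the KV cache, activations, and runtime overhead come on top, so actual usage will likely be higher.

```python
def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weights-only memory estimate in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    KV cache, activations, and framework overhead are NOT included.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # 7B parameters, as listed in the specifications above
    print(f"Q4:   {weight_vram_gb(7, 4):.1f} GB")
    print(f"FP16: {weight_vram_gb(7, 16):.1f} GB")
```

The same function gives a quick lower bound for other precisions, which is useful when deciding whether a given GPU leaves any headroom for context beyond the weights themselves.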
Similar Models
Qwen3.5-Omni Plus
30B. Alibaba's flagship omni-modal model; processes text, images, audio, and video natively. Thinker-Talker MoE architecture with real-time streaming speech output, 256K context, 113 speech recognition languages.
Qwen 3 Omni
30B. End-to-end voice/text/vision interaction
Qwen3-TTS VoiceDesign (1.7B)
1.7B. Zero-shot voice design from text descriptions
Related Guides
How much VRAM do you really need?
A complete breakdown of quantization levels and VRAM overhead for running local models.
Best GPUs for Machine Learning in 2026
Comparing NVIDIA and AMD options for the best speed-to-dollar ratio.
GGUF vs EXL2 vs AWQ
Understanding local AI formats and which one to pick for your specific hardware.