Qwen3.5-Omni Light
Alibaba's compact open-weight omni-modal model. Handles text, image, audio, and video in a single inference pass. Weights are available on Hugging Face under Apache 2.0 for self-hosting.
Specifications
Source
Architecture: AUDIO
Parameters: 7B
Family: qwen3.5
VRAM (Q4): 3.5 GB
Only open-weight variant in the Qwen3.5-Omni family. vLLM recommended for inference.
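The VRAM (Q4) figure in the specifications is consistent with a simple weights-only estimate: parameter count times bits per weight, divided by 8 to get bytes. The sketch below reproduces that arithmetic. The function name is hypothetical, and it assumes the listed figure counts weights only, ignoring KV cache, activations, and runtime overhead; real 4-bit formats also store scales and typically use slightly more than 4 bits per weight.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in decimal gigabytes.

    Assumption: no KV cache, activations, or quantization metadata included.
    """
    bytes_total = num_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


# 7B parameters, as listed in the specifications
q4_gb = estimate_weight_memory_gb(7e9, 4)     # 4-bit quantized weights
fp16_gb = estimate_weight_memory_gb(7e9, 16)  # unquantized half precision, for comparison

print(f"Q4: {q4_gb:.1f} GB, FP16: {fp16_gb:.1f} GB")
```

Treat the result as a lower bound when sizing a GPU, since actual usage grows with context length and batch size.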
Tags: alibaba, qwen, multimodal, audio, video, edge, omni, apache2
Similar Models
Qwen3.5-Omni Plus
30B: Alibaba's flagship omni-modal model, which processes text, images, audio, and video natively. Thinker-Talker MoE architecture with real-time streaming speech output, 256K context, and 113 speech recognition languages.
Qwen3-TTS CustomVoice (1.7B)
1.7B: Few-shot voice cloning with style control
CosyVoice 2 (0.5B)
0.5B: Streaming speech synthesis foundation