Qwen3.5-Omni Plus
Alibaba's flagship omni-modal model. It processes text, images, audio, and video natively, using a Thinker-Talker MoE architecture with real-time streaming speech output, a 256K context window, and speech recognition in 113 languages.
Specifications
Source
Architecture: AUDIO
Parameters: 30B
Family: qwen3.5
VRAM (Q4): 15.0G
MoE: 3B active.
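The VRAM (Q4) figure above is consistent with a simple weights-only estimate: parameter count times bits per weight, divided by 8 to get bytes. The sketch below reproduces that arithmetic. It assumes exactly 4 bits per weight and decimal gigabytes, and it ignores the KV cache, activations, and quantization metadata, so real usage would be higher. The function name `estimate_weight_vram_gb` is hypothetical and is not taken from the page.

```python
def estimate_weight_vram_gb(params_billions: float, bits_per_weight: float = 4.0) -> float:
    """Rough memory needed for model weights alone, in decimal GB.

    Assumption: every parameter is stored at `bits_per_weight` bits, with no
    overhead for quantization scales, KV cache, or activations.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Total parameters listed on the card: 30B at Q4 (assumed 4 bits per weight).
total_gb = estimate_weight_vram_gb(30)

# Parameters active per token in the MoE: 3B. This figure mainly affects
# compute per token; in a typical MoE setup all experts still have to be
# loaded, so it is the 30B total that drives the memory estimate.
active_gb = estimate_weight_vram_gb(3)

print(f"All weights at Q4:    {total_gb:.1f} GB")
print(f"Active weights at Q4: {active_gb:.1f} GB")
```

Under these assumptions the 30B total gives 15.0 GB, matching the listed value. The 3B active figure is shown only for comparison and should not be read as the memory requirement.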
The Plus (30B-A3B) and Flash variants are API-only via DashScope as of March 31, 2026. A public release of the weights has not yet been confirmed.
Tags: alibaba, qwen, multimodal, audio, video, moe, realtime, omni, trending
Similar Models
Qwen3.5-Omni Light (7B)
Alibaba's compact open-weight omni-modal model. Handles text, image, audio, and video in a single inference pass; self-hostable on HuggingFace under Apache 2.0.
Qwen3-TTS CustomVoice (1.7B)
Few-shot voice cloning with style control
CosyVoice 2 (0.5B)
Streaming speech synthesis foundation