Qwen3.6 35B-A3B
First open-weight Qwen3.6 model. 35B total / 3B active MoE, focused on agentic coding and repo-level reasoning. Native 262K context, extensible to 1M tokens. Apache 2.0.
Model Specifications
Estimated Quantization Sizes
| Format | Precision | Est. VRAM | Recommendation |
|---|---|---|---|
| FP16 / BF16 | 16-bit | 70.0 GB | Uncompressed Base |
| Q8_0 | 8-bit | 35.0 GB | Near Lossless |
| Q6_K | 6-bit | 26.3 GB | Excellent Balance |
| Q4_K_M | 4-bit | 17.5 GB | Standard Use |
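The sizes above follow the usual rule of thumb: weight-file size ≈ parameter count × bits per weight ÷ 8. A minimal sketch of that estimate, assuming the table's nominal bit-widths (real GGUF quants such as Q4_K_M and Q6_K use slightly higher effective bits per weight, and KV-cache and runtime overhead add to actual VRAM use):

```python
def est_size_gb(params_b: float, bits: float) -> float:
    """Rough weight-file size in GB for a model with `params_b` billion
    parameters stored at `bits` bits per weight (1B params @ 8-bit ≈ 1 GB)."""
    return params_b * bits / 8

# 35B total parameters, nominal bit-widths from the table above
for name, bits in [("FP16", 16), ("Q8_0", 8), ("Q6_K", 6), ("Q4_K_M", 4)]:
    print(f"{name:8s} ~{est_size_gb(35.0, bits):.1f} GB")
```

Note this estimates the weights alone; budget extra VRAM for context (KV cache), which grows with sequence length and matters at 262K+ contexts.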
Similar Models
Qwen3.6 Plus
Alibaba's next-gen hybrid-architecture flagship, released as a free preview on OpenRouter (March 31, 2026). Always-on chain-of-thought, a 1M-token context, and up to 65K output tokens, built for agentic coding and long-document workflows.
Llama 3.3 70B
70.55B — Refined Llama 3 with improved instruction following
Llama 3.2 3B
3.21B — Mobile-optimized small model
Related Guides
How much VRAM do you really need?
A complete breakdown of quantization levels and VRAM overhead for running local models.
Best GPUs for Machine Learning in 2026
Comparing NVIDIA and AMD options for the best speed-to-dollar ratio.
GGUF vs EXL2 vs AWQ
Understanding local AI formats and which one to pick for your specific hardware.