Back to CalculatorDeploy Now
GPT OSS 20B
HotOpenAI's compact open-weight MoE reasoning model, matches o3-mini on benchmarks, runs in 16GB VRAM. First major open-weight release from OpenAI.
Specifications
SourceArchitectureTEXT
Parameters21B
Familygpt-oss
VRAM (Q4)16G
MoE: 3.6B active.
Apache 2.0. MXFP4 quantized. Configurable reasoning effort (low/medium/high). 128K context. Runs on 16GB GPU.
openaireasoningmoeefficientapache2trending
Build your Local Rig
Ready to run locally? Shop top-tier GPUs on Amazon for the best performance.
Instant Cloud GPUs
Running out of VRAM? Rent a high-end H100 or RTX 4090 on RunPod and deploy in seconds.
Quantization Estimates
| Format | VRAM Need | Tier |
|---|---|---|
| FP16 | 42.0 GB | Full Precision |
| Q8_0 | 21.0 GB | High |
| Q6_K | 17.8 GB | Excellent |
| Q5_K_M | 14.7 GB | Great |
| Q4_K_M | 10.5 GB | Sweet Spot |
| Q2_K | 6.3 GB | Emergency |
Share this Model
Send these specs directly to your community.