GPT OSS 20B
OpenAI's compact open-weight MoE reasoning model. It performs comparably to o3-mini on common benchmarks and runs in 16GB of VRAM. It is OpenAI's first major open-weight release since GPT-2.
Model Specifications
Architecture: Text
Parameters: 21B
Family: gpt-oss
VRAM (Q4): 16GB
Mixture of Experts: 3.6B active inference parameters.
Apache 2.0 license. MXFP4 quantized. Configurable reasoning effort (low/medium/high). 128K context. Runs on a 16GB GPU.
Estimated Quantization Sizes
| Format | Precision | Est. VRAM | Recommendation |
|---|---|---|---|
| FP16 / BF16 | 16-bit | 42.0 GB | Uncompressed Base |
| Q8_0 | 8-bit | 21.0 GB | Near Lossless |
| Q6_K | 6-bit | 15.8 GB | Excellent Balance |
| Q4_K_M | 4-bit | 10.5 GB | Standard Use (popular) |
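The figures in the table are consistent with a simple weights-only estimate: total parameters times nominal bits per weight, divided by 8 to get bytes. The sketch below reproduces that arithmetic. The function name `estimate_weight_gb` and the `NOMINAL_BITS` mapping are illustrative, not part of any library, and the estimate assumes nominal bit widths with no allowance for KV cache, activations, or per-block quantization scales, so real files and real VRAM use will typically be somewhat larger.

```python
# Rough weights-only size estimate: parameters x bits per weight / 8.
# Assumption: nominal bit widths only. KV cache, activations, and the
# per-block scale data that real quantized formats store are NOT included.

PARAMS_BILLION = 21.0  # total parameter count from the spec card

# Nominal precision for each format listed in the table (an assumption;
# actual K-quant formats average slightly more bits per weight).
NOMINAL_BITS = {
    "FP16 / BF16": 16,
    "Q8_0": 8,
    "Q6_K": 6,
    "Q4_K_M": 4,
}


def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Return estimated weight storage in GB, taking 1 GB as 1e9 bytes."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for fmt, bits in NOMINAL_BITS.items():
        size_gb = estimate_weight_gb(PARAMS_BILLION, bits)
        print(f"{fmt:<12} {size_gb:5.1f} GB")
```

For 21B parameters this gives 42.0, 21.0, 15.75, and 10.5 GB, which matches the table after rounding. Treat these as lower bounds for planning and leave headroom for the context window and runtime overhead.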