Nemotron Ultra 253B

NVIDIA's reasoning-optimized model derived from Llama 3.1 405B via Neural Architecture Search, with toggleable reasoning mode

Specifications

Architecture: TEXT
Parameters: 253B
Family: nemotron
VRAM (Q4): 126.5 GB
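Full-precision (FP16) inference needs about 506 GB of VRAM, which fits on a single 8x H100 (80 GB) node; quantized variants need fewer GPUs. Reasoning is switched ON or OFF via the system prompt.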
Tags: nvidia, reasoning, llama-based, trending
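
The H100 requirement and the 126.5 GB Q4 figure above can be related with simple arithmetic. The sketch below is only an illustration: `gpus_needed` is a hypothetical helper, the 80 GB per-GPU capacity assumes the 80 GB H100 variant, and the `overhead` fraction (headroom for KV cache and activations) is an assumed knob, not a number published on this page.

```python
import math


def gpus_needed(vram_gb: float, gpu_memory_gb: float = 80.0, overhead: float = 0.0) -> int:
    """Smallest number of identical GPUs whose combined memory covers the estimate.

    vram_gb       -- weights-only VRAM estimate in GB
    gpu_memory_gb -- memory per GPU (assumed: 80 GB H100)
    overhead      -- assumed extra fraction for KV cache and activations
    """
    return math.ceil(vram_gb * (1.0 + overhead) / gpu_memory_gb)


if __name__ == "__main__":
    # FP16 weights: 253B parameters x 2 bytes = 506 GB
    print("FP16, weights only:   ", gpus_needed(506.0))
    print("FP16, 20% headroom:   ", gpus_needed(506.0, overhead=0.2))
    print("Q4, weights only:     ", gpus_needed(126.5))
```

Under these assumptions the FP16 weights alone span 7 GPUs, and a modest headroom allowance brings that to a full 8-GPU node, while the Q4 build fits on 2.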

Quantization Estimates

Format    VRAM Need   Tier
FP16      506.0 GB    Full Precision
Q8_0      253.0 GB    High
Q6_K      215.0 GB    Excellent
Q5_K_M    177.1 GB    Great
Q4_K_M    126.5 GB    Sweet Spot
Q2_K      75.9 GB     Emergency
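
The estimates above are consistent with a simple rule: parameter count multiplied by a per-format bytes-per-parameter factor. The sketch below reproduces them. The factors are back-solved from the table (for example, 126.5 GB / 253B = 0.5), so treat them as assumptions about how this page computes its numbers rather than exact sizes of the quantization formats; `estimate_vram_gb` is a hypothetical helper name, and the result covers weights only.

```python
# Bytes-per-parameter factors back-solved from the table above (assumed,
# not official sizes for these quantization formats).
BYTES_PER_PARAM = {
    "FP16": 2.00,
    "Q8_0": 1.00,
    "Q6_K": 0.85,
    "Q5_K_M": 0.70,
    "Q4_K_M": 0.50,
    "Q2_K": 0.30,
}


def estimate_vram_gb(params_billions: float, fmt: str) -> float:
    """Weights-only estimate: billions of parameters x bytes per parameter = GB."""
    return params_billions * BYTES_PER_PARAM[fmt]


if __name__ == "__main__":
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt:<8} {estimate_vram_gb(253, fmt):6.1f} GB")
```

Real deployments also need memory for the KV cache and activations, so actual usage will be higher than these figures, especially at long context lengths.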