LocalOps
Back to Calculator

Nemotron Ultra 253B

Hot

NVIDIA's reasoning-optimized model derived from Llama 3.1 405B via Neural Architecture Search, with toggleable reasoning mode

Model Specifications

ArchitectureTEXT
Parameters253B
Familynemotron
VRAM (Q4)126.5GB
Requires 8x H100 for inference. Reasoning ON/OFF controlled via system prompt
#nvidia#reasoning#llama-based#trendingSource

Estimated Quantization Sizes

FormatPrecisionEst. VRAMRecommendation
FP16 / BF1616-bit506.0 GBUncompressed Base
Q8_0High8-bit253.0 GBNear Lossless
Q6_K6-bit189.8 GBExcellent Balance
Q4_K_MPopular4-bit126.5 GBStandard Use

Share this Model

Send this model's specs directly to your community.

Post