Llama 4 Scout
Meta's first open-weight, natively multimodal MoE model, with 16 experts and an industry-leading 10M-token context window
Model Specifications
Architecture: VISION
Parameters: 109B
Family: llama
VRAM (Q4): 54.5 GB
Mixture of Experts: 17B active inference parameters.
Fits on a single H100 GPU with int4 quantization
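The Q4 VRAM figure above follows from the parameter count: at 4 bits (half a byte) per weight, 109B parameters take about 54.5 GB. The sketch below reproduces that arithmetic. It is a rough estimate of weight memory only and ignores KV cache, activations, and runtime overhead; the function name `weight_memory_gb` and the 80 GB H100 capacity are assumptions for illustration, not values taken from this page. Note that in a mixture-of-experts model all experts normally have to be resident in memory, so the estimate uses the full 109B rather than the 17B active parameters.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal GB.

    params_billions * 1e9 weights * (bits / 8) bytes, divided by 1e9 bytes per GB,
    simplifies to params_billions * bits / 8.
    """
    return params_billions * bits_per_weight / 8


TOTAL_PARAMS_B = 109   # total parameters, from the spec list
ACTIVE_PARAMS_B = 17   # parameters active per token, from the spec list
H100_VRAM_GB = 80      # assumption: the 80 GB H100 variant

q4_gb = weight_memory_gb(TOTAL_PARAMS_B, 4)     # 4-bit quantization
fp16_gb = weight_memory_gb(TOTAL_PARAMS_B, 16)  # unquantized 16-bit weights

# Weights-only check; real headroom is smaller once the KV cache is allocated.
fits_q4 = q4_gb < H100_VRAM_GB
fits_fp16 = fp16_gb < H100_VRAM_GB

print(f"Q4 weights:   {q4_gb:.1f} GB (fits on one H100: {fits_q4})")
print(f"FP16 weights: {fp16_gb:.1f} GB (fits on one H100: {fits_fp16})")
```

The active-parameter count mainly affects compute per token, not how much memory the weights occupy, which is why the 54.5 GB figure is tied to 109B rather than 17B.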
Related Guides
How much VRAM do you really need?
A complete breakdown of quantization levels and VRAM overhead for running local models.
Best GPUs for Machine Learning in 2026
Comparing NVIDIA and AMD options for the best speed-to-dollar ratio.
GGUF vs EXL2 vs AWQ
Understanding local AI formats and which one to pick for your specific hardware.