LocalOps

Reverse Engineering

Select the model you want to run. We'll tell you what hardware you need.
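
For context on what the "VRAM needed" figures below roughly correspond to, here is a minimal sketch in Python, assuming the requirement is approximated as quantized weight size plus a runtime overhead for KV cache and activations. The function name and the 1.2 overhead factor are illustrative assumptions, not the actual LocalOps formula.

# Rough VRAM estimate for running a model at a given quantization.
# Minimal sketch: assumes VRAM ~= quantized weights * overhead factor.
# estimate_vram_gb and the 1.2 overhead factor are illustrative, not the
# actual LocalOps calculation.

def estimate_vram_gb(params_billions: float, bits_per_weight: float = 4.0,
                     overhead_factor: float = 1.2) -> float:
    """Approximate VRAM requirement in GB.

    params_billions -- total parameter count in billions (for MoE, all experts)
    bits_per_weight -- 4 for Q4, 8 for Q8, 16 for FP16
    overhead_factor -- headroom for KV cache, activations, runtime buffers
    """
    weight_gb = params_billions * bits_per_weight / 8  # 1B params at 8 bits ~= 1 GB
    return weight_gb * overhead_factor


if __name__ == "__main__":
    print(f"70B @ Q4: ~{estimate_vram_gb(70, 4):.1f} GB")  # roughly two 24GB cards
    print(f"8B  @ Q4: ~{estimate_vram_gb(8, 4):.1f} GB")   # fits on one consumer GPU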

Llama 4 Behemoth (Text)
Flagship 2T foundation model, 16 experts
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~4752.7GB VRAM needed
Recommended (Fast): H100 80GB
Tags: flagship, meta, proprietary

Llama 4 Maverick (Text)
High-efficiency MoE, 128 experts, 1M context
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~950.5GB VRAM needed
Recommended (Fast): H100 80GB
Tags: chat, meta, open-weights

Llama 4 Scout (Text)
Consumer flagship MoE, 16 experts, 10M context
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~259.9GB VRAM needed
Recommended (Fast): H100 80GB
Tags: chat, meta, multimodal
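
A note on the MoE entries above: only a handful of experts fire per token, which keeps compute low, but every expert's weights must still be resident in memory, so the VRAM figures track total parameters rather than active ones. A small sketch of that distinction; the parameter counts below are illustrative placeholders, not official figures.

# MoE memory vs. compute: all experts must be loaded, few are active per token.
# The counts below are illustrative placeholders; check each model card for
# the real values.

GB_PER_B_PARAMS_Q4 = 0.5  # 1B params at 4 bits ~= 0.5 GB of weights

total_params_b = 400.0   # total parameters across all experts (placeholder)
active_params_b = 17.0   # parameters activated per token (placeholder)

resident_weights_gb = total_params_b * GB_PER_B_PARAMS_Q4   # must fit in VRAM
active_weights_gb = active_params_b * GB_PER_B_PARAMS_Q4    # read per token

print(f"Weights resident in VRAM:  ~{resident_weights_gb:.0f} GB")
print(f"Weights touched per token: ~{active_weights_gb:.1f} GB")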

Mistral Large 3 (Text)
Granular MoE flagship, 256K context
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~1604.0GB VRAM needed
Recommended (Fast): H100 80GB
Tags: flagship, mistral, multimodal

Mistral Large 3 NVFP4 (Text)
FP4-quantized version for NVIDIA NIM
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~1604.0GB VRAM needed
Recommended (Fast): H100 80GB
Tags: flagship, mistral, optimized

Ministral 3 14B (Text)
Dense edge flagship with vision
Minimum (Q4): RTX 3090 / 4090 24GB, ~34.4GB VRAM needed
Recommended (Fast): RTX 4090 / A5000
Tags: edge, mistral, vision

Ministral 3 8B (Text)
Balanced edge model with vision
Minimum (Q4): RTX 4080 / 3090, ~20.2GB VRAM needed
Recommended (Fast): RTX 4090 24GB
Tags: edge, mistral, vision

Ministral 3 3B (Text)
Lightweight mobile model with vision
Minimum (Q4): RTX 3060 12GB, ~8.6GB VRAM needed
Recommended (Fast): RTX 4070 12GB
Tags: mobile, mistral, vision

Grok-3 Mini (Text)
Efficient reasoning model with real-time tools
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~106.9GB VRAM needed
Recommended (Fast): H100 80GB
Tags: chat, xai, uncensored

Llama 3.3 70B (Text)
Refined Llama 3 with superior instruction following
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~168.3GB VRAM needed
Recommended (Fast): H100 80GB
Tags: chat, meta, flagship

Llama 3.2 90B Vision (Text)
Multimodal with image understanding
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~210.6GB VRAM needed
Recommended (Fast): H100 80GB
Tags: vision, meta, multimodal

Llama 3.2 11B Vision (Text)
Compact multimodal model
Minimum (Q4): RTX 3090 / 4090 24GB, ~26.2GB VRAM needed
Recommended (Fast): RTX 4090 / A5000
Tags: vision, meta, efficient

Llama 3.2 3B (Text)
Mobile-optimized small model
Minimum (Q4): RTX 3060 12GB, ~8.9GB VRAM needed
Recommended (Fast): RTX 4070 12GB
Tags: mobile, meta, fast

Llama 3.2 1B (Text)
Ultra-light edge deployment
Minimum (Q4): GTX 1660 6GB, ~3.9GB VRAM needed
Recommended (Fast): RTX 3060 12GB
Tags: edge, meta, tiny

Llama 3.1 405B (Text)
Frontier-class open model. Requires datacenter hardware.
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~962.4GB VRAM needed
Recommended (Fast): H100 80GB
Tags: flagship, meta, datacenter

Llama 3.1 70B (Text)
Enterprise-grade intelligence
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~168.3GB VRAM needed
Recommended (Fast): H100 80GB
Tags: chat, meta, enterprise

Llama 3.1 8B (Text)
Best small model for most tasks
Minimum (Q4): RTX 4080 / 3090, ~20.3GB VRAM needed
Recommended (Fast): RTX 4090 24GB
Tags: chat, meta, popular

Qwen 2.5 72B (Text)
Top-tier reasoning and coding
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~173.7GB VRAM needed
Recommended (Fast): H100 80GB
Tags: coding, alibaba, flagship

Qwen 2.5 32B (Text)
The "Goldilocks" model: a great balance of capability and size
Minimum (Q4): RTX 4090 24GB, ~77.2GB VRAM needed
Recommended (Fast): A6000 48GB / Mac Studio
Tags: balanced, alibaba, popular

Qwen 2.5 14B (Text)
Strong mid-size model
Minimum (Q4): RTX 3090 / 4090 24GB, ~35.5GB VRAM needed
Recommended (Fast): RTX 4090 / A5000
Tags: balanced, alibaba

Qwen 2.5 7B (Text)
Efficient general purpose
Minimum (Q4): RTX 4080 / 3090, ~19.6GB VRAM needed
Recommended (Fast): RTX 4090 24GB
Tags: efficient, alibaba

Qwen 2.5 3B (Text)
Lightweight and fast
Minimum (Q4): RTX 3060 12GB, ~8.7GB VRAM needed
Recommended (Fast): RTX 4070 12GB
Tags: mobile, alibaba

Qwen 2.5 1.5B (Text)
Edge deployment ready
Minimum (Q4): GTX 1660 6GB, ~5.3GB VRAM needed
Recommended (Fast): RTX 3060 12GB
Tags: edge, alibaba

Qwen 2.5 0.5B (Text)
Smallest Qwen variant
Minimum (Q4): GTX 1660 6GB, ~2.6GB VRAM needed
Recommended (Fast): RTX 3060 12GB
Tags: tiny, alibaba

Qwen 2.5 Coder 32B (Text)
State-of-the-art code generation
Minimum (Q4): RTX 4090 24GB, ~77.2GB VRAM needed
Recommended (Fast): A6000 48GB / Mac Studio
Tags: coding, alibaba, flagship

Qwen 2.5 Coder 14B (Text)
Strong coding in a smaller package
Minimum (Q4): RTX 3090 / 4090 24GB, ~35.5GB VRAM needed
Recommended (Fast): RTX 4090 / A5000
Tags: coding, alibaba

Qwen 2.5 Coder 7B (Text)
Efficient code assistant
Minimum (Q4): RTX 4080 / 3090, ~19.6GB VRAM needed
Recommended (Fast): RTX 4090 24GB
Tags: coding, alibaba

Qwen 2.5 Coder 3B (Text)
Lightweight coder
Minimum (Q4): RTX 3060 12GB, ~8.7GB VRAM needed
Recommended (Fast): RTX 4070 12GB
Tags: coding, alibaba, fast

Qwen 3 Max (Thinking) (Text)
Flagship reasoning model with "System 2" thinking mode
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~2851.6GB VRAM needed
Recommended (Fast): H100 80GB
Tags: flagship, alibaba, reasoning

Qwen 3 235B (MoE) (Text)
Open-weights flagship, highly efficient experts
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~558.4GB VRAM needed
Recommended (Fast): H100 80GB
Tags: flagship, alibaba, moe
Showing first 30 results. Use search to find specific models.
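
To check whether a local GPU clears one of the minimums listed above, a sketch like the following can compare installed VRAM against a requirement. It assumes PyTorch with CUDA support is installed; the required_gb value is simply whatever figure the card for your chosen model reports.

# Compare locally available VRAM against a model's stated requirement.
# Minimal sketch assuming a CUDA-capable GPU and PyTorch installed.

import torch

def local_vram_gb() -> float:
    """Sum total memory across visible CUDA devices, in GB."""
    if not torch.cuda.is_available():
        return 0.0
    return sum(
        torch.cuda.get_device_properties(i).total_memory
        for i in range(torch.cuda.device_count())
    ) / 1e9

if __name__ == "__main__":
    required_gb = 20.3  # e.g. the Llama 3.1 8B "Minimum (Q4)" figure above
    available_gb = local_vram_gb()
    verdict = "meets" if available_gb >= required_gb else "falls short of"
    print(f"Local VRAM ~{available_gb:.1f} GB {verdict} the ~{required_gb} GB minimum.")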