LocalOps

Reverse Engineering

Select the model you want to run. We'll tell you what hardware you need.
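
For context on what the "VRAM needed" figures below roughly correspond to, here is a minimal sketch in Python, assuming the requirement is approximated as quantized weight size plus a runtime overhead for KV cache and activations. The function name and the 1.2 overhead factor are illustrative assumptions, not the actual LocalOps formula.

# Rough VRAM estimate for running a model at a given quantization.
# Minimal sketch: assumes VRAM ~= quantized weights * overhead factor.
# estimate_vram_gb and the 1.2 overhead factor are illustrative, not the
# actual LocalOps calculation.

def estimate_vram_gb(params_billions: float, bits_per_weight: float = 4.0,
                     overhead_factor: float = 1.2) -> float:
    """Approximate VRAM requirement in GB.

    params_billions -- total parameter count in billions (for MoE, all experts)
    bits_per_weight -- 4 for Q4, 8 for Q8, 16 for FP16
    overhead_factor -- headroom for KV cache, activations, runtime buffers
    """
    weight_gb = params_billions * bits_per_weight / 8  # 1B params at 8 bits ~= 1 GB
    return weight_gb * overhead_factor


if __name__ == "__main__":
    print(f"70B @ Q4: ~{estimate_vram_gb(70, 4):.1f} GB")  # roughly two 24GB cards
    print(f"8B  @ Q4: ~{estimate_vram_gb(8, 4):.1f} GB")   # fits on one consumer GPU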

Llama 4 Behemoth (Text)
Flagship 2T foundation model, 16 experts
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~4752.7GB VRAM needed
Recommended (Fast): H100 80GB
Tags: flagship, meta, proprietary

Llama 4 Maverick (Text)
High-efficiency MoE, 128 experts, 1M context
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~950.5GB VRAM needed
Recommended (Fast): H100 80GB
Tags: chat, meta, open-weights

Llama 4 Scout (Text)
Consumer flagship MoE, 16 experts, 10M context
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~259.9GB VRAM needed
Recommended (Fast): H100 80GB
Tags: chat, meta, multimodal
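
A note on the MoE entries above: only a handful of experts fire per token, which keeps compute low, but every expert's weights must still be resident in memory, so the VRAM figures track total parameters rather than active ones. A small sketch of that distinction; the parameter counts below are illustrative placeholders, not official figures.

# MoE memory vs. compute: all experts must be loaded, few are active per token.
# The counts below are illustrative placeholders; check each model card for
# the real values.

GB_PER_B_PARAMS_Q4 = 0.5  # 1B params at 4 bits ~= 0.5 GB of weights

total_params_b = 400.0   # total parameters across all experts (placeholder)
active_params_b = 17.0   # parameters activated per token (placeholder)

resident_weights_gb = total_params_b * GB_PER_B_PARAMS_Q4   # must fit in VRAM
active_weights_gb = active_params_b * GB_PER_B_PARAMS_Q4    # read per token

print(f"Weights resident in VRAM:  ~{resident_weights_gb:.0f} GB")
print(f"Weights touched per token: ~{active_weights_gb:.1f} GB")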

Mistral Large 3 (Text)
Granular MoE flagship, 256K context
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~1604.0GB VRAM needed
Recommended (Fast): H100 80GB
Tags: flagship, mistral, multimodal

Mistral Large 3 NVFP4 (Text)
FP4-quantized version for NVIDIA NIM
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~1604.0GB VRAM needed
Recommended (Fast): H100 80GB
Tags: flagship, mistral, optimized

Ministral 3 14B (Text)
Dense edge flagship with vision
Minimum (Q4): RTX 3090 / 4090 24GB, ~34.4GB VRAM needed
Recommended (Fast): RTX 4090 / A5000
Tags: edge, mistral, vision

Ministral 3 8B (Text)
Balanced edge model with vision
Minimum (Q4): RTX 4080 / 3090, ~20.2GB VRAM needed
Recommended (Fast): RTX 4090 24GB
Tags: edge, mistral, vision

Ministral 3 3B (Text)
Lightweight mobile model with vision
Minimum (Q4): RTX 3060 12GB, ~8.6GB VRAM needed
Recommended (Fast): RTX 4070 12GB
Tags: mobile, mistral, vision

Grok-3 Mini (Text)
Efficient reasoning model with real-time tools
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~106.9GB VRAM needed
Recommended (Fast): H100 80GB
Tags: chat, xai, uncensored

Llama 3.3 70B (Text)
Refined Llama 3 with superior instruction following
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~168.3GB VRAM needed
Recommended (Fast): H100 80GB
Tags: chat, meta, flagship

Llama 3.2 90B Vision (Text)
Multimodal with image understanding
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~210.6GB VRAM needed
Recommended (Fast): H100 80GB
Tags: vision, meta, multimodal

Llama 3.2 11B Vision (Text)
Compact multimodal model
Minimum (Q4): RTX 3090 / 4090 24GB, ~26.2GB VRAM needed
Recommended (Fast): RTX 4090 / A5000
Tags: vision, meta, efficient

Llama 3.2 3B (Text)
Mobile-optimized small model
Minimum (Q4): RTX 3060 12GB, ~8.9GB VRAM needed
Recommended (Fast): RTX 4070 12GB
Tags: mobile, meta, fast

Llama 3.2 1B (Text)
Ultra-light edge deployment
Minimum (Q4): GTX 1660 6GB, ~3.9GB VRAM needed
Recommended (Fast): RTX 3060 12GB
Tags: edge, meta, tiny

Llama 3.1 405B (Text)
Frontier-class open model. Requires datacenter hardware.
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~962.4GB VRAM needed
Recommended (Fast): H100 80GB
Tags: flagship, meta, datacenter

Llama 3.1 70B (Text)
Enterprise-grade intelligence
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~168.3GB VRAM needed
Recommended (Fast): H100 80GB
Tags: chat, meta, enterprise

Llama 3.1 8B (Text)
Best small model for most tasks
Minimum (Q4): RTX 4080 / 3090, ~20.3GB VRAM needed
Recommended (Fast): RTX 4090 24GB
Tags: chat, meta, popular

Qwen 2.5 72B (Text)
Top-tier reasoning and coding
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~173.7GB VRAM needed
Recommended (Fast): H100 80GB
Tags: coding, alibaba, flagship

Qwen 2.5 32B (Text)
The "Goldilocks" model: a great balance of capability and size
Minimum (Q4): RTX 4090 24GB, ~77.2GB VRAM needed
Recommended (Fast): A6000 48GB / Mac Studio
Tags: balanced, alibaba, popular

Qwen 2.5 14B (Text)
Strong mid-size model
Minimum (Q4): RTX 3090 / 4090 24GB, ~35.5GB VRAM needed
Recommended (Fast): RTX 4090 / A5000
Tags: balanced, alibaba

Qwen 2.5 7B (Text)
Efficient general purpose
Minimum (Q4): RTX 4080 / 3090, ~19.6GB VRAM needed
Recommended (Fast): RTX 4090 24GB
Tags: efficient, alibaba

Qwen 2.5 3B (Text)
Lightweight and fast
Minimum (Q4): RTX 3060 12GB, ~8.7GB VRAM needed
Recommended (Fast): RTX 4070 12GB
Tags: mobile, alibaba

Qwen 2.5 1.5B (Text)
Edge deployment ready
Minimum (Q4): GTX 1660 6GB, ~5.3GB VRAM needed
Recommended (Fast): RTX 3060 12GB
Tags: edge, alibaba

Qwen 2.5 0.5B (Text)
Smallest Qwen variant
Minimum (Q4): GTX 1660 6GB, ~2.6GB VRAM needed
Recommended (Fast): RTX 3060 12GB
Tags: tiny, alibaba

Qwen 2.5 Coder 32B (Text)
State-of-the-art code generation
Minimum (Q4): RTX 4090 24GB, ~77.2GB VRAM needed
Recommended (Fast): A6000 48GB / Mac Studio
Tags: coding, alibaba, flagship

Qwen 2.5 Coder 14B (Text)
Strong coding in a smaller package
Minimum (Q4): RTX 3090 / 4090 24GB, ~35.5GB VRAM needed
Recommended (Fast): RTX 4090 / A5000
Tags: coding, alibaba

Qwen 2.5 Coder 7B (Text)
Efficient code assistant
Minimum (Q4): RTX 4080 / 3090, ~19.6GB VRAM needed
Recommended (Fast): RTX 4090 24GB
Tags: coding, alibaba

Qwen 2.5 Coder 3B (Text)
Lightweight coder
Minimum (Q4): RTX 3060 12GB, ~8.7GB VRAM needed
Recommended (Fast): RTX 4070 12GB
Tags: coding, alibaba, fast

Qwen 3 Max (Thinking) (Text)
Flagship reasoning model with "System 2" thinking mode
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~2851.6GB VRAM needed
Recommended (Fast): H100 80GB
Tags: flagship, alibaba, reasoning

Qwen 3 235B (MoE) (Text)
Open-weights flagship, highly efficient experts
Minimum (Q4): 2x RTX 4090 / A100 80GB, ~558.4GB VRAM needed
Recommended (Fast): H100 80GB
Tags: flagship, alibaba, moe
Showing first 30 results. Use search to find specific models.
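
To check whether a local GPU clears one of the minimums listed above, a sketch like the following can compare installed VRAM against a requirement. It assumes PyTorch with CUDA support is installed; the required_gb value is simply whatever figure the card for your chosen model reports.

# Compare locally available VRAM against a model's stated requirement.
# Minimal sketch assuming a CUDA-capable GPU and PyTorch installed.

import torch

def local_vram_gb() -> float:
    """Sum total memory across visible CUDA devices, in GB."""
    if not torch.cuda.is_available():
        return 0.0
    return sum(
        torch.cuda.get_device_properties(i).total_memory
        for i in range(torch.cuda.device_count())
    ) / 1e9

if __name__ == "__main__":
    required_gb = 20.3  # e.g. the Llama 3.1 8B "Minimum (Q4)" figure above
    available_gb = local_vram_gb()
    verdict = "meets" if available_gb >= required_gb else "falls short of"
    print(f"Local VRAM ~{available_gb:.1f} GB {verdict} the ~{required_gb} GB minimum.")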