System Configuration
Configure your hardware to check model compatibility
Isaac GR00T N1
Robotics · gr00t · NVIDIA humanoid foundation model
HumanPlus
Robotics · humanplus · Shadows human motion to robot
SwinIR
Image · swinir · Transformer image restoration
Real-ESRGAN
Image · esrgan · Image super-resolution
Piper TTS
Audio · piper · Ultra-fast local TTS
YOLOv10
Vision · yolo · Latest YOLO model
BGE Small EN
Embedding · bge · Fast English embeddings
GFP-GAN
Image · gfpgan · Face restoration
Whisper Tiny
Audio · whisper · Minimal ASR model
YOLOv8 L
Vision · yolo · Large YOLO variant
YOLOv11 X
Image · yolo · Latest YOLO object detection flagship
CodeFormer
Image · codeformer · Robust face restoration
YOLOv8 X
Vision · yolo · Object detection flagship
BRIA RMBG 2.0
Image · bria · Next-gen background removal, video support
Kokoro 82M
Audio · kokoro · Highest quality lightweight TTS
Wav2Vec2 Base
Audio · wav2vec · Compact ASR
SDXS-512
Image · sd · Tiny real-time model (0.1s latency)
Octo Base
Robotics · octo · Open source robot manipulation policy
MaskGCT
Audio · maskgct · Zero-shot TTS/Voice conversion
VITS
Audio · vits · Fast variational inference TTS
PaddleOCR V4
Vision · paddleocr · Multilingual OCR
DWPose
Vision · dwpose · Human pose estimation
IP-Adapter Plus
Image · ip-adapter · Image prompt adapter
IP-Adapter FaceID
Image · ip-adapter · Face ID preservation
BRIA RMBG
Image · bria · Commercial background removal
E5 Base V2
Embedding · e5 · Balanced embeddings
LayoutLMv3
Vision · layoutlm · Document understanding
SmolLM2 135M
Text · smollm · Smallest practical language model
Nomic Embed Text v1.5
Embedding · nomic · Long context embeddings (8192)
SpeechT5 TTS
Audio · speecht5 · Unified speech/text model
ModernBERT Base
Embedding · bert · Fast and accurate
StyleTTS 2
Audio · styletts · Fast expressive TTS
RemBG
Image · rembg · Background removal
Moonshine Base
Audio · moonshine · Fast ASR optimized for resource-constrained devices
LGM
3D · lgm · High-res Gaussian splatting
Allegro
Video · allegro · Rhymes AI open video model
Donut Base
Vision · donut · OCR-free document understanding
EasyOCR
Vision · easyocr · Easy multilingual OCR
OpenPose
Vision · openpose · Multi-person pose detection
BiRefNet
Image · birefnet · Bilateral reference high-resolution segmentation
SAM 2
Vision · sam · Video segmentation
SAM 2.1 Large
Image · sam · Segment Anything in images and video, improved accuracy
Whisper Small
Audio · whisper · Fast ASR
Marvis TTS 250M
Audio · marvis · Real-time streaming voice cloning
Gemma 3 270M
Text · gemma3 · Ultra-compact on-device model
F5-TTS
Audio · f5 · Zero-shot voice cloning
CosyVoice
Audio · cosyvoice · Alibaba's multilingual TTS
MeloTTS
Audio · melo · Multilingual fast TTS
MusicGen Small
Audio · musicgen · Fast music generation
Wav2Vec2 Large
Audio · wav2vec · Self-supervised ASR
mxbai-embed-large
Embedding · mxbai · Mixed bread AI embeddings
BGE Large EN
Embedding · bge · English embeddings
GTE Large
Embedding · gte · General text embeddings
E5 Large V2
Embedding · e5 · Contrastive embeddings
Depth Anything V2 Large
Vision · depth-anything · Monocular depth estimation
Depth Anything V2 Large
Image · depth-anything · High-quality monocular depth estimation
ZoeDepth
Vision · zoedepth · Metric depth estimation
SmolLM2 360M
Text · smollm · Nanoscale efficient model
ModernBERT Large
Embedding · bert · 8k context, modern architecture
ModernBERT Embed Large
Text · modernbert · Modernized BERT for efficient retrieval, 8K context
AnimateDiff
Video · animatediff · Turn any SD model into video
Stella v5 400M
Embedding · stella · SOTA commercial-friendly embedding
Orpheus 400M
Audio · orpheus · Efficient TTS
SigLIP SO400M
Vision · siglip · Google improved CLIP
Stella EN 400M
Embedding · stella · Efficient embeddings
Chatterbox Turbo
Audio · chatterbox · Low-latency high-performance TTS
CLIP ViT-L/14
Vision · clip · Vision-language alignment
Qwen 2.5 0.5B
Text · qwen · Smallest Qwen variant
CosyVoice 2 (0.5B)
Audio · cosyvoice · Streaming speech synthesis foundation
CosyVoice 2 Instruct
Audio · cosyvoice · Fine-grained emotional control
XTTS v2
Audio · xtts · High quality voice cloning
InstantMesh
3D · instantmesh · Fast image to 3D mesh
TripoSR
3D · triposr · Single image to 3D
MeshAnything V2
3D · mesh-anything · Artist-created mesh alignment
Chatterbox
Audio · chatterbox · Natural voice cloning TTS
SmolVLM 500M
Text · smolvlm · Ultra-compact vision-language model
Spark TTS 0.5B
Audio · spark-tts · Controllable TTS with voice cloning, emotion and speed control
StableFast3D V2
3D · sf3d · Rapid single-image 3D mesh generation
PuLID FLUX
Image · pulid · Pure identity face insertion for FLUX
BGE Reranker Large
Embedding · bge · Reranking model
Multilingual E5 Large
Embedding · e5 · 100+ language embeddings
TrOCR Large
Vision · trocr · Transformer OCR
BGE-M3
Embedding · bge · Multi-lingual multi-granularity
Arctic Embed L v2
Text · arctic · Multilingual embedding model, 8K context
Jina Embeddings v3
Embedding · jina · Task-specific embeddings
Qwen 3 0.6B
Text · qwen · Micro model for embedded systems
Qwen3-TTS Base (0.6B)
Audio · qwen-tts · Ultra-low latency streaming TTS (<97ms)
PixArt-Σ
Image · pixart · Efficient DiT architecture
PixArt-α
Image · pixart · Original PixArt model
Parakeet v2 0.6B
Audio · parakeet · Ultra-fast ASR, 60min in 1sec, word timestamps
SAM
Vision · sam · Segment Anything Model
ChatTTS
Audio · chattts · Conversational TTS with laughter/pauses
Fish Speech 1.4
Audio · fish · Highly expressive TTS
ControlNet Canny
Image · controlnet · Edge-guided generation
ControlNet Depth
Image · controlnet · Depth-guided generation
ControlNet OpenPose
Image · controlnet · Pose-guided generation
Segmind Vega
Image · segmind · Distilled SDXL - 70% faster
Distil-Whisper Large
Audio · whisper · Distilled for speed
Florence 2 Large
Vision · florence · Microsoft vision foundation
Whisper Medium
Audio · whisper · Balanced ASR model
Whisper Large v3 Turbo
Audio · whisper · Fast high-quality ASR
Tortoise TTS
Audio · tortoise · High quality but slower TTS
VALL-E
Audio · valle · Neural codec language model TTS
IC-Light V2
Image · ic-light · Relighting images with controllable illumination
Whisper V3 Turbo FT
Audio · whisper · Fine-tuned turbo whisper for specialized domains
Stable Diffusion 1.5
Image · sd · Community favorite, massive ecosystem
SD 1.5 Inpainting
Image · sd · Standard inpainting model
Instruct-Pix2Pix
Image · sd · Edit images via text instructions
Riffusion
Audio · riffusion · Stable Diffusion for music
Stable Diffusion 2.1 (768)
Image · sd · Native 768px generation
Stable Diffusion 2.1 Base
Image · sd · Native 512px generation
SD 2.0 Depth
Image · sd · Structure preservation via depth map
Parler TTS
Audio · parler · Describe voice with text
DeepFloyd IF L
Image · deepfloyd · Mid-tier cascaded model
Stable Fast 3D
3D · stable3d · Single image to 3D in 0.5s
Stable Point Aware 3D
3D · stable3d · View-consistent 3D generation
ECMWF AIFS
Science · aifs · Operational AI weather forecasting
NOAA AIGFS
Science · graphcast · AI Global Forecast System
TRELLIS
3D · trellis · Structured 3D asset generation
OpenELM 1B
Text · openelm · Lightweight Apple LLM
Bark
Audio · bark · Multi-lingual with sound effects
Orpheus 1B
Audio · orpheus · Balanced TTS model
Würstchen
Image · wuerstchen · Efficient latent diffusion
Point-E
3D · point-e · Point cloud generation
Zero-1-to-3
3D · zero123 · Single image to 3D views
Gemma 3 1B
Text · gemma3 · Lightweight text-only, 32K context
Falcon 3 1B
Text · falcon3 · Ultra-light deployment
StarCoder2 1B
Text · starcoder · Compact code completion
OuteTTS 1.0 1B
Audio · outetts · Open TTS with pure LLM approach, voice cloning
TRELLIS Large
3D · trellis · Scalable 3D generation with structured latents, large variant
Stable Audio Open
Audio · stable-audio · 47s stereo audio generation (44.1kHz)
Fish Speech 1.5
Audio · fish-speech · 1M+ hours multilingual TTS
NeMo Parakeet
Audio · nemo · NVIDIA ASR model
DINOv2 Giant
Vision · dino · Self-supervised vision
MetaVoice 1B
Audio · metavoice · Emotion and prosody control
AudioLDM 2
Audio · audioldm · Text-to-audio generation
Llama 3.2 1B
Text · llama · Ultra-light edge deployment
Wan 2.1 1.3B
Video · wan · Efficient consumer video gen (480p native)
Aurora v2
Science · aurora · Microsoft's atmospheric foundation model
SSD-1B
Image · segmind · 50% smaller SDXL, 60% faster
Shap-E
3D · shap-e · OpenAI 3D generation
DeepSeek R1 Distill 1.5B
Text · deepseek · Tiny reasoning for edge
Hunyuan-DiT
Image · hunyuan · Tencent DiT foundation model
Stella EN 1.5B
Embedding · stella · State-of-the-art embeddings
Instructor XL
Embedding · instructor · Instruction-based embeddings
MusicGen Medium
Audio · musicgen · Balanced music generation
MusicGen Melody
Audio · musicgen · Melody-conditioned generation
AudioLDM 2 Large
Audio · audioldm · High-quality audio gen
Qwen 2.5 Coder 1.5B
Text · qwen · Ultra-compact coding model
GTE Qwen2 1.5B
Text · gte · High-quality text embeddings, 8K context
Qwen 2.5 1.5B
Text · qwen · Edge deployment ready
Whisper Large v3
Audio · whisper · Best open ASR model
Moondream1
Vision · moondream · Original Moondream
Dia TTS 1.6B
Audio · dia · Hyper-realistic dialogue TTS, emotions, voice cloning
Qwen 3 1.7B
Text · qwen · On-device assistant specialist
Qwen3-TTS VoiceDesign (1.7B)
Audio · qwen-tts · Zero-shot voice design from text descriptions
Qwen3-TTS CustomVoice (1.7B)
Audio · qwen-tts · Few-shot voice cloning with style control
SmolLM 1.7B
Text · smollm · Hugging Face small model
SmolLM2 1.7B
Text · smollm · Capable tiny model for constrained devices
Moondream2
Vision · moondream · Tiny vision-language model
SD 3 Medium
Image · sd · First open weight MMDiT model
Lumina-Next-SFT
Image · lumina · Efficient high-res generation
Granite 3.0 2B
Text · granite · Compact robust model
Granite 3.3 2B
Text · granite · Compact enterprise model
InternVL3 2B
Text · internvl · Compact vision model
SmolVLM 2B
Text · smolvlm · Tiny but capable vision-language model
Lumina Image 2.0
Image · lumina · Unified multimodal generation framework
LTX-Video 0.9.7
Video · ltx · Fast real-time video gen, 30fps
Qwen2-VL 2B
Vision · qwen · Compact vision model
SD 3.5 Medium
Image · sd · Efficient balanced model for consumer GPUs
Canary Qwen 2.5B
Audio · canary · Multilingual ASR with Qwen backbone
Gemma 2 2B
Text · gemma · Lightweight deployment
Mamba-2 2.7B
Text · mamba · Pure State Space Model (SSM)
Ministral 3 3B
Text · mistral · Lightweight mobile model with vision
RedNote OCR
Vision · rednote · Character recognition specialist
Granite 3.0 3B
Text · granite · Small enterprise model
OpenELM 3B
Text · openelm · Apple open language model
StableLM 3B
Text · stablelm · Efficient chat model
Stable Code 3B
Text · stablelm · Code generation specialist
StarCoder2 3B
Text · starcoder · Small code specialist
RedPajama INCITE 3B
Text · redpajama · Smaller open model
Orpheus 3B
Audio · orpheus · Llama-based TTS flagship
Falcon 3 3B
Text · falcon3 · Compact edge model
SmolLM3 3B
Text · smollm · Multilingual, dual-mode thinking, 128K context
Voxtral Mini 3B
Audio · voxtral · Fast speech-to-text, 13 languages
Qwen 2.5 VL 3B
Text · qwen · Compact vision-language model
Open-Sora 2.0
Video · open-sora · Open-source Sora replica, 720p, many modes
Hunyuan3D 2.0
3D · hunyuan3d · High-res textured 3D asset generation from images/text
Qwen 2.5 3B
Text · qwen · Lightweight and fast
Qwen 2.5 Coder 3B
Text · qwen · Lightweight coder
Llama-3.2 3B Abliterated
Text · llama · Safety guardrails removed
Llama 3.2 3B
Text · llama · Mobile-optimized small model
Yue Music V2
Audio · yue · AI music composition with lyrics and genre control
Phi-4 Mini
Text · phi · Math-optimized compact model, mobile-ready
OmniGen V1
Image · omnigen · Unified image generation without extra modules
Phi-3.5 Mini (3.8B)
Text · phi · High IQ for its size
FLUX.2 [klein] 4B
Image · flux · Ultra-fast edge/laptop model
H2O Danube 3 4B
Text · danube · Mobile-first efficient model
InternVL2 4B
Vision · internvl · Compact vision model
Gemma 3 4B
Text · gemma3 · Compact multimodal, 128K context
Nemotron Mini 4B
Text · nemotron · Compact edge model optimized for NVIDIA GPUs
Qwen 3 4B
Text · qwen · High performance mobile model
Kolors
Image · kolors · Chinese bilingual image generation
SDXL Base 1.0
Image · sd · The gold standard for fine-tuning
SDXL Turbo
Image · sd · Real-time single-step generation
SDXL Lightning
Image · sd · 2-step and 4-step distilled UNet
SDXL Inpainting
Image · sd · Dedicated inpainting specialist
CosXL
Image · sd · Instruction-editing fine-tune
Playground v2.5
Image · playground · Aesthetic-focused generation
AuraFlow v0.3
Image · auraflow · Rectified Flow open source generator
Animagine XL 3.1
Image · animagine · Anime specialist SDXL
Animagine XL 3.0
Image · animagine · Anime image generation
Dreamshaper 8
Image · dreamshaper · Versatile SDXL variant
SDXL Lightning 4-Step
Image · sd · Ultra-fast 4-step generation
SDXL Lightning 2-Step
Image · sd · Fastest 2-step variant
BioMistral 7B
Science · mistral · Medical adaptation of Mistral
Pyramid Flow 7B
Video · pyramid · Efficient pyramidal flow matching
NV-Embed-v2
Embedding · nv-embed · Top MTEB leaderboard performer
LLaVA 1.5 7B
Vision · llava · Efficient vision-language
Janus Pro 7B
Image · janus · Unified understanding and generation model
MiniCPM-V 2.6
Vision · minicpm · Strong OCR and multimodal features
RFM-1
Robotics · rfm · Covariant physics world model
Llama-3.1 Omni 8B
Audio · llama · Low latency speech interaction
Xiaomi VLM
Vision · xiaomi · Vision language model
Voxtral Small 8B
Audio · voxtral · High-accuracy multilingual transcription
FramePack F1
Video · framepack · Fits long video gen in 6GB VRAM via progressive packing
FLUX.2 [klein] 9B
Image · flux · Efficient consumer GPU model
AlphaFold 3
Science · alphafold · Predicts protein/DNA/RNA structures
Med-Gemini 2
Science · gemini · Multimodal medical flagship
Open-Sora 1.2
Video · opensora · Sora reproduction
MARS5 TTS
Audio · mars5 · Prosody-focused voice cloning
SVD XT 1.1
Video · svd · Optimized Image-to-Video (25 frames)
SVD
Video · svd · Base Image-to-Video (14 frames)
Stable Video Diffusion XT
Video · svd · Image-to-video with extended frames
Wonder3D
3D · wonder3d · Single image to 3D mesh
ModelScope Text2Video
Video · modelscope · Alibaba video generation
ZeroScope V2
Video · zeroscope · Watermark-free video gen
LTX-Video
Video · ltx · Lightricks video gen
CogVideoX 2B
Video · cogvideo · Efficient video gen
VideoCrafter1
Video · videocrafter · Original VideoCrafter
SeamlessM4T
Audio · seamless · Multilingual translation model
YuE Music
Audio · yue · Chinese music generation
VideoCrafter2
Video · videocrafter · Text and image to video
LaVie
Video · lavie · High-quality video synthesis
Craftsman
3D · craftsman · Text/image to 3D generation
MusicGen Large
Audio · musicgen · Text-to-music generation
DeepFloyd IF XL
Image · deepfloyd · Cascaded diffusion model
Wan 2.2 TI2V 5B
Video · wan · Unified T2V/I2V efficiency model
CogVideoX 5B
Video · cogvideo · Text-to-video generation
Gemma 3n E2B
Text · gemma3n · Ultra-light multimodal, 2B effective memory footprint
CogVideoX 1.5 5B
Video · cogvideo · Improved video gen with 10s 720p output
CogVideoX 1.5 5B I2V
Video · cogvideo · Image-to-video with 10s output
Stable Cascade
Image · cascade · 3-stage compression architecture (C+B)
Phi-4 Multimodal
Text · phi · Vision + speech multimodal small model
ChatGLM3 6B
Text · glm · Bilingual chat model from Tsinghua
ChatGLM2 6B
Text · glm · Second generation GLM chat
Yi 1.5 6B
Text · yi · Bilingual general-purpose
CogView4
Image · cogview · High-resolution text-to-image, Chinese + English, up to 2048x2048
Pythia 6.9B
Text · pythia · Mid-size Pythia model
Qwen2.5 Audio Instruct
Audio · qwen · Voice chat and audio analysis
Qwen2 Audio 7B
Audio · qwen · Audio understanding foundation
xLAM 7B FC
Text · xlam · Optimized for efficient function calling
Gorilla OpenFunctions v2
Text · gorilla · DeepSeek-based API calling specialist
RWKV-6 World 7B
Text · rwkv · RNN with Transformer-level performance
SeaLLM v3 7B
Text · seallm · Southeast Asia languages specialist
DeepSeek Prover V1.5
Text · deepseek · Theorem proving specialist
InternLM 2.5 7B
Text · internlm · Efficient bilingual model
Baichuan2 7B
Text · baichuan · Efficient Chinese LLM
OLMo 2 7B
Text · olmo · Efficient fully open model
Command R 7B
Text · command · Efficient RAG specialist
Falcon 7B
Text · falcon · Compact Falcon model
StarCoder2 7B
Text · starcoder · Efficient code model
CodeGen2.5 7B
Text · codegen · Salesforce code model
XGen 7B
Text · xgen · Long sequence model
RedPajama INCITE 7B
Text · redpajama · Open reproduction model
Hermes 2 Pro Mistral 7B
Text · hermes · Function calling specialist
LLaVA 1.6 Mistral 7B
Vision · llava · Mistral-based vision model
O1 Mini Distill
Text · o1 · Distilled reasoning model
Falcon 3 7B
Text · falcon3 · General purpose with multimodal support
DeepSeek R1 7B
Text · deepseek · Distilled reasoning model, Qwen-based
OLMo 3 7B
Text · olmo · Fully open-source with training data and logs
MiMo 7B
Text · mimo · Compact reasoning model from Xiaomi
Qwen 2.5 Math 7B
Text · qwen · Specialized mathematical reasoning
MAP-Neo 7B
Text · map-neo · Fully open-source bilingual (EN/ZH) model
E5-Mistral 7B
Embedding · e5 · LLM-based embeddings
SFR Embedding Mistral
Embedding · sfr · Mistral-based embeddings
Molmo 7B
Vision · molmo · Highly efficient vision
Mistral 7B v0.3
Text · mistral · Classic efficient model
Mathstral 7B
Text · mistral · Math and STEM specialist
Phi-3 Small (7B)
Text · phi · Efficient general purpose
Eagle 7B
Text · rwkv · RWKV-5 based efficient attention-free
Qwen 2.5 7B
Text · qwen · Efficient general purpose
Qwen 2.5 Coder 7B
Text · qwen · Efficient code assistant
GTE-Qwen2 7B
Embedding · gte · Alibaba LLM embeddings
Qwen2-VL 7B
Vision · qwen · Efficient Qwen vision
EXAONE 3.0 7.8B
Text · exaone · LG AI's bilingual English/Korean
Ministral 3 8B
Text · mistral · Balanced edge model with vision
DeepSeek R1 Distill 8B
Text · deepseek · Small but capable reasoner
Aya Expanse 8B
Text · command · Compact multilingual
Granite 3.0 Guardian
Text · granite · IBM risk detection & safety
Dolphin 2.9 Llama 3
Text · dolphin · Popular uncensored fine-tune
Skywork Reward
Text · skywork · Reward model for RLHF
Granite Code 8B
Text · granite · Code specialist from IBM
InternVL2 8B
Vision · internvl · Efficient multimodal
Fuyu 8B
Vision · fuyu · Adept multimodal model
Idefics2 8B
Vision · idefics · HuggingFace vision-language
Gemma 3n E4B
Text · gemma3n · On-device multimodal (text/image/audio/video), 3B effective memory
Granite 3.3 8B
Text · granite · Enterprise with speech and vision
InternLM3 8B
Text · internlm · Advanced reasoning and long-context, deep thinking
MiniCPM-o 2.6
Text · minicpm · Omni-modal: text, image, video, audio, live streaming
SD 3.5 Large ControlNet
Image · sd3 · Controlled generation with canny/depth/blur
Llama 3.1 8B
Text · llama · Best small model for most tasks
Qwen 3 8B
Text · qwen · Universal edge model, MCP native
Granite 3.0 8B
Text · granite · IBM enterprise-grade robust model
Ministral 8B
Text · mistral · Edge-focused powerful Mistral
SD 3.5 Large
Image · sd · Flagship MMDiT architecture, superior prompt adherence
SD 3.5 Large Turbo
Image · sd · Distilled 4-step generation version of Large
HunyuanVideo 1.5
Video · hunyuan · State-of-the-art open video gen, 720p, t2v + i2v
CodeGemma 7B
Text · gemma · Code-focused Gemma
Yi 1.5 9B
Text · yi · Efficient bilingual
Yi Coder 9B
Text · yi · Code specialist
RecurrentGemma 9B
Text · gemma · Griffin-based RNN-Transformer
CodeGeex4 9B
Text · codegeex · Multilingual code generation
GLM-4 Voice
Text · glm · End-to-end speech chatbot, emotion control
Gemma 2 9B
Text · gemma · Efficient mid-size
GLM-4 9B
Text · glm · Tsinghua bilingual model
Mochi 1 Preview
Video · mochi · Genmo's video model
Ideogram V3
Image · ideogram · Text-in-image specialist
Ideogram V3 Turbo
Image · ideogram · Fast text rendering
Pyramid Flow
Video · pyramid · Autoregressive video diffusion
Falcon 3 10B
Text · falcon3 · Enhanced science, math, and coding
Llama 3.2 11B Vision
Text · llama · Compact multimodal model
SOLAR 10.7B
Text · solar · Depth-upscaled model
Solar Mini
Text · solar · Compact depth-upscaled
Falcon 11B
Text · falcon · Efficient Falcon variant
Kandinsky 3
Image · kandinsky · Multilingual text-to-image
Mistral Nemo 12B
Text · mistral · Compact and capable
FLUX.1 [kontext]
Image · flux · Specialized in-context editing & consistency
FLUX1.1 [pro] Ultra
Image · flux · 4MP Raw/Ultra modes, API only
FLUX1.1 [pro]
Image · flux · 6x faster than 1.0, superior prompt adherence
FLUX.1 [pro]
Image · flux · Original flagship API model
FLUX.1 [dev]
Image · flux · SOTA open weights image generator
FLUX.1 [schnell]
Image · flux · Fastest 4-step distilled FLUX
FLUX.1 Fill [dev]
Image · flux · Inpainting/outpainting specialist
FLUX.1 Canny [dev]
Image · flux · Structure guidance via Canny edges
FLUX.1 Depth [dev]
Image · flux · Structure guidance via Depth maps
FLUX.1 Redux [dev]
Image · flux · Image mixing and variation adapter
NVIDIA Cosmos 1 XL
Video · cosmos · Physical world foundation model
Pythia 12B
Text · pythia · Research model suite
OASST Pythia 12B
Text · oasst · Open Assistant model
Gemma 3 12B
Text · gemma3 · Balanced multimodal model, 128K context
Jamba Mini
Text · jamba · Mamba architecture, 256K context
FLUX.1 Tools
Image · flux · Suite of editing tools (fill, depth, canny, redux)
Pixtral 12B
Vision · mistral · Mistral multimodal native
NexusRaven V2 13B
Text · nexusraven · Zero-shot tool use specialist
Fugaku-LLM 13B
Text · fugaku · Japanese scientific model
Seed LLM
Text · seed · ByteDance research model
Skywork 13B
Text · skywork · Open bilingual model
Baichuan2 13B
Text · baichuan · Open Chinese foundation model
OLMo 3 13B
Text · olmo · Mid-size truly open model
LLaVA 1.5 13B
Vision · llava · Mid-size vision-language
OLMo 2 13B
Text · olmo · Fully open research model
Ministral 3 14B
Text · mistral · Dense edge flagship with vision
DeepSeek R1 Distill 14B
Text · deepseek · Compact reasoning model
Phi-4 (14B)
Text · phi · Latest Phi with exceptional reasoning
Phi-3 Medium (14B)
Text · phi · Balanced Phi model
Wan 2.1 I2V 14B (480P)
Video · wan · Stable low-res image animation
Xiaomi 14B
Text · xiaomi · Xiaomi edge flagship
Phi-4 Reasoning Plus
Text · phi · Chain-of-thought reasoning model
InternVL3 14B
Text · internvl · Efficient multimodal understanding
SkyReels V2 I2V
Video · skyreels · Infinite-length video with camera control
Cosmos 1 Video 14B
Video · cosmos · Physical world simulation video model
Qwen 2.5 14B
Text · qwen · Strong mid-size model
Qwen 2.5 Coder 14B
Text · qwen · Strong coding in smaller package
Qwen 3 14B
Text · qwen · Perfect mid-range daily driver
StarCoder2 15B
Text · starcoder · Code specialist
StarCoder Base
Text · starcoder · Foundation code model
FLUX.2 [max]
Image · flux · Flagship professional model (2026 SOTA)
FLUX.2 [dev]
Image · flux · Open-weight research flagship
MOSS
Text · moss · First open Chinese conversational LLM
CogVLM 17B
Vision · cogvlm · Original CogVLM
HiDream I1 Full
Image · hidream · High-quality text-to-image with 4 LLM backbone
HiDream I1 Fast
Image · hidream · 16-step fast generation variant
CogVLM2 19B
Vision · cogvlm · Powerful vision-language
LTX-Video 2
Video · ltx · Integrated audio-video gen, native 4K 50fps
Recraft V3
Image · recraft · High-quality realistic generations
Recraft V3 SVG
Image · recraft · Vector generation specialist
InternVL2 26B
Vision · internvl · Multimodal flagship
Qwen 3 Omni
Audio · qwen · End-to-end voice/text/vision interaction
RT-2-X
Robotics · rt · Google VLA (Vision-Language-Action)
HunyuanVideo
Video · hunyuan · Tencent SOTA open video generation
SkyReels V1
Video · skyreels · Human-centric cinematic video
Wan 2.1 14B
Video · wan · Cinema-quality generation (Supports 720p)
Wan 2.1 I2V 14B (720P)
Video · wan · High-res image animation flagship
Wan 2.2 T2V A14B
Video · MoE · wan · MoE-powered high fidelity (2x14B Experts)
Wan 2.2 I2V A14B
Video · MoE · wan · MoE-powered image animation
InternLM 2.5 20B
Text · internlm · Strong Chinese/English
Kimi K2
Text · kimi · Multimodal with 128K context from Moonshot
Kimi K1.5
Text · kimi · Long context specialist
InternLM2 20B Chat
Text · internlm · Powerful Chinese/English LLM
Codestral 22B
Text · mistral · Mistral's code specialist
Solar Pro
Text · solar · Enterprise Solar model
Codestral 25.01
Text · mistral · Updated code generation flagship
Mistral Small (24B)
Text · mistral · Efficient enterprise model
Gemma 3 27B
Text · gemma3 · Flagship multimodal, 128K context, 140+ languages
Gemma 2 27B
Text · gemma · Google's best open model
Qwen 3 30B (MoE)
Text · MoE · qwen · Punching way above its weight class
DeepSeek R1 Distill 32B
Text · deepseek · Efficient reasoning distillation
Aya Expanse 32B
Text · command · Multilingual specialist
OLMo 3 32B
Text · olmo · Fully open-source with training data
Marco-o1
Text · marco · Open reasoning model
GLM-4.7 Thinking
Text · glm · Advanced reasoning with thinking mode
Qwen 2.5 VL 32B
Text · qwen · Advanced vision-language understanding
Qwen 2.5 32B
Text · qwen · The "Goldilocks" model - great balance
Qwen 2.5 Coder 32B
Text · qwen · State-of-the-art code generation
Qwen 3 32B
Text · qwen · Dense SOTA for its size category
Qwen 3 Coder 32B
Text · qwen · Self-correcting code specialist
QwQ 32B Preview
Text · qwen · Qwen reasoning model (o1-like)
DeepSeek Coder 33B
Text · deepseek · Strong dense code model
Agent Coder 33B
Text · deepseek · Self-correcting coding agent
WhiteRabbitNeo 33B
Text · whiterabbit · Cybersecurity offensive/defensive spec
Nous Capybara 34B
Text · nous · Conversational expert
Yi 1.5 34B
Text · yi · Strong bilingual model
LLaVA 1.6 34B
Vision · llava · State-of-the-art vision-language
Command R (35B)
Text · command · Retrieval-optimized
Aya 23 35B
Text · aya · Cohere's massive multilingual model
Falcon 40B
Text · falcon · Mid-size Falcon model
Phi-3.5 MoE (42B)
Text · MoE · phi · Efficient mixture of experts
Grok-3 Mini
Text · grok · Efficient reasoning model with real-time tools
Mixtral 8x7B (MoE)
Text · MoE · mistral · Popular efficient MoE
Yuan 2.0 51B
Text · yuan · Mid-size Yuan variant
Jamba v0.1
Text · MoE · jamba · Mamba-Transformer Hybrid
Jamba 1.5 Mini
Text · MoE · jamba · Efficient hybrid architecture
DeepSeek R1 Distill 70B
Text · deepseek · Distilled reasoning model
Nemotron-4 70B
Text · nemotron · NVIDIA RLHF aligned model
Hermes 3 70B
Text · hermes · Uncensored agentic Llama 3.1 tune
Functionary V3 Medium
Text · functionary · MeetKai's agentic control model
Llama 3.3 70B
Text · llama · Refined Llama 3 with superior instruction following
Llama 3.1 70B
Text · llama · Enterprise-grade intelligence
Qwen 3 VL 72B
Vision · qwen · Visual reasoning powerhouse
Molmo 72B
Vision · molmo · AllenAI open state-of-the-art
Qwen 2.5 Math 72B
Text · qwen · Math-specific reasoning model
NuminaMath 72B
Text · numina · Winner of AI Math Olympiad
Kimi K2.5
Text · kimi · Top-tier coding and reasoning, open weights
Kimi-Dev 72B
Text · kimi · Specialized software development model
Qwen 2.5 72B
Text · qwen · Top-tier reasoning and coding
Qwen2-VL 72B
Vision · qwen · Alibaba multimodal flagship
InternVL3 78B
Text · internvl · Frontier multimodal understanding
Llama 3.2 90B Vision
Text · llama · Multimodal with image understanding
ESM3 Open
Science · esm · Simulate & generate biology
Ernie X1
Text · ernie · First Baidu reasoning model
SenseTime XL
Text · sensetime · Enterprise multimodal model
Yuan 2.0
Text · yuan · Large scale Chinese model
Command R+ (104B)
Text · command · Enterprise RAG and tool use
GLM-4.5 Air
Text · MoE · glm · Efficient MoE variant
Llama 4 Scout
Text · MoE · llama4 · Consumer flagship MoE, 16 experts, 10M context
Command A
Text · command · Agentic enterprise model, 256K context
TeleChat2 115B
Text · telechat · China Telecom massive model
GPT-OSS 120B
Text · gpt-oss · OpenAI first open-source model, fits single 80GB GPU
Mistral Large 2 (123B)
Text · mistral · Flagship Mistral model
Pixtral Large 124B
Text · pixtral · Flagship vision model with 128K context
DBRX Instruct
Text · MoE · dbrx · Databricks' powerful MoE
DBRX Base
Text · MoE · dbrx · Foundation MoE from Databricks
Mixtral 8x22B (MoE)
Text · MoE · mistral · Large scale MoE
xLAM 8x22B
Text · MoE · xlam · Salesforce "Large Action Model" flagship
Mixtral 8x22B DPO
Text · MoE · mistral · DPO-tuned Mixtral
Falcon 180B
Text · falcon · Large scale open model
Ernie Bot 4
Text · ernie · Previous generation flagship
MiniMax M2
Text · minimax · Capable chat model with strong performance
Ernie 4.5
Text · ernie · Latest multimodal foundational model
Qwen 3 235B (MoE)
Text · MoE · qwen · Open weights flagship, highly efficient experts
DeepSeek Coder V2 236B (MoE)
Text · MoE · deepseek · Expert code generation MoE
MiMo-V2 Flash
Text · MoE · mimo · Ultra-fast reasoning MoE, 256K context
Grok-1
Text · MoE · grok · xAI massive open model
Nemotron-4 340B
Text · nemotron · Synthetic data generation flagship
GLM-4.6
Text · MoE · glm · Latest Zhipu flagship MoE model
GLM-4.5
Text · MoE · glm · Advanced open-source MoE from Zhipu
Hunyuan Large
Text · MoE · hunyuan · Tencent flagship MoE model
Jamba 1.5 Large
Text · MoE · jamba · Hybrid Transformer-Mamba, 256k context
Llama 4 Maverick
Text · MoE · llama4 · High-efficiency MoE, 128 experts, 1M context
Llama 3.1 405B
Text · llama · Frontier-class open model. Requires datacenter hardware.
MiniMax Text-01
Text · MoE · minimax · 1M context window with hybrid attention
MiniMax M1
Text · MoE · minimax · Lightning reasoning model, hybrid thinking
DeepSeek V3 (MoE)
Text · MoE · deepseek · Massive MoE - exceptional performance
DeepSeek R1 671B (MoE)
Text · MoE · deepseek · Reasoning specialist with o1-level performance
DeepSeek R1 Zero
Text · MoE · deepseek · Pure RL reasoning without supervised fine-tuning
Mistral Large 3
Text · MoE · mistral · Granular MoE flagship, 256K context
Mistral Large 3 NVFP4
Text · MoE · mistral · FP4 quantized version for NVIDIA NIM
DeepSeek V3.1
Text · MoE · deepseek · Upgraded V3 with improved reasoning
Qwen 3 Max (Thinking)
Text · MoE · qwen · Flagship reasoning model with "System 2" thinking mode
Llama 4 Behemoth
Text · MoE · llama4 · Flagship 2T foundation model, 16 experts
Polaris 3.0
Science · polaris · Hippocratic AI's medical constellation
VRAM Bottleneck Detected
Many models are running with RAM offloading. An upgrade to 16GB+ VRAM would significantly improve performance.
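
For readers who want to sanity-check a warning like this by hand, the following is a rough sketch of a VRAM fit estimate. It is not how this page computes compatibility (the page does not say). The function name `estimate_fit`, the bytes-per-parameter table, and the 1.2x overhead factor for activations and caches are all illustrative assumptions, and the estimate covers model weights only.

```python
from dataclasses import dataclass

# Assumed storage cost per parameter for common precisions (illustrative values).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


@dataclass
class Fit:
    required_gb: float   # estimated memory needed, in GB
    fits_in_vram: bool   # False suggests RAM offloading would be needed


def estimate_fit(params_billion: float, vram_gb: float,
                 precision: str = "fp16", overhead: float = 1.2) -> Fit:
    """Hypothetical helper: weights-only estimate with a flat overhead factor."""
    # 1e9 parameters * bytes per parameter / 1e9 bytes per GB = GB of weights.
    weights_gb = params_billion * BYTES_PER_PARAM[precision]
    required_gb = weights_gb * overhead  # assumed headroom, not a measured value
    return Fit(round(required_gb, 1), required_gb <= vram_gb)


if __name__ == "__main__":
    # An 8B-parameter model on a 16 GB card, at fp16 and at 4-bit quantization.
    print(estimate_fit(8, 16, "fp16"))
    print(estimate_fit(8, 16, "int4"))
```

Under these assumptions an 8B model at fp16 exceeds a 16 GB card while the same model at 4-bit fits comfortably, which is the kind of gap the offloading warning points at. Real usage also depends on context length, batch size, and the runtime, so treat the result as a lower bound.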