Qwen3 Embedding 4B
Mid-size Qwen3 embedding model at 4B parameters — strong multilingual retrieval with lower VRAM requirements than the 8B. Apache 2.0, 32K context.
Model Specifications
Parameters: 4B
Context length: 32K tokens
Embedding dimensions: flexible, 32–2560 (Matryoshka)
License: Apache 2.0
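For a quick sanity check, here is a minimal retrieval sketch with the sentence-transformers library, using the published Qwen/Qwen3-Embedding-4B checkpoint. The sample queries and documents are placeholders, and the "query" prompt name follows the convention from the model card.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-4B")

queries = ["What is the capital of China?"]  # placeholder query
documents = [                                # placeholder corpus
    "The capital of China is Beijing.",
    "Gravity pulls objects toward a planet's center.",
]

# Queries use the model's built-in "query" prompt; documents need no prompt.
query_emb = model.encode(queries, prompt_name="query")
doc_emb = model.encode(documents)

# Cosine-similarity matrix: one row per query, one column per document.
print(model.similarity(query_emb, doc_emb))
```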
Similar Models
Qwen3 Embedding 8B
Alibaba's flagship embedding model — #1 on the MTEB multilingual leaderboard (score 70.58). 8B decoder-only transformer, 32K context, flexible 32–4096 dimension output. Outperforms all dedicated encoder models on retrieval tasks spanning 100+ languages.
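The flexible output size comes from Matryoshka-style training, so embeddings can be truncated at load time. A short sketch, assuming sentence-transformers and the Qwen/Qwen3-Embedding-8B checkpoint; the 256-dimension target is an arbitrary example.

```python
from sentence_transformers import SentenceTransformer

# truncate_dim keeps the first N dimensions of every embedding;
# any value in the model's supported 32-4096 range works.
model = SentenceTransformer("Qwen/Qwen3-Embedding-8B", truncate_dim=256)

emb = model.encode(["Matryoshka embeddings trade accuracy for storage."])
print(emb.shape)  # (1, 256)
```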
Qwen3 Embedding 0.6B
Ultra-compact Qwen3 embedding model — 0.6B parameters, runs on CPU or any GPU. Ideal for edge RAG pipelines and low-latency local search. Apache 2.0 license.
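Since the 0.6B variant fits comfortably in system RAM, it can be pinned to the CPU. A minimal sketch, assuming sentence-transformers and the Qwen/Qwen3-Embedding-0.6B checkpoint.

```python
from sentence_transformers import SentenceTransformer

# device="cpu" forces CPU inference -- no GPU required at this model size.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", device="cpu")

emb = model.encode(["low-latency local search on modest hardware"])
print(emb.shape)  # (1, 1024) -- the 0.6B model's default dimension
```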
Qwen3 Reranker 8B
Alibaba's top cross-encoder reranker at 8B parameters — state-of-the-art on multilingual text retrieval benchmarks. Instruction-aware for task-specific ranking. Apache 2.0.
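As a decoder-only cross-encoder, the reranker scores a query-document pair as the probability of answering "yes" to a relevance question. A condensed sketch with the transformers library, assuming the Qwen/Qwen3-Reranker-8B checkpoint; the prompt template and default instruction below are paraphrased from the model card and should be verified against it before use.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

NAME = "Qwen/Qwen3-Reranker-8B"
tokenizer = AutoTokenizer.from_pretrained(NAME)
model = AutoModelForCausalLM.from_pretrained(NAME, torch_dtype=torch.float16).eval()

YES = tokenizer.convert_tokens_to_ids("yes")
NO = tokenizer.convert_tokens_to_ids("no")

def relevance(query: str, document: str,
              instruction: str = "Retrieve passages that answer the query.") -> float:
    # Instruction-aware input: the task description travels with the pair.
    # (Template paraphrased from the model card -- an assumption, not verbatim.)
    prompt = f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {document}"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    # Relevance score = P("yes") under a softmax over the yes/no logits.
    return torch.stack([logits[NO], logits[YES]]).softmax(dim=0)[1].item()

print(relevance("What is the capital of China?",
                "The capital of China is Beijing."))
```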
Related Guides
How much VRAM do you really need?
A complete breakdown of quantization levels and VRAM overhead for running local models.
Best GPUs for Machine Learning in 2026
Comparing NVIDIA and AMD options for the best speed-to-dollar ratio.
GGUF vs EXL2 vs AWQ
Understanding local AI formats and which one to pick for your specific hardware.