Qwen3 Embedding 0.6B
Ultra-compact Qwen3 embedding model: at 0.6B parameters it runs on CPU or any GPU. Apache 2.0 licensed, and well suited to edge RAG pipelines and low-latency local search.
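To make the local-search use case concrete, here is a minimal retrieval sketch using the sentence-transformers library. It assumes the checkpoint is published as Qwen/Qwen3-Embedding-0.6B on Hugging Face and that the model's config defines a "query" prompt for instruction-aware query encoding; adjust both to your setup.

```python
# Minimal local-retrieval sketch for Qwen3 Embedding 0.6B.
# Assumption: the checkpoint is published as "Qwen/Qwen3-Embedding-0.6B"
# and its config defines a "query" prompt for instruction-aware queries.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")  # small enough for CPU

queries = ["how do I pick a GPU for local inference?"]
documents = [
    "A complete breakdown of quantization levels and VRAM overhead.",
    "Paris is the capital and largest city of France.",
]

# Queries carry the task prompt; documents are encoded as-is.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Cosine similarity between every query and every document.
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)  # higher score = more relevant document
```

Encoding only the queries with a task prompt mirrors how instruction-aware embedding models are typically used: the documents stay prompt-free so their vectors can be indexed once and reused across tasks.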
Model Specifications
Parameters: 0.6B
Architecture: decoder-only transformer
Context length: 32K tokens
Output dimensions: up to 1024, flexible via truncation
License: Apache 2.0
Similar Models
Qwen3 Embedding 8B
Alibaba's flagship embedding model, #1 on the MTEB multilingual leaderboard (score 70.58). An 8B decoder-only transformer with 32K context and flexible 32–4096-dimension output (dimension truncation is sketched after this list). Outperforms all dedicated encoder models on retrieval tasks spanning 100+ languages.
Qwen3 Embedding 4B
Mid-size Qwen3 embedding model at 4B parameters: strong multilingual retrieval with lower VRAM requirements than the 8B. Apache 2.0, 32K context.
Qwen3 Reranker 8B
Alibaba's top cross-encoder reranker at 8B parameters, state-of-the-art on multilingual text retrieval benchmarks. Instruction-aware for task-specific ranking. Apache 2.0.
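The flexible output dimensions called out for the 8B model above can be exercised through sentence-transformers' truncate_dim parameter. A minimal sketch, assuming the 0.6B checkpoint also supports Matryoshka-style truncation and is published as Qwen/Qwen3-Embedding-0.6B:

```python
# Matryoshka-style dimension truncation with sentence-transformers.
# Assumption: the 0.6B checkpoint, like the 8B described above, supports
# flexible output dimensions and is published as "Qwen/Qwen3-Embedding-0.6B".
from sentence_transformers import SentenceTransformer

# truncate_dim keeps only the first 256 dimensions of each embedding,
# shrinking the vector index at a modest accuracy cost.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", truncate_dim=256)

embeddings = model.encode(["small vectors suit edge RAG pipelines"])
print(embeddings.shape)  # (1, 256)
```

Smaller vectors shrink the index and speed up nearest-neighbor search, a trade-off that fits the edge deployments this model family targets.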
Related Guides
How much VRAM do you really need?
A complete breakdown of quantization levels and VRAM overhead for running local models.
Best GPUs for Machine Learning in 2026
Comparing NVIDIA and AMD options for the best speed-to-dollar ratio.
GGUF vs EXL2 vs AWQ
Understanding local AI formats and which one to pick for your specific hardware.