Qwen3 Embedding 0.6B
Ultra-compact Qwen3 embedding model: at 0.6B parameters it runs on CPU or any GPU. Apache 2.0 licensed, and well suited to edge RAG pipelines and low-latency local search.
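To make the local-search use case concrete, here is a minimal retrieval sketch using the sentence-transformers library. It assumes the checkpoint is published as Qwen/Qwen3-Embedding-0.6B on Hugging Face and that the model's config defines a "query" prompt for instruction-aware query encoding; adjust both to your setup.

```python
# Minimal local-retrieval sketch for Qwen3 Embedding 0.6B.
# Assumption: the checkpoint is published as "Qwen/Qwen3-Embedding-0.6B"
# and its config defines a "query" prompt for instruction-aware queries.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")  # small enough for CPU

queries = ["how do I pick a GPU for local inference?"]
documents = [
    "A complete breakdown of quantization levels and VRAM overhead.",
    "Paris is the capital and largest city of France.",
]

# Queries carry the task prompt; documents are encoded as-is.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Cosine similarity between every query and every document.
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)  # higher score = more relevant document
```

Encoding only the queries with a task prompt mirrors how instruction-aware embedding models are typically used: the documents stay prompt-free so their vectors can be indexed once and reused across tasks.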
Model Specifications
Parameters: 0.6B
Architecture: decoder-only transformer
Context length: 32K tokens
Output dimensions: up to 1024, flexible via truncation
License: Apache 2.0
Similar Models
Qwen3 Embedding 8B
Alibaba's flagship embedding model, #1 on the MTEB multilingual leaderboard (score 70.58). An 8B decoder-only transformer with 32K context and flexible 32–4096-dimension output (dimension truncation is sketched after this list). Outperforms all dedicated encoder models on retrieval tasks spanning 100+ languages.
Qwen3 Embedding 4B
Mid-size Qwen3 embedding model at 4B parameters: strong multilingual retrieval with lower VRAM requirements than the 8B. Apache 2.0, 32K context.
Qwen3 Reranker 8B
Alibaba's top cross-encoder reranker at 8B parameters, state-of-the-art on multilingual text retrieval benchmarks. Instruction-aware for task-specific ranking. Apache 2.0.
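The flexible output dimensions called out for the 8B model above can be exercised through sentence-transformers' truncate_dim parameter. A minimal sketch, assuming the 0.6B checkpoint also supports Matryoshka-style truncation and is published as Qwen/Qwen3-Embedding-0.6B:

```python
# Matryoshka-style dimension truncation with sentence-transformers.
# Assumption: the 0.6B checkpoint, like the 8B described above, supports
# flexible output dimensions and is published as "Qwen/Qwen3-Embedding-0.6B".
from sentence_transformers import SentenceTransformer

# truncate_dim keeps only the first 256 dimensions of each embedding,
# shrinking the vector index at a modest accuracy cost.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", truncate_dim=256)

embeddings = model.encode(["small vectors suit edge RAG pipelines"])
print(embeddings.shape)  # (1, 256)
```

Smaller vectors shrink the index and speed up nearest-neighbor search, a trade-off that fits the edge deployments this model family targets.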
Related Guides
How much VRAM do you really need?
A complete breakdown of quantization levels and VRAM overhead for running local models.
Best GPUs for Machine Learning in 2026
Comparing NVIDIA and AMD options for the best speed-to-dollar ratio.
GGUF vs EXL2 vs AWQ
Understanding local AI formats and which one to pick for your specific hardware.