LocalOps

Qwen3 Embedding 8B

Hot

Alibaba's flagship embedding model, ranked #1 on the MTEB multilingual leaderboard (score 70.58). It is an 8B-parameter decoder-only transformer with a 32K-token context window and a configurable output dimension of 32 to 4096. Outperforms all dedicated encoder models on retrieval tasks across 100+ languages.

Specifications

Source
Architecture: EMBEDDING
Parameters: 8B
Family: qwen3-embed
VRAM (Q4): 4.0 GB
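
The Q4 VRAM figure above is consistent with a simple weights-only estimate. The sketch below assumes exactly 4 bits per weight, the nominal 8B parameter count, and 1 GB = 10^9 bytes; the function name `q4_weight_gb` is hypothetical. Real quantized checkpoints also store scaling factors, and serving needs extra memory for activations, so treat the result as a lower bound rather than a measured value.

```python
def q4_weight_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Weights-only memory in GB (10^9 bytes) for a quantized model.

    Assumption: every parameter is stored at `bits_per_weight` bits and
    nothing else (scales, activations, runtime buffers) is counted.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


estimate = q4_weight_gb(8)  # nominal 8B parameters at 4 bits each
print(f"Estimated weight memory: {estimate:.1f} GB")
```

Passing `bits_per_weight=16` gives the corresponding half-precision estimate for the same parameter count.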
Use with vLLM or TGI for high-throughput serving. Matryoshka Representation Learning (MRL) training allows safe truncation of the output to 1024 dimensions at roughly 5% quality cost. Best pick for multilingual RAG.
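
Truncating an MRL-trained embedding usually means keeping the leading components and re-normalizing before computing cosine similarity. The following is a minimal sketch of that step in plain Python, not the model's own API: the helper names `truncate_embedding` and `cosine` are hypothetical, and the random vectors are stand-ins for real 4096-dimensional model output.

```python
import math
import random


def truncate_embedding(vec, dim=1024):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # all-zero vector: nothing to normalize
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


# Stand-in data: random vectors in place of real model embeddings.
random.seed(0)
full_query = [random.gauss(0.0, 1.0) for _ in range(4096)]
full_doc = [random.gauss(0.0, 1.0) for _ in range(4096)]

query = truncate_embedding(full_query, dim=1024)
doc = truncate_embedding(full_doc, dim=1024)
score = cosine(query, doc)
```

Storing 1024 instead of 4096 values per vector cuts index size to a quarter, which is the trade the quality cost noted above pays for. Queries and documents need to be truncated to the same dimension.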
Tags: alibaba, qwen, embedding, retrieval, rag, multilingual, apache2, trending
