Dense vectors: encoding meaning
A dense vector has a non-zero number in nearly every dimension. A 768-dimensional text embedding, for example, has 768 numbers, and almost all of them carry some information. These numbers are produced by embedding models trained to place content with similar meaning near each other in this numerical space. The words "car" and "automobile" get dense vectors that are very close together, even though the two words look nothing alike.
Dense vectors are what most people mean when they talk about "AI embeddings." They capture context, intent, and meaning. They handle synonyms, paraphrases, and, with multilingual models, even cross-language queries. A search for "fast" can return results about "quick," "rapid," and "speedy" even though those words never appear in the query.
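"Close together" is usually measured with cosine similarity, the cosine of the angle between two vectors. The sketch below computes it for three hypothetical 4-dimensional vectors that were hand-picked for illustration; they are not the output of any real model, which would produce hundreds of learned dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings", hand-picked for illustration only.
# A real embedding model would output e.g. 768 learned values per text.
embeddings = {
    "car":        [0.81, 0.52, 0.10, 0.05],
    "automobile": [0.78, 0.55, 0.12, 0.08],
    "banana":     [0.05, 0.10, 0.90, 0.40],
}

car_vs_automobile = cosine_similarity(embeddings["car"], embeddings["automobile"])
car_vs_banana = cosine_similarity(embeddings["car"], embeddings["banana"])
print(f"car/automobile: {car_vs_automobile:.3f}")
print(f"car/banana:     {car_vs_banana:.3f}")
```

Every dimension contributes to the dot product, which is what makes these vectors "dense": there are no zeros to skip.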
Sparse vectors: encoding keywords
A sparse vector is mostly zeros. Each dimension represents one possible word from a vocabulary (which might contain 100,000 words). A given document's vector has non-zero values only for the words it actually contains, typically a few hundred out of 100,000. This is the representation behind traditional keyword search, where ranking functions such as BM25 weight each term by how often it appears in the document and how rare it is across the whole collection.
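Because almost every entry is zero, sparse vectors are normally stored as a mapping from dimension index to value rather than as a full array. The sketch below assumes a hypothetical 8-word vocabulary, a simple lowercase-and-split tokenizer, and raw term counts as the stored values; a scoring function such as BM25 would apply its own weighting on top of counts like these.

```python
from collections import Counter

# Hypothetical toy vocabulary; real systems use tens or hundreds of thousands of terms.
vocabulary = ["automobile", "battery", "car", "charger", "electric", "fast", "red", "usb"]
term_to_index = {term: i for i, term in enumerate(vocabulary)}

def to_sparse(text: str) -> dict[int, int]:
    """Map a text to {dimension index: term count}, keeping only non-zero entries."""
    # Assumption: lowercase + whitespace tokenization; out-of-vocabulary words are dropped.
    counts = Counter(token for token in text.lower().split() if token in term_to_index)
    return {term_to_index[term]: count for term, count in counts.items()}

doc = "fast electric car with fast charger"
sparse = to_sparse(doc)
print(sparse)
print(f"{len(sparse)} non-zero entries out of {len(vocabulary)} dimensions")
```

With a 100,000-word vocabulary the dictionary would stay just as small, while a full array would need 100,000 slots.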
Sparse vectors are excellent for exact term matching. If someone searches for a specific product code like "X2-PRO-USB-C" or a legal citation, dense search may miss it or rank it among similar but different items. Sparse search matches the exact term reliably, as long as the tokenizer keeps it intact. Domain-specific jargon, model numbers, and proper nouns are all handled well by sparse vectors.
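To see why exact codes are easy for sparse search, here is a minimal Okapi BM25 scorer over a hypothetical three-item catalog. It assumes a lowercase whitespace tokenizer (so "X2-PRO-USB-C" stays a single token), the commonly used defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant used by Lucene; it is a sketch, not any particular engine's implementation.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split, so "X2-PRO-USB-C" remains one token.
    return text.lower().split()

def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # absent terms contribute nothing: the vector entry is zero
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / length_norm
        scores.append(score)
    return scores

# Hypothetical product catalog with near-identical codes.
catalog = [
    "X2-PRO-USB-C hub with four ports",
    "X2-PRO-USB-A hub with four ports",
    "X3-PRO-USB-C dock with ethernet",
]
scores = bm25_scores("X2-PRO-USB-C", catalog)
best = max(range(len(catalog)), key=scores.__getitem__)
print(catalog[best], scores)
```

Only the document containing the exact token gets a non-zero score; the look-alike codes score zero because, to a keyword index, they are simply different terms.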
Why the best systems use both
Dense and sparse vectors complement each other well. Dense vectors understand meaning but sometimes miss exact terms. Sparse vectors match exact terms but miss synonyms and paraphrases.
Imagine a customer searching for "comfortable running shoes." Dense search understands that this means athletic footwear for jogging and returns relevant results even if the catalog uses different words. But if the same customer searches for a specific model number like "NB-880-BK-US10," dense search may rank similar-looking codes alongside or above it. Sparse search puts the exact match at the top.
This is why hybrid search, which combines both types, tends to outperform either approach alone when queries mix natural-language descriptions with exact identifiers.
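One common way to combine the two result lists is reciprocal rank fusion (RRF); this is an assumption for illustration, since other fusion methods, such as weighted score blending, are also used. Each document earns 1 / (k + rank) from every list it appears in, with k = 60 as a typical default. The rankings below are hypothetical results for the query "NB-880-BK-US10".

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document earns 1 / (k + rank) from every list it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical rankings: dense search surfaces look-alike codes first,
# sparse search returns only the exact term match.
dense_ranking = ["NB-880-BK-US9", "NB-880-GY-US10", "NB-880-BK-US10"]
sparse_ranking = ["NB-880-BK-US10"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

RRF uses only rank positions, so it sidesteps the problem that cosine similarities and BM25 scores live on different, incomparable scales. The exact match, which appears in both lists, rises to the top.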