What are embeddings?

Meaning encoded as numbers

An embedding is a list of numbers (a vector) that captures the meaning of a piece of data. Think of it like GPS coordinates: just as two nearby coordinates represent two nearby locations on Earth, two nearby embedding vectors represent two pieces of content that are similar in meaning.

The key property of embeddings is that semantic similarity corresponds to mathematical closeness. "Dog" and "puppy" produce embeddings that are close together. "Dog" and "interest rate" produce embeddings that are far apart. This makes embeddings the foundation of any AI system that needs to find, compare, or group data by what it means rather than what it literally says.
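To make "mathematical closeness" concrete, here is a minimal sketch that scores pairs of vectors with cosine similarity, one common closeness measure. The three-number vectors are hand-picked toy values chosen for illustration, not the output of any real embedding model, and the name toy_embeddings is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Assumption: hand-made toy vectors, NOT produced by a real model.
toy_embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.75, 0.2],
    "interest rate": [0.1, 0.0, 0.95],
}

sim_dog_puppy = cosine_similarity(toy_embeddings["dog"], toy_embeddings["puppy"])
sim_dog_rate = cosine_similarity(toy_embeddings["dog"], toy_embeddings["interest rate"])

print(f"dog vs puppy:         {sim_dog_puppy:.3f}")
print(f"dog vs interest rate: {sim_dog_rate:.3f}")
```

With these toy values, the "dog" and "puppy" score comes out close to 1, while the "dog" and "interest rate" score is much lower. Real embeddings behave the same way, only with hundreds or thousands of dimensions.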

How embedding models are built and trained

An embedding model is an AI system trained to produce these numerical representations. Training works by showing the model many examples of similar and dissimilar items. The model learns to move similar items' embeddings closer together and push dissimilar items' embeddings further apart. After training on billions of examples, the model generalizes: it can produce accurate embeddings for sentences, documents, or images it has never seen before.
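The pull-together, push-apart idea can be sketched with a triplet loss, one of several contrastive objectives used for this kind of training. This is a simplified illustration under stated assumptions: it nudges three toy vectors directly, whereas a real embedding model updates the weights of a neural network that produces the vectors. The function names, margin, and learning rate are all hypothetical choices.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once d(anchor, positive)^2 + margin <= d(anchor, negative)^2."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def triplet_step(anchor, positive, negative, margin=1.0, lr=0.1):
    """One gradient-descent step on the triplet loss, applied to the vectors themselves."""
    if triplet_loss(anchor, positive, negative, margin) == 0.0:
        return anchor, positive, negative  # constraint already satisfied, no update
    # Gradients of d(a, p)^2 - d(a, n)^2 with respect to each vector.
    grad_a = [2 * (n - p) for p, n in zip(positive, negative)]
    grad_p = [2 * (p - a) for a, p in zip(anchor, positive)]
    grad_n = [2 * (a - n) for a, n in zip(anchor, negative)]

    def descend(vec, grad):
        return [v - lr * g for v, g in zip(vec, grad)]

    return descend(anchor, grad_a), descend(positive, grad_p), descend(negative, grad_n)


# Assumption: toy 2-D vectors; the "similar" item starts farther away than the "dissimilar" one.
anchor, positive, negative = [0.0, 0.0], [1.0, 0.0], [0.5, 0.5]

loss_before = triplet_loss(anchor, positive, negative)
for _ in range(5):
    anchor, positive, negative = triplet_step(anchor, positive, negative)
loss_after = triplet_loss(anchor, positive, negative)

print("loss before:", loss_before)
print("loss after: ", loss_after)
```

After a few steps the positive vector ends up closer to the anchor than the negative one, which is the behavior the training description above relies on.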

Well-known text embedding models include OpenAI's text-embedding-3 series, Cohere's Embed v3, and various open-source models available on Hugging Face. They produce vectors of different lengths (typically 256 to 4096 numbers) and have slightly different strengths, but they all work on the same principle.

Embeddings for images, audio, and more

Embeddings are not limited to text. Image embedding models convert photos into vectors where visually similar images land near each other. Audio models produce vectors for sounds and music. Code embedding models convert program code into vectors for code search. Multimodal models (like CLIP from OpenAI) project both text and images into the same shared space, making it possible to search image databases using text descriptions.

A vector database stores all of these representations the same way: as fixed-length lists of numbers. This means one database can hold text embeddings, image embeddings, and audio embeddings at the same time, and the search mechanism works identically for all of them.
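As a rough sketch of why the search mechanism is the same for every kind of content, the toy store below keeps fixed-length vectors with a modality label and ranks them by cosine similarity using a brute-force scan. TinyVectorStore and its methods are hypothetical names, not the API of any real vector database, and the vectors are hand-made values that are assumed to come from one shared embedding space (as a multimodal model would provide). Production systems typically replace the full scan with an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store: every item is just an id, a label, and a vector."""

    def __init__(self, dim):
        self.dim = dim
        self._items = []  # list of (item_id, modality, vector)

    def add(self, item_id, modality, vector):
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} numbers, got {len(vector)}")
        self._items.append((item_id, modality, list(vector)))

    def search(self, query, k=3):
        """Return the k most similar items as (score, item_id, modality), best first."""
        if len(query) != self.dim:
            raise ValueError(f"expected {self.dim} numbers, got {len(query)}")
        scored = [
            (cosine_similarity(query, vector), item_id, modality)
            for item_id, modality, vector in self._items
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


# Assumption: toy vectors standing in for embeddings from one shared space.
store = TinyVectorStore(dim=3)
store.add("dog-caption", "text", [0.9, 0.1, 0.0])
store.add("dog-photo", "image", [0.8, 0.2, 0.1])
store.add("rain-recording", "audio", [0.0, 0.1, 0.9])

results = store.search([1.0, 0.0, 0.0], k=2)
for score, item_id, modality in results:
    print(f"{score:.3f}  {modality:5s}  {item_id}")
```

The search method never inspects the modality label: text, image, and audio entries are ranked by exactly the same arithmetic.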

Related concepts

Vector Database

Semantic Search

Dense vs Sparse Vectors

RAG

Distance Metrics

Multimodal Retrieval
