The AI's short-term memory
Every AI language model has a limit to how much text it can process at once. This limit is called the context window, and it is measured in tokens (as a rule of thumb, one token is about three quarters of an English word). Everything the AI reads to generate its response must fit within this window: the conversation history, the user's question, any documents you want it to reference, and the instructions you gave it. In most models, the response being generated counts against the same limit.
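To make the arithmetic concrete, here is a minimal sketch that estimates token counts from word counts using the three-quarters rule of thumb. The function names, the 8,000-token window, and the 500 tokens reserved for the reply are illustrative assumptions; a real tokenizer will give different counts, especially for code or non-English text.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb for English text, not an exact figure


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the word count (approximation only)."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def fits_in_window(parts: list[str], window_tokens: int, reply_tokens: int = 0) -> bool:
    """Return True if all parts, plus room reserved for the reply, fit in the window."""
    used = sum(estimate_tokens(part) for part in parts)
    return used + reply_tokens <= window_tokens


parts = [
    "You are a helpful support assistant.",  # instructions
    "How long do refunds take?",             # user's question
]
# Hypothetical 8,000-token window with 500 tokens held back for the answer.
print(fits_in_window(parts, window_tokens=8000, reply_tokens=500))
```

An estimate like this is only good for rough budgeting. For exact counts, use the tokenizer that belongs to the model you are calling.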
Today's models have large context windows: GPT-4 Turbo's 128,000-token window holds roughly 300 pages of text at once, and some models support even more. But even the largest context window cannot hold an entire company knowledge base, a years-long email archive, or a library of product documentation.
Why vector databases solve the context window problem
Rather than stuffing everything into the context window and hoping the AI finds what it needs, vector databases let you retrieve only the relevant pieces and inject those. When a user asks a question, the question is converted into an embedding, and the vector database compares it against the embeddings of every passage in the knowledge base, typically in milliseconds. It returns only the most relevant passages, usually somewhere between 3 and 10. Those passages are placed into the context window alongside the question.
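The retrieval step can be sketched in a few lines. This is a toy illustration, not how a production vector database works: the embed function below is a stand-in that simply counts words, whereas a real system would use a learned embedding model and an approximate nearest-neighbor index. All function names and sample passages are made up for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: list[str], k: int = 3) -> list[str]:
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str], k: int = 3) -> str:
    """Place only the retrieved passages into the prompt, next to the question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


knowledge_base = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
print(build_prompt("How long do refunds take?", knowledge_base, k=1))
```

Only the refund passage reaches the prompt here; the other two never take up space in the context window. Word counting matches exact words only, which is precisely the weakness that learned embeddings are meant to overcome.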
This approach scales well. Whether the knowledge base contains 1,000 documents or 1 million, the number of passages passed to the AI stays the same, so the context window stays manageable. The quality of the final AI answer depends directly on the quality of retrieval: if the vector database finds the right passages, the AI has the information it needs to answer accurately; if it misses them, the AI cannot make up for the gap.
Context windows and AI agent memory
AI agents (systems that take actions, browse the web, call tools, and carry out multi-step tasks) face an additional challenge: their context window resets between sessions. An agent cannot remember a conversation it had last week unless that information is stored somewhere persistent.
Vector databases serve as long-term memory for agents. Notes, observations, tool results, and prior conversation summaries can all be stored as embeddings. At the start of a new session, the agent retrieves its most relevant memories and loads them into the context window. This gives AI agents a form of continuity across conversations that would otherwise be impossible within the boundaries of any context window, no matter how large.
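As a rough sketch of the idea, the example below stores an agent's notes in a JSON file and retrieves the most relevant ones when a new session starts. Everything here is hypothetical: the MemoryStore class, the file name, and the word-count embed function are stand-ins for a real vector database and a learned embedding model.

```python
import json
import math
import os
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Hypothetical persistent memory: notes survive because they live on disk."""

    def __init__(self, path: str):
        self.path = path
        self.notes: list[str] = []
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.notes = json.load(f)

    def remember(self, note: str) -> None:
        """Add a note and write the full list back to disk."""
        self.notes.append(note)
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.notes, f)

    def recall(self, query: str, k: int = 3) -> list[str]:
        """Return the k stored notes most similar to the query."""
        q = embed(query)
        ranked = sorted(self.notes, key=lambda n: cosine(q, embed(n)), reverse=True)
        return ranked[:k]
```

In use, one session would call remember("User prefers weekly summary emails on Monday.") and a later session, creating a new MemoryStore on the same file, would call recall("Which day does the user want summary emails?") and load the result into its context window. A real agent would store embeddings alongside the text rather than recomputing them on every query.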