What is RAG and why does it need a vector database?

RAG (Retrieval-Augmented Generation) grounds LLM responses in external documents. A vector database stores embeddings of those documents and retrieves the most semantically similar chunks at query time, so the LLM answers from real data rather than hallucinating.
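As a rough illustration of "most semantically similar", the sketch below ranks chunks by cosine similarity against a query embedding using a brute-force scan. The three-dimensional vectors and the helper names are made up for the example; a vector database replaces the linear scan with an approximate nearest-neighbor index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=2):
    """Return the k chunk texts most similar to the query, best first.

    chunks is a list of (text, embedding) pairs.
    """
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
chunks = [
    ("Refunds are processed within 5 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on Sundays.", [0.0, 0.2, 0.9]),
    ("Refund requests need an order number.", [0.8, 0.3, 0.1]),
]
context = top_k([1.0, 0.2, 0.0], chunks, k=2)
```

Here `context` holds the two refund-related chunks, which is what would be passed to the LLM.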

Does Endee support hybrid search for RAG?

Yes. Endee supports hybrid search combining dense vector embeddings with BM25 sparse vectors. This improves recall when users query with exact terminology alongside semantic intent, which is critical for technical documentation and product catalogs.

Can I use Endee with LangChain or LlamaIndex?

Yes. Endee has official integrations with LangChain (as a VectorStore) and LlamaIndex (as a VectorStoreIndex). You can swap Endee in with a few lines of code in any existing RAG pipeline.

Use Case

Production RAG with Endee

Ground your LLM in real documents. Endee retrieves the most relevant context so models give accurate, cited answers rather than hallucinations.


LangChain, LlamaIndex, BM25 Hybrid Search, Metadata Filtering, High Recall, INT8 Quantization

Capabilities

Everything a production RAG pipeline needs

Hybrid Search, Dense + BM25

Combine semantic similarity with exact keyword matching in a single query. Catch both intent and terminology so users get accurate results whether they phrase a question naturally or use technical jargon. BM25 sparse vectors are generated automatically with endee-model.
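This page does not say how Endee merges dense and BM25 results internally. One common way to combine two ranked lists is reciprocal rank fusion (RRF), sketched below as an illustration only; the function name, the document ids, and the constant `k=60` are assumptions for the example, not Endee's API.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked highly by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense_ranking = ["doc_a", "doc_b", "doc_c"]   # order by semantic similarity
sparse_ranking = ["doc_c", "doc_a", "doc_d"]  # order by BM25 keyword score
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

In this toy case `doc_a` and `doc_c` appear in both lists and therefore outrank `doc_b` and `doc_d`, which each appear in only one.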

Production-grade Throughput

Handle thousands of concurrent RAG queries on a single affordable node. This is independently verified by VectorDBBench on the Cohere 10M dataset, where Endee delivers the highest throughput of all tested vector databases, ahead of Pinecone, Qdrant, Milvus, and Vespa.

LangChain and LlamaIndex Native

First-class integrations for both frameworks. Drop Endee into any existing LangChain or LlamaIndex pipeline as the vector store with zero architectural changes. Official packages are available on PyPI and npm with full documentation and quickstart guides.

Metadata Filtering

Filter retrieved chunks by document source, date, category, or any custom field using $eq, $in, and $range operators. Filters are applied during ANN search, not post-retrieval, so you get both relevance and precision with no latency penalty for strict constraints.
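To make the operator semantics concrete, here is a small sketch of how `$eq`, `$in`, and `$range` conditions could be evaluated against a chunk's metadata. The filter shape (`{field: {operator: operand}}`), the inclusive bounds for `$range`, and the `matches` helper are assumptions for illustration; check the Endee documentation for the actual syntax. The sketch also filters with a plain linear scan, whereas Endee is described as applying filters during ANN search.

```python
def matches(metadata, filters):
    """Return True if a chunk's metadata satisfies every filter condition."""
    for field, condition in filters.items():
        if field not in metadata:
            return False
        value = metadata[field]
        for op, operand in condition.items():
            if op == "$eq":
                ok = value == operand
            elif op == "$in":
                ok = value in operand
            elif op == "$range":
                low, high = operand  # assumed inclusive on both ends
                ok = low <= value <= high
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True

chunks = [
    {"id": 1, "source": "handbook", "year": 2023},
    {"id": 2, "source": "wiki", "year": 2021},
    {"id": 3, "source": "handbook", "year": 2019},
]
filters = {
    "source": {"$in": ["handbook", "policy"]},
    "year": {"$range": [2022, 2024]},
}
allowed_ids = [c["id"] for c in chunks if matches(c, filters)]
```

Only chunk 1 passes both conditions: chunk 2 fails the source filter and chunk 3 falls outside the year range.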

Billion-scale Index

Scale your knowledge base to one billion chunks on a single node using adaptive quantization. INT8 reduces memory by 75% compared with 32-bit floats, with minimal recall loss. No sharding, no cluster management, just a single Endee instance that grows with your document corpus.
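The 75% figure follows from storage width, assuming the baseline is 32-bit floats: one byte per value instead of four. The sketch below works through the arithmetic for an illustrative corpus; the corpus size and dimension are example values, and the calculation covers raw vector storage only, not index structures or metadata.

```python
def vector_memory_gib(num_vectors, dims, bytes_per_value):
    """Raw vector storage in GiB, ignoring index overhead and metadata."""
    return num_vectors * dims * bytes_per_value / 2**30

num_vectors = 10_000_000  # illustrative corpus size
dims = 768                # illustrative embedding width

fp32_gib = vector_memory_gib(num_vectors, dims, 4)  # 32-bit floats
int8_gib = vector_memory_gib(num_vectors, dims, 1)  # 8-bit integers
saving = 1 - int8_gib / fp32_gib                    # fraction of memory saved

print(f"FP32: {fp32_gib:.2f} GiB, INT8: {int8_gib:.2f} GiB, saving: {saving:.0%}")
```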

Any Embedding Model

Endee is model-agnostic. Use OpenAI text-embedding-3-small, Cohere embed-v3, all-MiniLM-L6-v2, BAAI/bge-*, Jina, Voyage, or any custom fine-tuned encoder. The index supports any fixed-dimension dense vector from 64 to 8,000 dimensions.

Process

How it works

Embed your documents

Chunk your documents and generate dense embeddings using any model. Optionally generate BM25 sparse vectors for hybrid retrieval. Store embeddings with metadata such as source, date, section, and language for precise filtering at query time.
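A minimal sketch of the chunking step, assuming word-based windows with overlap. The record layout, the `chunk_words` and `build_records` helpers, and the tiny chunk size are hypothetical choices for the example; the embedding call is left out because it depends on the model you choose.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into overlapping windows of at most chunk_size words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

def build_records(text, source, date, section, language, **chunk_kwargs):
    """Attach the same metadata to every chunk of one document."""
    return [
        {
            "id": f"{source}-{i}",
            "text": chunk,
            "metadata": {
                "source": source,
                "date": date,
                "section": section,
                "language": language,
            },
        }
        for i, chunk in enumerate(chunk_words(text, **chunk_kwargs))
    ]

doc = " ".join(f"word{i}" for i in range(12))
records = build_records(
    doc, source="handbook.md", date="2024-01-15", section="intro",
    language="en", chunk_size=5, overlap=2,
)
```

Each record would then be embedded and stored together with its metadata, so that fields such as `source` or `date` are available for filtering at query time.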

Index in Endee

Create an Endee index with the right precision level. Use INT8 to fit a large knowledge base in minimal RAM. Endee indexes both dense and sparse vectors in a single collection, ready for hybrid search in one API call.
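For intuition about what an INT8 precision level trades away, the sketch below applies plain symmetric scalar quantization to one vector and measures the round-trip error. This is a generic textbook technique shown as an assumption-laden illustration; the page does not describe how Endee's adaptive quantization works, and the function names are hypothetical.

```python
def quantize_int8(vector):
    """Symmetric scalar quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(x) for x in vector)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vector]
    return codes, scale

def dequantize(codes, scale):
    """Approximate the original floats from the integer codes."""
    return [c * scale for c in codes]

vec = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(vec)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the per-value error by half a step.
max_error = max(abs(a - b) for a, b in zip(vec, restored))
```

Each value is stored in one byte plus a shared scale factor, and the reconstruction error per value stays within roughly half of `scale`.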

Retrieve and generate

Embed the user question and run a hybrid search with any metadata filters. Endee returns the top-k most relevant chunks in milliseconds. Pass those chunks as context to your LLM, which generates a grounded, accurate response.
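The last step, passing retrieved chunks to the LLM, can be sketched as simple prompt assembly. The `build_prompt` helper, the chunk fields, and the instruction wording are assumptions for illustration; the retrieval call and the LLM call are omitted because they depend on your client libraries.

```python
def build_prompt(question, chunks):
    """Assemble a grounded prompt with numbered sources the model can cite."""
    context_lines = [
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    ]
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number. "
        "If the sources do not contain the answer, say so.\n\n"
        "Sources:\n" + "\n".join(context_lines) + "\n\n"
        f"Question: {question}\nAnswer:"
    )

# Stand-in for the top-k chunks returned by the vector search.
retrieved = [
    {"source": "refunds.md", "text": "Refunds are processed within 5 days."},
    {"source": "orders.md", "text": "Refund requests need an order number."},
]
prompt = build_prompt("How long do refunds take?", retrieved)
```

Numbering the sources gives the model something concrete to cite, which makes it easier to check an answer against the retrieved text.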

In Practice

What teams build with RAG

Enterprise Knowledge Base

Let employees ask questions across thousands of internal documents, policies, and wikis.

Customer Support AI

Ground a support agent in product documentation, FAQs, and ticket history for accurate responses.

Legal and Compliance

Retrieve precise clauses from contracts and regulation documents with strict source filtering.

Code Assistant

Index your entire codebase and retrieve the most relevant functions and patterns for an AI code helper.

Related resources

What is RAG (blog post)

Benchmarks (performance data)

Semantic Search (use case)

AI Agent Memory (use case)