Use Case
Production RAG with Endee
Ground your LLM in real documents. Endee retrieves the most relevant context so models give accurate, cited answers rather than hallucinations.
Capabilities
Everything a production RAG pipeline needs
Hybrid Search, Dense + BM25
Combine semantic similarity with exact keyword matching in a single query. Catch both intent and terminology so users get accurate results whether they phrase a question naturally or use technical jargon. BM25 sparse vectors are generated automatically with endee-model.
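To make the idea of combining dense and keyword results concrete, here is a minimal sketch of one common way to merge two ranked lists: reciprocal rank fusion (RRF). This page does not say which fusion method Endee uses internally, so RRF is a stand-in chosen for illustration. The function name, the document IDs, and the constant k=60 are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in.
    k=60 is a conventional default, not a value taken from Endee.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a dense (semantic) search and a BM25 search.
dense_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is the effect hybrid search is after: a chunk that matches both the intent and the exact terminology beats one that matches only one of them.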
Production-grade Throughput
Handle thousands of concurrent RAG queries on a single affordable node. In independent VectorDBBench testing on the Cohere 10M dataset, Endee delivered the highest throughput of all tested vector databases, ahead of Pinecone, Qdrant, Milvus, and Vespa.
LangChain and LlamaIndex Native
First-class integrations for both frameworks. Drop Endee into any existing LangChain or LlamaIndex pipeline as the vector store with zero architectural changes. Official packages are available on PyPI and npm with full documentation and quickstart guides.
Metadata Filtering
Filter retrieved chunks by document source, date, category, or any custom field using $eq, $in, and $range operators. Filters are applied during ANN search, not post-retrieval, so you get both relevance and precision with no latency penalty for strict constraints.
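The sketch below illustrates why it matters that filters run during the search rather than afterwards. It uses a brute-force list instead of a real ANN index. The operator names $eq, $in, and $range come from the description above, but the filter dictionary shape, the inclusive [low, high] range, and every function name here are assumptions for illustration, not Endee's documented syntax.

```python
def matches(metadata, flt):
    """Return True if metadata satisfies every condition in flt."""
    for field, conditions in flt.items():
        value = metadata.get(field)
        for op, arg in conditions.items():
            if op == "$eq":
                ok = value == arg
            elif op == "$in":
                ok = value in arg
            elif op == "$range":
                low, high = arg  # assumed inclusive on both ends
                ok = value is not None and low <= value <= high
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True


def post_filter(chunks, flt, k):
    """Take the top-k by score first, then drop non-matching chunks."""
    top = sorted(chunks, key=lambda c: -c["score"])[:k]
    return [c["id"] for c in top if matches(c["meta"], flt)]


def filter_during_search(chunks, flt, k):
    """Only matching chunks compete for the top-k slots."""
    allowed = [c for c in chunks if matches(c["meta"], flt)]
    return [c["id"] for c in sorted(allowed, key=lambda c: -c["score"])[:k]]


# Hypothetical scored chunks with metadata.
chunks = [
    {"id": "c1", "score": 0.95, "meta": {"source": "blog", "year": 2021}},
    {"id": "c2", "score": 0.90, "meta": {"source": "blog", "year": 2023}},
    {"id": "c3", "score": 0.80, "meta": {"source": "handbook", "year": 2024}},
    {"id": "c4", "score": 0.70, "meta": {"source": "handbook", "year": 2023}},
]
flt = {"source": {"$eq": "handbook"}, "year": {"$range": [2023, 2024]}}

post_filtered = post_filter(chunks, flt, k=2)
pre_filtered = filter_during_search(chunks, flt, k=2)
print(post_filtered, pre_filtered)
```

With a strict filter, post-filtering can discard every one of the top-k candidates and return nothing, while filtering during the search still fills all k slots with chunks that satisfy the constraints.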
Billion-scale Index
Scale your knowledge base to one billion chunks on a single node using adaptive quantization. INT8 stores each vector component in one byte instead of the four used by 32-bit floats, cutting vector memory by 75% with minimal recall loss. No sharding, no cluster management, just a single Endee instance that grows with your document corpus.
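The 75% figure follows directly from storage width, and a short sketch makes the arithmetic and the rounding error visible. The code below uses plain symmetric scalar quantization as a stand-in. Endee's adaptive quantization scheme is not described on this page, so nothing here should be read as its actual method. The vector count, the 768-dimension size, and all function names are assumptions, and the estimate covers raw vectors only, not index overhead.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_component):
    """Memory for the raw vectors alone (no graph or index overhead)."""
    return num_vectors * dim * bytes_per_component


def quantize_int8(vector):
    """Symmetric scalar quantization: map floats onto integers in [-127, 127]."""
    max_abs = max((abs(x) for x in vector), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vector]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in codes]


# Hypothetical corpus: 1,000,000 vectors of 768 dimensions.
fp32_bytes = raw_vector_bytes(1_000_000, 768, 4)
int8_bytes = raw_vector_bytes(1_000_000, 768, 1)
reduction = 1 - int8_bytes / fp32_bytes
print(f"float32: {fp32_bytes:,} bytes, int8: {int8_bytes:,} bytes")
print(f"reduction: {reduction:.0%}")

# Round-trip one small vector to see how much precision is lost.
vec = [0.12, -0.5, 0.33, 0.0]
codes, scale = quantize_int8(vec)
restored = dequantize_int8(codes, scale)
max_error = max(abs(a - b) for a, b in zip(vec, restored))
print(codes, max_error)
```

Each restored component differs from the original by at most half a quantization step, which is why nearest-neighbor rankings tend to change little after quantization.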
Any Embedding Model
Endee is model-agnostic. Use OpenAI text-embedding-3-small, Cohere embed-v3, all-MiniLM-L6-v2, BAAI/bge-*, Jina, Voyage, or any custom fine-tuned encoder. The index supports any fixed-dimension dense vector from 64 to 8,000 dimensions.
Process
How it works
Embed your documents
Chunk your documents and generate dense embeddings using any model. Optionally generate BM25 sparse vectors for hybrid retrieval. Store embeddings with metadata such as source, date, section, and language for precise filtering at query time.
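As a rough sketch of this step, the code below splits a document into overlapping character windows and attaches the metadata fields mentioned above to each chunk. The chunk size, the overlap, the record layout, and the function names are illustrative assumptions. The embedding call is left out because it depends on the model you choose.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def build_records(text, source, date, section, language):
    """Pair each chunk with metadata for filtering at query time."""
    return [
        {
            "id": f"{source}-{i}",
            "text": chunk,
            "metadata": {
                "source": source,
                "date": date,
                "section": section,
                "language": language,
            },
        }
        for i, chunk in enumerate(chunk_text(text))
    ]


# Hypothetical 450-character document.
document = "a" * 450
records = build_records(document, "handbook.pdf", "2024-01-15", "onboarding", "en")
print(len(records), [len(r["text"]) for r in records])
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk. Each record's text would then be passed to the embedding model, and the resulting vector stored alongside the metadata.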
Index in Endee
Create an Endee index with the right precision level. Use INT8 to fit a large knowledge base in minimal RAM. Endee indexes both dense and sparse vectors in a single collection, ready for hybrid search in one API call.
Retrieve and generate
Embed the user question and run a hybrid search with any metadata filters. Endee returns the top-k most relevant chunks in milliseconds. Pass those chunks as context to your LLM, which generates a grounded, accurate response.
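The final hand-off to the LLM can be sketched as a prompt builder that numbers each retrieved chunk so the model can cite it. The chunk dictionary layout, the prompt wording, and the function name are assumptions for illustration. The retrieval call itself is omitted because the Endee client API is not shown on this page.

```python
def build_prompt(question, chunks):
    """Assemble a grounded prompt with numbered, citable sources."""
    context_lines = [
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical top-k chunks, as they might come back from a hybrid search.
retrieved = [
    {"source": "handbook.pdf", "text": "Employees accrue 1.5 vacation days per month."},
    {"source": "policy.md", "text": "Unused vacation days expire after 18 months."},
]

prompt = build_prompt("How many vacation days do I earn each month?", retrieved)
print(prompt)
```

Numbering the sources gives the model a compact way to attribute each claim, and the instruction to admit when the sources are silent discourages answers that go beyond the retrieved context.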
In Practice
What teams build with RAG
Enterprise Knowledge Base
Let employees ask questions across thousands of internal documents, policies, and wikis.
Customer Support AI
Ground a support agent in product documentation, FAQs, and ticket history for accurate responses.
Legal and Compliance
Retrieve precise clauses from contracts and regulation documents with strict source filtering.
Code Assistant
Index your entire codebase and retrieve the most relevant functions and patterns for an AI coding assistant.