Use Case

    Production RAG with Endee

    Ground your LLM in real documents. Endee retrieves the most relevant context so models give accurate, cited answers rather than hallucinations.

    LangChain · LlamaIndex · BM25 Hybrid Search · Metadata Filtering · High Recall · INT8 Quantization

    Capabilities

    Everything a production RAG pipeline needs

    Hybrid Search, Dense + BM25

    Combine semantic similarity with exact keyword matching in a single query. Catch both intent and terminology so users get accurate results whether they phrase a question naturally or use technical jargon. BM25 sparse vectors are generated automatically with endee-model.
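
    To make the idea of combining a dense ranking with a BM25 ranking concrete, here is a minimal sketch using reciprocal rank fusion (RRF). RRF is a common, generic fusion technique chosen here for illustration; the page does not say which fusion method Endee uses internally, and the document ids and the constant k=60 are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not an Endee setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists, best match first.
dense_hits = ["doc_a", "doc_b", "doc_c"]   # semantic (dense vector) results
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # keyword (BM25) results

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

    Documents that rank well in both lists (doc_a, doc_c) rise above documents found by only one retriever, which is the behavior hybrid search is meant to deliver.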

    Production-grade Throughput

    Handle thousands of concurrent RAG queries on a single affordable node. In independent VectorDBBench testing on the Cohere 10M dataset, Endee delivered the highest throughput of all tested vector databases, ahead of Pinecone, Qdrant, Milvus, and Vespa.

    LangChain and LlamaIndex Native

    First-class integrations for both frameworks. Drop Endee into any existing LangChain or LlamaIndex pipeline as the vector store with zero architectural changes. Official packages are available on PyPI and npm with full documentation and quickstart guides.

    Metadata Filtering

    Filter retrieved chunks by document source, date, category, or any custom field using $eq, $in, and $range operators. Filters are applied during ANN search, not post-retrieval, so you get both relevance and precision with no latency penalty for strict constraints.
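
    The three operators can be read as simple predicates over a chunk's metadata. The sketch below shows only those semantics in plain Python; the filter dictionary shape, the inclusive bounds for $range, and the field names are assumptions for illustration, not Endee's documented syntax. It also evaluates the predicate over a list, whereas the engine described above applies it during the ANN search itself.

```python
def matches(metadata, filters):
    """Return True if a chunk's metadata satisfies every filter clause."""
    for field, clause in filters.items():
        value = metadata.get(field)
        for op, operand in clause.items():
            if op == "$eq":
                ok = value == operand
            elif op == "$in":
                ok = value in operand
            elif op == "$range":
                low, high = operand  # assumed inclusive on both ends
                ok = value is not None and low <= value <= high
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True


# Hypothetical chunks and filter.
chunks = [
    {"id": 1, "metadata": {"source": "handbook", "year": 2023, "lang": "en"}},
    {"id": 2, "metadata": {"source": "wiki", "year": 2021, "lang": "en"}},
    {"id": 3, "metadata": {"source": "handbook", "year": 2019, "lang": "de"}},
]
filters = {"source": {"$eq": "handbook"}, "year": {"$range": [2020, 2024]}}

kept = [c["id"] for c in chunks if matches(c["metadata"], filters)]
print(kept)
```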

    Billion-scale Index

    Scale your knowledge base to one billion chunks on a single node using adaptive quantization. INT8 reduces memory by 75% relative to 32-bit floats, with minimal recall loss. No sharding, no cluster management, just a single Endee instance that grows with your document corpus.
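
    The 75% figure follows from storing one byte per dimension instead of four. The sketch below works through that arithmetic and shows one simple way to quantize a vector to INT8 (symmetric per-vector scalar quantization). This is a generic textbook scheme used as an assumption; the page does not describe Endee's adaptive quantization in detail, and the corpus size and dimension are hypothetical.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value):
    """Raw storage for the vectors alone (no graph or metadata overhead)."""
    return num_vectors * dim * bytes_per_value


def quantize_int8(vector):
    """Symmetric per-vector scalar quantization to integer codes in [-127, 127]."""
    max_abs = max(abs(x) for x in vector)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vector]
    return codes, scale


def dequantize(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]


# Hypothetical corpus: 1M chunks, 768-dimensional embeddings.
fp32_bytes = raw_vector_bytes(1_000_000, 768, 4)
int8_bytes = raw_vector_bytes(1_000_000, 768, 1)
saving = 1 - int8_bytes / fp32_bytes
print(f"fp32: {fp32_bytes / 1e9:.2f} GB, int8: {int8_bytes / 1e9:.2f} GB, saving: {saving:.0%}")

vector = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(vector)
restored = dequantize(codes, scale)
```

    Each restored value differs from the original by at most half a quantization step (scale / 2), which is why recall loss tends to stay small for well-behaved embeddings.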

    Any Embedding Model

    Endee is model-agnostic. Use OpenAI text-embedding-3-small, Cohere embed-v3, all-MiniLM-L6-v2, BAAI/bge-*, Jina, Voyage, or any custom fine-tuned encoder. The index supports any fixed-dimension dense vector from 64 to 8,000 dimensions.

    Process

    How it works

    1

    Embed your documents

    Chunk your documents and generate dense embeddings using any model. Optionally generate BM25 sparse vectors for hybrid retrieval. Store embeddings with metadata such as source, date, section, and language for precise filtering at query time.
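
    As a rough sketch of this step, the code below splits a document into overlapping character windows and attaches metadata to each chunk. The record layout, field names, chunk sizes, and the toy_embed stand-in are all assumptions for illustration; in practice you would call your chosen embedding model and follow the field names your vector store expects.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def toy_embed(chunk):
    """Placeholder for a real embedding model; returns a tiny fake vector."""
    return [float(len(chunk)), float(chunk.count(" "))]


def build_records(doc_id, text, metadata, embed=toy_embed):
    """Pair every chunk with its embedding and filterable metadata."""
    records = []
    for i, chunk in enumerate(chunk_text(text)):
        records.append({
            "id": f"{doc_id}-{i}",
            "vector": embed(chunk),
            "metadata": {**metadata, "chunk_index": i, "text": chunk},
        })
    return records


document = "abcdefghij" * 45  # 450 characters of stand-in text
records = build_records(
    "policy-7",
    document,
    {"source": "handbook", "date": "2024-01-15", "section": "leave", "language": "en"},
)
print(len(records), records[0]["id"], records[0]["metadata"]["source"])
```

    The overlap keeps sentences that straddle a boundary retrievable from either neighboring chunk, at the cost of some duplicated storage.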

    2

    Index in Endee

    Create an Endee index with the right precision level. Use INT8 to fit a large knowledge base in minimal RAM. Endee indexes both dense and sparse vectors in a single collection, ready for hybrid search in one API call.

    3

    Retrieve and generate

    Embed the user question and run a hybrid search with any metadata filters. Endee returns the top-k most relevant chunks in milliseconds. Pass those chunks as context to your LLM, which generates a grounded, accurate response.
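
    The last part of this step, passing retrieved chunks to the LLM, can be sketched as a prompt-assembly function. The chunk fields (source, text), the prompt wording, and the example data are assumptions; the retrieval call itself is omitted because it depends on the client library you use.

```python
def build_prompt(question, chunks):
    """Format retrieved chunks as numbered, citable context for an LLM."""
    context_lines = []
    for i, chunk in enumerate(chunks, start=1):
        context_lines.append(f"[{i}] ({chunk['source']}) {chunk['text']}")
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical top-k results, best match first.
top_chunks = [
    {"source": "handbook.pdf", "text": "Employees accrue 1.5 days of leave per month."},
    {"source": "faq.md", "text": "Unused leave carries over for one year."},
]

prompt = build_prompt("How much leave do I accrue each month?", top_chunks)
print(prompt)
```

    Numbering the chunks and naming their sources gives the model something concrete to cite, which supports the cited answers described at the top of this page.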

    In Practice

    What teams build with RAG

    Enterprise Knowledge Base

    Let employees ask questions across thousands of internal documents, policies, and wikis.

    Customer Support AI

    Ground a support agent in product documentation, FAQs, and ticket history for accurate responses.

    Legal and Compliance

    Retrieve precise clauses from contracts and regulation documents with strict source filtering.

    Code Assistant

    Index your entire codebase and retrieve the most relevant functions and patterns for an AI code helper.