    Core Technology

    Four proprietary innovations, each a purpose-built answer to a real production limitation in existing vector databases.

    Proprietary · Production-proven · Benchmarked independently

    The Vector Graph Engine

    The Engine Behind the Numbers

    1

    Multi-Layer Memory Architecture

    Hot/warm/cold tiers keep frequently accessed vectors in RAM and the rest on fast NVMe, maximising throughput without over-provisioning memory.

    Enables 1B+ vectors on a single node

    2

    Efficient Quantization

    Scalar quantization (FP32 to INT8) compresses each vector 4x with near-zero recall loss. Endee quantizes at index-build time so queries always hit compressed vectors.

    1/10th the RAM, same 99%+ recall

    Endee, powered by the Vector Graph Engine (proprietary architecture)

    3

    SIMD Acceleration

    CPU vector instructions (AVX-512, NEON) process many vector elements per instruction: a single AVX-512 register holds 16 float32 lanes. Distance calculations run at hardware speed, with no GPU and no special infrastructure needed.

    <5ms P99 latency at 10K+ QPS

    4

    Efficient Graph Construction

    Endee builds the HNSW graph incrementally with pruning and degree-limiting heuristics. Indexes build 2-4x faster than with naive HNSW construction, enabling real-time catalog updates.

    Live catalog sync without index rebuilds

    Four compounding innovations. Each one eliminates a bottleneck. Together, they deliver an unfair advantage.

    The Engine Underneath

    Vector Graph Engine (VGE)

    The Vector Graph Engine is Endee's unified search runtime: the layer that combines Layered Memory Architecture, Int8e Quantization, and Progressive Filtering into a single high-performance ANN pipeline. Rather than treating these as independent optimizations, VGE co-designs them so each reinforces the others: quantized vectors let more of the hot graph layers fit in RAM, progressive filtering exploits the graph structure to enforce constraints early, and the memory hierarchy keeps hot nodes in cache for both techniques.

    90%

    Cost savings

    vs. traditional vector databases

    1/10th

    Infrastructure

    vs. equivalent memory footprint

    99%+

    Recall

    at one billion vectors

    Higher Recall

    Best-in-class accuracy at any scale

    Higher QPS

    More queries per second per node

    Lower Latency

    Sub-5ms p99 under load

    Core IP #1

    Layered Memory Architecture

    Hybrid storage for optimal cost-performance at scale

    A conventional vector database loads the entire HNSW graph into RAM, which works at small scale but becomes prohibitively expensive at billions of vectors. Endee's Layered Memory Architecture separates the vector graph into two tiers.

    Base-layer graph nodes are persisted on disk using memory-mapped files. The OS page cache acts as a warm tier: frequently accessed nodes stay resident in memory naturally, without explicit management. The upper layers of the HNSW graph, which hold the small-world navigational links that determine routing quality, are held in a configurable in-memory cache.

    The result: RAM consumption drops by up to an order of magnitude compared to a fully in-memory index, while recall remains stable because the high-connectivity upper layers stay hot. Latency stays low because base-layer access via mmap is faster than a network round-trip to a distributed cache.
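
The two-tier split described above can be illustrated with a minimal Python sketch. It is not Endee's implementation: the file layout, the sizes, and the names `read_base`, `upper_cache`, and `get_vector` are all hypothetical, and the "upper layer" here is just a random subset of nodes rather than real HNSW levels. It only shows the general pattern of a read-only memory map for the bulk of the data plus a small explicit in-memory cache.

```python
import mmap
import os
import random
import struct
import tempfile

DIM = 8          # hypothetical dimensionality
N_BASE = 1_000   # hypothetical base-layer node count
N_UPPER = 10     # hypothetical upper-layer node count
RECORD = struct.Struct(f"<{DIM}f")  # one little-endian float32 vector

random.seed(0)
vectors = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(N_BASE)]

# Persist every base-layer vector to a flat file on disk.
path = os.path.join(tempfile.mkdtemp(), "base_layer.f32")
with open(path, "wb") as out:
    for vec in vectors:
        out.write(RECORD.pack(*vec))

# Base layer: read-only memory map. Pages are faulted in on first access and
# then kept warm by the OS page cache, with no explicit management here.
backing_file = open(path, "rb")
base = mmap.mmap(backing_file.fileno(), 0, access=mmap.ACCESS_READ)

def read_base(node_id: int) -> tuple:
    """Decode one vector straight out of the mapped file."""
    return RECORD.unpack_from(base, node_id * RECORD.size)

# Upper layer (stand-in): a small set of routing nodes copied into ordinary RAM.
upper_ids = random.sample(range(N_BASE), N_UPPER)
upper_cache = {i: read_base(i) for i in upper_ids}

def get_vector(node_id: int) -> tuple:
    """Serve from the in-memory cache when possible, else from the mmap."""
    cached = upper_cache.get(node_id)
    return cached if cached is not None else read_base(node_id)

pinned_bytes = N_UPPER * RECORD.size
total_bytes = N_BASE * RECORD.size
print(f"explicitly pinned: {pinned_bytes} of {total_bytes} bytes")
```

Only the cache is held explicitly by the process; how much of the mapped file stays resident is left to the operating system, which is the behaviour the paragraph above attributes to the warm tier.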

    Key outcomes

    • Up to 10x less RAM vs. fully in-memory HNSW
    • Stable recall regardless of dataset size
    • Consistent latency under mixed cold/warm access patterns
    • Linear horizontal scaling with predictable memory cost

    Core IP #2

    Int8e Quantization

    4x compression with minimal recall degradation

    Float32 vectors use 4 bytes per dimension. At 768 dimensions and 100M vectors, that's roughly 307GB of raw vector storage, before indexing overhead. Standard INT8 quantization cuts this to a quarter, but introduces precision loss that degrades recall.

    Endee's Int8e (Int8 enhanced) format is a proprietary encoding that reduces vector footprint by 4x compared to float32 while preserving retrieval accuracy through three mechanisms: per-vector dynamic scale calibration, selective precision retention for dimensions with high variance, and a correction step applied during distance computation.

    Int8e is an enterprise-only format. Open-source builds support standard INT8 and INT16 with dynamic per-vector scaling, which already deliver significant compression at high accuracy.
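
As a rough illustration of standard INT8 quantization with dynamic per-vector scaling (the open-source option mentioned above), here is a minimal Python sketch. It is not the Int8e format, whose encoding is proprietary and not described here. The function names and the choice of symmetric scaling to the range [-127, 127] are assumptions made for the example.

```python
import random

def quantize_int8(vec):
    """Symmetric per-vector INT8: the scale is derived from this vector alone."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale

def dequantize(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quantized_dot(codes_a, scale_a, codes_b, scale_b):
    """Integer dot product, rescaled once at the end."""
    return dot(codes_a, codes_b) * scale_a * scale_b

random.seed(0)
DIM = 768
a = [random.gauss(0, 1) for _ in range(DIM)]
b = [random.gauss(0, 1) for _ in range(DIM)]

qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)

exact = dot(a, b)
approx = quantized_dot(qa, sa, qb, sb)
print(f"exact={exact:.3f} approx={approx:.3f}")

# Footprint for the example in the text: 768 dimensions x 100M vectors.
n_vectors = 100_000_000
float32_bytes = DIM * 4 * n_vectors
int8_bytes = (DIM * 1 + 4) * n_vectors  # +4 bytes per vector for a float32 scale
print(f"float32: {float32_bytes / 1e9:.1f} GB, int8: {int8_bytes / 1e9:.1f} GB")
```

The footprint line prints 307.2 GB for float32 and 77.2 GB for INT8 with a stored scale, so the per-vector scale costs well under 1% of the compressed size. Each reconstructed component is off by at most half a quantization step (`scale / 2`), which is the precision loss the text refers to.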

    Key outcomes

    • 4x storage reduction vs. float32
    • Minimal recall degradation vs. standard INT8
    • Enables billion-scale indexes on commodity hardware
    • Dynamic calibration, no manual per-dataset tuning

    Core IP #3

    Progressive Filtering

    Multi-stage retrieval that preserves graph connectivity

    Applying a metadata filter to an HNSW search creates a fundamental tension: the graph was built on the full dataset, but the query must return only the filtered subset. Naive approaches prune edges aggressively to enforce filters early, which disconnects the graph and degrades recall dramatically, particularly for selective filters.

    Endee's Progressive Filtering applies constraints incrementally across the graph traversal. Rather than blocking graph edges at the outset, the algorithm progressively tightens the candidate set while preserving enough connectivity to navigate toward globally optimal results. A final check enforces hard filter constraints on the output set.

    The result is recall that degrades gracefully with filter selectivity, rather than collapsing. Highly selective filters (0.1% matching) still return accurate top-k results that single-pass approaches miss.
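
The tension between blocking edges early and keeping the graph navigable can be shown with a small Python sketch. This is not Endee's Progressive Filtering algorithm, which is not specified here. It compares two simpler strategies on a brute-force k-NN graph (a stand-in for an HNSW base layer): one that refuses to expand nodes failing the filter, and one that still routes through them but only returns matching nodes, with a final hard check on the output. All names and parameters are hypothetical.

```python
import heapq
import random

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_knn_graph(points, k):
    """Brute-force k-nearest-neighbour graph; a stand-in for an HNSW base layer."""
    graph = {}
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        graph[i] = sorted(others, key=lambda j: dist(p, points[j]))[:k]
    return graph

def filtered_search(points, graph, query, passes, entry, top_k, ef, block_edges):
    """Best-first search. With block_edges=True, nodes failing the filter are
    never expanded; with False they are used for routing but never returned."""
    visited = {entry}
    frontier = [(dist(query, points[entry]), entry)]  # min-heap of nodes to expand
    results = []                                      # max-heap (negated) of matches
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(results) >= ef and d > -results[0][0]:
            break  # nothing left can beat the current worst match
        if passes(node):
            heapq.heappush(results, (-d, node))
            if len(results) > ef:
                heapq.heappop(results)
        elif block_edges:
            continue  # edge-blocking: do not route through this node
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (dist(query, points[nb]), nb))
    # Final hard check on the output set.
    hits = sorted((-neg_d, n) for neg_d, n in results if passes(n))
    return [n for _, n in hits[:top_k]]

random.seed(1)
points = [(random.random(), random.random()) for _ in range(400)]
graph = build_knn_graph(points, k=8)
allowed = {i for i in range(400) if i % 20 == 7}  # 5% of nodes match the filter

def passes(node_id):
    return node_id in allowed

query = (0.5, 0.5)
truth = sorted(allowed, key=lambda i: dist(query, points[i]))[:5]
found = {}
for block in (True, False):
    found[block] = filtered_search(points, graph, query, passes, entry=0,
                                   top_k=5, ef=10, block_edges=block)
    recall = len(set(found[block]) & set(truth)) / len(truth)
    print(f"block_edges={block}: recall={recall:.2f}")
```

Because the entry node here fails the filter, the edge-blocking variant cannot leave it and returns nothing. The routing variant keeps the graph connected but, under a selective filter, may visit most of the graph before it collects enough matches; bounding that cost is the problem an incremental tightening scheme has to solve.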

    Key outcomes

    • Graceful recall degradation vs. cliff-edge collapse under selective filters
    • Lower compute cost than post-filtering on the full ANN result
    • Works on any metadata type: categorical, numeric range, geo, custom
    • No re-indexing required when filter patterns change

    Core IP #4

    Queryable Encryption

    Similarity search on encrypted vectors, zero decryption

    Standard vector databases require decrypting your data to run search. The server receives plaintext vectors, builds the graph on plaintext, and executes queries against plaintext. For regulated industries such as finance, healthcare, and defense, this is architecturally incompatible with data sovereignty requirements.

    Endee's Queryable Encryption enables ANN similarity search directly on encrypted vector representations. Encryption and decryption happen exclusively on the client side. The Endee server stores and processes only ciphertext throughout the entire data lifecycle: ingest, indexing, and query execution.

    The encryption scheme is designed to preserve approximate distance ordering in the encrypted space, enabling graph traversal without decryption. The server learns nothing about the content of the vectors it processes.
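
To make the idea of "preserving distance ordering in a transformed space" concrete, here is a toy Python sketch. It is not Endee's encryption scheme and it is not secure cryptography: it applies a secret random rotation on the client, which preserves Euclidean distances exactly and therefore reveals them to whoever holds the transformed vectors. The names `random_orthogonal`, `transform`, and `nearest` are hypothetical. The only point it demonstrates is that a server can rank neighbours correctly without ever seeing the original coordinates or the key.

```python
import math
import random

def random_orthogonal(dim, rng):
    """Secret key: a random orthogonal matrix built by Gram-Schmidt."""
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0, 1) for _ in range(dim)]
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:
            basis.append([x / norm for x in v])
    return basis

def transform(key, vec):
    """Client side: rotate the vector with the secret key."""
    return [sum(k * x for k, x in zip(row, vec)) for row in key]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(query, stored, k):
    """Server side: ordinary nearest-neighbour ranking; no key required."""
    return sorted(range(len(stored)), key=lambda i: sq_dist(query, stored[i]))[:k]

rng = random.Random(42)
DIM = 16
key = random_orthogonal(DIM, rng)  # never leaves the client

plain = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]
query = [rng.gauss(0, 1) for _ in range(DIM)]

# What the server would store and receive in this toy setup.
stored = [transform(key, v) for v in plain]
hidden_query = transform(key, query)

plain_top = nearest(query, plain, k=5)
hidden_top = nearest(hidden_query, stored, k=5)
print("same neighbours:", plain_top == hidden_top)
```

A production scheme would need to preserve only approximate ordering while hiding far more than a rotation does; this sketch makes no attempt at that.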

    Key outcomes

    • Zero plaintext exposure on the server: at rest, in transit, and during search
    • Client-side key management: Endee never holds decryption keys
    • Satisfies strict HIPAA, GDPR, and financial data sovereignty requirements
    • Practical query performance: encryption overhead is sub-5ms

    See the technology in production

    Independent benchmarks on the Cohere 10M dataset show Endee achieving the lowest cost-per-billion-queries of any tested vector database.