Independently Verified · VectorDBBench

    Endee outperforms every vector database on the metrics that matter

    Higher recall. More queries per second. Lower latency. A fraction of the cost.

    Optimize Your Results

    Benchmarking Tips

    Endee uses a layered memory architecture designed for massive scale.

    It can support 100M+ vectors on a single server with just 128GB of RAM.

    Follow these best practices to get accurate benchmark results that reflect Endee's best performance:

    1. Run benchmarks multiple times to allow hot paths to be cached in the vector cache.

    2. Set VECTOR_CACHE_PERCENTAGE to 100 for smaller datasets to ensure all vectors reside in memory.

    3. Use int16 for benchmarking. It leverages Endee's adaptive quantization to reduce memory usage by ~50% with no measurable impact on recall.
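The ~50% figure in tip 3 can be illustrated with a rough size estimate. The sketch below assumes the baseline is float32 (4 bytes per value) and counts only the raw vector values, ignoring index and metadata overhead; the helper name `raw_vector_bytes` is hypothetical and not part of Endee. The dataset shape is the Cohere 10M, 768-dimension set used on this page.

```python
FLOAT32_BYTES = 4  # assumed baseline precision (not stated in the source)
INT16_BYTES = 2    # int16 stores each value in 2 bytes


def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int) -> int:
    """Bytes needed for the vector values alone (no index or metadata)."""
    return num_vectors * dims * bytes_per_value


num_vectors, dims = 10_000_000, 768  # Cohere 10M dataset, 768 dimensions

float32_bytes = raw_vector_bytes(num_vectors, dims, FLOAT32_BYTES)
int16_bytes = raw_vector_bytes(num_vectors, dims, INT16_BYTES)
savings = 1 - int16_bytes / float32_bytes

print(f"float32: {float32_bytes / 1e9:.2f} GB")
print(f"int16:   {int16_bytes / 1e9:.2f} GB")
print(f"savings: {savings:.0%}")
```

An estimate like this can also help decide whether a dataset is small enough for tip 2, that is, whether all vectors can plausibly reside in memory on the chosen server.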

    Queries Per Second & Cost per Billion Queries

    Endee delivers the highest throughput at the lowest cost, making it the most economical choice for production AI workloads.

    Endee runs on a very small single-node configuration compared with infra-heavy competitors, yet it still outperforms all of them.

    Verified by VectorDBBench · Cohere 10M dataset · 768 dimensions

    Pinecone · Milvus · Qdrant · Zilliz Cloud · Vespa

    Queries Per Second (Higher is Better)

    Cost per Billion Queries (Lower is Better)
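One common way to derive a cost-per-billion-queries figure is to divide the hourly infrastructure price by the number of queries served per hour at sustained throughput. The sketch below assumes that definition; the source does not state the exact formula or any prices, so the function name and the $1.00/hour and 1,000 QPS inputs are placeholders chosen only to show the arithmetic.

```python
def cost_per_billion_queries(hourly_price_usd: float, qps: float) -> float:
    """Cost of serving 1e9 queries at a sustained QPS on hourly-priced infra."""
    if qps <= 0:
        raise ValueError("qps must be positive")
    queries_per_hour = qps * 3600
    return hourly_price_usd / queries_per_hour * 1_000_000_000


# Placeholder inputs (hypothetical, not measured values from this page).
example_cost = cost_per_billion_queries(hourly_price_usd=1.00, qps=1000.0)
print(f"${example_cost:,.2f} per billion queries")
```

Under this definition, doubling throughput on the same hardware halves the cost per billion queries, which is why the two charts tend to move together.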

    Recall & Latency Analysis

    Endee maintains high recall with low latency, providing the optimal balance for production AI systems.

    Verified by VectorDBBench · Cohere 10M dataset · 768 dimensions

    Pinecone · Milvus · Qdrant · Zilliz Cloud · Vespa

    Recall Score % (Higher is Better)

    Latency in ms (Lower is Better)

    Head-to-Head

    Endee outperforms every vector database we tested

    Reproducible, head-to-head benchmarks against each vendor. Same dataset, same client, no cherry-picking. Higher recall, higher QPS, lower latency, lower cost.

    Higher Recall · Higher QPS · Lower Latency · Lower Cost

    For the detailed report and full benchmarking methodology, read the Endee vs Vertex AI blog post.

    Setup

    Test environment

    Endee runs on a server 4× smaller than Vertex AI's (4 vCPU vs 16 vCPU), with identical client hardware and dataset.

    Dataset: Cohere · 1M vectors · 768D
    Client: 16 vCPU · 64 GB · us-central1-a

    Vertex AI

    Server: n1-standard-16 · 16 vCPU · 60 GB

    Index: approx_neighbors=128

    Endee OSS

    Server: 4 vCPU · 16 GB · us-central1-a

    Index: m=32 · ef_con=256 · Precision=int16

    Accuracy

    Recall vs TopK

    At ~800 QPS, concurrency 8. Tuning leaf_node_search_percent (Vertex AI) and ef_search (Endee).

    Vertex AI

    Tuning leaf_nodes_to_search

    leaf_nodes   TopK   Recall
    0.05         3      0.8997
    0.05         5      0.8932
    0.05         10     0.8893
    0.04         15     0.8580
    0.025        30     0.7776

    Endee

    Tuning ef_search

    ef_search    TopK   Recall
    100          3      0.9923
    100          5      0.9934
    95           10     0.9918
    95           15     0.9911
    100          30     0.9867
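For readers unfamiliar with the metric, recall at TopK is usually computed as the fraction of the true K nearest neighbors that the index actually returns, averaged over all queries. The sketch below assumes that common definition; the benchmark tool's exact implementation may differ, and the function names and toy IDs are illustrative only.

```python
def recall_at_k(retrieved_ids, ground_truth_ids, k):
    """Fraction of the true top-k neighbors present in the retrieved top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    retrieved = set(retrieved_ids[:k])
    truth = set(ground_truth_ids[:k])
    return len(retrieved & truth) / k


def mean_recall_at_k(all_retrieved, all_ground_truth, k):
    """Average recall@k over a batch of queries."""
    scores = [
        recall_at_k(retrieved, truth, k)
        for retrieved, truth in zip(all_retrieved, all_ground_truth)
    ]
    return sum(scores) / len(scores)


# Toy example: two queries, k=3. The second query misses one true neighbor.
retrieved = [[1, 2, 3], [4, 5, 9]]
ground_truth = [[1, 2, 3], [4, 5, 6]]
toy_recall = mean_recall_at_k(retrieved, ground_truth, k=3)
print(f"recall@3 = {toy_recall:.4f}")
```

In practice the ground-truth lists come from an exact (brute-force) search over the same dataset, and parameters such as ef_search trade query time for a higher value of this score.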

    Recall vs TopK

    Higher is better · Endee leads at every TopK

    Throughput & latency

    QPS and p99 latency vs concurrency

    Recall held constant at 97.31% (Vertex) / 97.32% (Endee) · topK=30 · Client: 16 vCPU / 64 GB (us-central1-a).

    Vertex AI

    Recall held constant at 97.31%

    Concurrency   QPS       p99 (ms)
    2             140.8     59.2
    4             279.7     68.7
    8             545.0     62.5
    16            1,079.5   25.3

    Endee

    Recall held constant at 97.32%

    Concurrency   QPS       p99 (ms)
    2             661.1     3.7
    4             1,295.0   3.7
    8             1,881.2   3.8
    16            2,091.5   3.7
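Numbers like those in the tables above are typically gathered by issuing queries from a fixed number of concurrent workers and recording per-query latency. The following is a minimal sketch of such a harness, not the benchmark tool's actual implementation: `run_load` and `fake_query` are hypothetical names, and `fake_query` simply sleeps for about 1 ms in place of a real database call.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(query_fn, queries, concurrency):
    """Run every query through query_fn using `concurrency` worker threads."""
    queries = list(queries)
    if not queries:
        raise ValueError("need at least one query")

    def timed_call(query):
        start = time.perf_counter()
        query_fn(query)
        return (time.perf_counter() - start) * 1000.0  # latency in ms

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies_ms = sorted(pool.map(timed_call, queries))
    wall_seconds = time.perf_counter() - wall_start

    # Nearest-rank p99: the sample at rank ceil(0.99 * n) in sorted order.
    p99_index = math.ceil(0.99 * len(latencies_ms)) - 1
    return {
        "count": len(latencies_ms),
        "qps": len(latencies_ms) / wall_seconds,
        "p99_ms": latencies_ms[p99_index],
    }


def fake_query(_query):
    """Stand-in for a real search call (assumption: ~1 ms of work)."""
    time.sleep(0.001)


for concurrency in (2, 4, 8, 16):
    result = run_load(fake_query, range(200), concurrency)
    print(f"concurrency={concurrency:>2}  qps={result['qps']:.1f}  "
          f"p99={result['p99_ms']:.2f} ms")
```

Running the loop more than once and discarding the first pass is one simple way to let caches warm up before recording results.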

    QPS vs Concurrency

    Higher is better

    p99 Latency vs Concurrency (ms)

    Lower is better

    ~17× lower p99 latency at concurrency 8

    4.7× higher QPS at concurrency 2 (~800 QPS target)

    4× smaller server footprint vs Vertex AI
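The ratios can be recomputed directly from the published throughput and latency tables. The sketch below copies the table values for each concurrency level and derives the QPS and p99 ratios; because the tables show rounded values, results may differ slightly from headline figures that were presumably derived from unrounded measurements.

```python
# (QPS, p99 latency in ms) per concurrency level, copied from the tables.
vertex = {2: (140.8, 59.2), 4: (279.7, 68.7), 8: (545.0, 62.5), 16: (1079.5, 25.3)}
endee = {2: (661.1, 3.7), 4: (1295.0, 3.7), 8: (1881.2, 3.8), 16: (2091.5, 3.7)}


def compare(baseline, candidate):
    """Return {concurrency: (qps_ratio, p99_ratio)} of candidate vs baseline."""
    rows = {}
    for concurrency in sorted(baseline):
        base_qps, base_p99 = baseline[concurrency]
        cand_qps, cand_p99 = candidate[concurrency]
        # qps_ratio > 1 means higher throughput; p99_ratio > 1 means lower latency.
        rows[concurrency] = (cand_qps / base_qps, base_p99 / cand_p99)
    return rows


ratios = compare(vertex, endee)
for concurrency, (qps_ratio, p99_ratio) in ratios.items():
    print(f"concurrency={concurrency:>2}  QPS x{qps_ratio:.2f}  p99 x{p99_ratio:.1f}")

# Server size ratio from the setup section: 16 vCPU (Vertex AI) vs 4 vCPU (Endee).
vcpu_ratio = 16 / 4
print(f"vCPU ratio: {vcpu_ratio:.0f}x")
```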

    A small single-node setup that outperforms them all

    Endee runs on a minimal single-node configuration yet consistently outperforms infra-heavy, multi-node competitors in throughput, recall, latency, and cost.