Endee outperforms every vector database on the metrics that matter
Higher recall. More queries per second. Lower latency. A fraction of the cost.
Benchmarking Tips
Endee uses a layered memory architecture designed for massive scale.
It can support 100M+ vectors on a single server with just 128GB of RAM.
Follow these best practices to get accurate, representative benchmark results from Endee:
- Run benchmarks multiple times so that hot paths are cached in the vector cache.
- Set VECTOR_CACHE_PERCENTAGE to 100 for smaller datasets to ensure all vectors reside in memory.
- Use int16 for benchmarking. It leverages Endee's adaptive quantization to reduce memory usage by ~50% with no measurable impact on recall.
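The first tip (repeat runs so hot paths land in the vector cache) can be sketched as a small harness that discards warm-up passes before measuring. This is a minimal sketch, not Endee's benchmarking tool: `search_fn` is a hypothetical callable standing in for whatever client call issues a query, and the pass counts are arbitrary defaults.

```python
import statistics
import time
from typing import Callable, List, Sequence


def timed_pass(search_fn: Callable[[object], object],
               queries: Sequence[object]) -> List[float]:
    """Run every query once and return per-query latencies in milliseconds."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query)  # hypothetical client call; replace with a real query
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def benchmark_with_warmup(search_fn: Callable[[object], object],
                          queries: Sequence[object],
                          warmup_passes: int = 2,
                          measured_passes: int = 3) -> List[float]:
    """Discard warm-up passes, then return the median latency (ms) of each measured pass."""
    for _ in range(warmup_passes):
        timed_pass(search_fn, queries)  # results intentionally discarded
    return [statistics.median(timed_pass(search_fn, queries))
            for _ in range(measured_passes)]
```

Reporting each measured pass separately, rather than one pooled number, makes it easy to see whether latencies have stabilized or are still dropping as the cache warms.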
Queries Per Second & Cost per Billion Queries
Endee delivers the highest throughput at the lowest cost, making it the most economical choice for production AI workloads.
Endee runs on a very small single-node configuration compared with infra-heavy competitors, yet it still outperforms all of them.
Queries Per Second (Higher is Better)
Cost per Billion Queries (Lower is Better)
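One plausible way to derive a cost-per-billion-queries figure is to divide the hourly infrastructure price by the number of queries served per hour at sustained throughput. The page does not state the exact formula used, so the sketch below is an assumption, and the price in the usage note is a placeholder rather than a real quote.

```python
def cost_per_billion_queries(hourly_cost_usd: float, qps: float) -> float:
    """Estimate the USD cost of serving one billion queries at a sustained QPS.

    Assumes the server is billed per hour and runs at `qps` the whole time.
    """
    if qps <= 0:
        raise ValueError("qps must be positive")
    queries_per_hour = qps * 3600.0
    return hourly_cost_usd / queries_per_hour * 1_000_000_000
```

For example, a hypothetical server billed at 1.00 USD per hour and sustaining 1,000 QPS would come to roughly 278 USD per billion queries. Under this model, doubling throughput on the same hardware halves the cost.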
Recall & Latency Analysis
Endee maintains high recall with low latency, providing the optimal balance for production AI systems.
Recall Score % (Higher is Better)
Latency in ms (Lower is Better)
Endee outperforms every vector database we tested
Reproducible, head-to-head benchmarks against each vendor. Same dataset, same client, no cherry-picking. Higher recall, higher QPS, lower latency, lower cost.
Higher Recall · Higher QPS · Lower Latency · Lower Cost
For the detailed report and full benchmarking methodology, read the Endee vs Vertex AI blog post.
Test environment
Endee runs on a server with 4× fewer vCPUs than Vertex AI's (4 vs 16). Both systems were tested with identical client hardware and the same dataset.
Vertex AI
Server: n1-standard-16 · 16 vCPU · 60 GB
Index: approx_neighbors=128
Endee OSS
Server: 4 vCPU · 16 GB · us-central1-a
Index: m=32 · ef_con=256 · Precision=int16
Recall vs TopK
Measured at ~800 QPS with concurrency 8, tuning leaf_nodes_to_search (Vertex AI) and ef_search (Endee).
Vertex AI
Tuning leaf_nodes_to_search
| leaf_nodes | TopK | Recall |
|---|---|---|
| 0.05 | 3 | 0.8997 |
| 0.05 | 5 | 0.8932 |
| 0.05 | 10 | 0.8893 |
| 0.04 | 15 | 0.8580 |
| 0.025 | 30 | 0.7776 |
Endee
Tuning ef_search
| ef_search | TopK | Recall |
|---|---|---|
| 100 | 3 | 0.9923 |
| 100 | 5 | 0.9934 |
| 95 | 10 | 0.9918 |
| 95 | 15 | 0.9911 |
| 100 | 30 | 0.9867 |
Recall vs TopK
Higher is better · Endee leads at every TopK
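The recall figures in the tables above are fractions of true nearest neighbors recovered within the top K results. A minimal sketch of that calculation follows, assuming ground-truth neighbor IDs are available from an exact search; the function names are illustrative and not part of any Endee or Vertex AI client.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbor IDs that appear in the retrieved top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    hits = len(truth.intersection(retrieved[:k]))
    return hits / len(truth)


def mean_recall_at_k(all_retrieved: Sequence[Sequence[int]],
                     all_truth: Sequence[Sequence[int]],
                     k: int) -> float:
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_truth)]
    if not scores:
        raise ValueError("no queries supplied")
    return sum(scores) / len(scores)
```

With this definition, a query that returns three of its four true neighbors at K=4 scores 0.75, and the reported number is the mean over all benchmark queries.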
QPS and p99 latency vs concurrency
Recall held constant at 97.31% (Vertex) / 97.32% (Endee) · topK=30 · Client: 16 vCPU / 64 GB (us-central1-a).
Vertex AI
Recall held constant at 97.31%
| Concurrency | QPS | p99 (ms) |
|---|---|---|
| 2 | 140.8 | 59.2 |
| 4 | 279.7 | 68.7 |
| 8 | 545.0 | 62.5 |
| 16 | 1,079.5 | 25.3 |
Endee
Recall held constant at 97.32%
| Concurrency | QPS | p99 (ms) |
|---|---|---|
| 2 | 661.1 | 3.7 |
| 4 | 1,295.0 | 3.7 |
| 8 | 1,881.2 | 3.8 |
| 16 | 2,091.5 | 3.7 |
QPS vs Concurrency
Higher is better
p99 Latency vs Concurrency (ms)
Lower is better
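A QPS and p99 measurement at a fixed concurrency level can be sketched with a thread pool, as below. This is an assumption about how such a test might be structured, not the client used for the numbers above: `search_fn` is a hypothetical stand-in for a real query call, and a production load generator may use processes or async I/O instead of threads.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def run_at_concurrency(search_fn: Callable[[object], object],
                       queries: Sequence[object],
                       concurrency: int) -> Tuple[float, float]:
    """Issue all queries with `concurrency` worker threads; return (QPS, p99 latency in ms)."""
    def timed(query: object) -> float:
        start = time.perf_counter()
        search_fn(query)  # hypothetical client call
        return (time.perf_counter() - start) * 1000.0

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, queries))
    wall_seconds = time.perf_counter() - wall_start
    return len(queries) / wall_seconds, percentile(latencies, 99.0)
```

Running the same query set at concurrency 2, 4, 8, and 16 yields one (QPS, p99) pair per level, which is the shape of the tables above.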
~16× lower p99 latency at concurrency 8
4.7× higher QPS at concurrency 2
4× smaller server footprint vs Vertex AI
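The headline ratios can be recomputed directly from the two concurrency tables. The sketch below copies the published QPS and p99 values and derives the Endee-to-Vertex ratio at each concurrency level; only the dictionary and function names are introduced here for illustration.

```python
# (QPS, p99 latency in ms) per concurrency level, copied from the tables above.
VERTEX = {2: (140.8, 59.2), 4: (279.7, 68.7), 8: (545.0, 62.5), 16: (1079.5, 25.3)}
ENDEE = {2: (661.1, 3.7), 4: (1295.0, 3.7), 8: (1881.2, 3.8), 16: (2091.5, 3.7)}


def ratios(concurrency: int) -> tuple:
    """Return (QPS gain, p99 latency reduction) of Endee relative to Vertex AI."""
    vertex_qps, vertex_p99 = VERTEX[concurrency]
    endee_qps, endee_p99 = ENDEE[concurrency]
    return endee_qps / vertex_qps, vertex_p99 / endee_p99


for level in sorted(VERTEX):
    qps_gain, latency_gain = ratios(level)
    print(f"concurrency {level:>2}: {qps_gain:.1f}x QPS, {latency_gain:.1f}x lower p99")
```

Deriving the ratios from the raw rows keeps the summary figures traceable to the measurements and shows how the gap changes as concurrency rises.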