Independently Verified · VectorDBBench

Endee outperforms every vector database on the metrics that matter

Higher recall. More queries per second. Lower latency. A fraction of the cost.

Optimize Your Results

Benchmarking Tips

Endee uses a layered memory architecture designed for massive scale.

It can support 100M+ vectors on a single server with just 128GB of RAM.

Follow these best practices to get the most accurate benchmark results and the best performance from Endee:

Run benchmarks multiple times to allow hot paths to be cached in the vector cache.

Set VECTOR_CACHE_PERCENTAGE to 100 for smaller datasets to ensure all vectors reside in memory.

Use int16 for benchmarking. It leverages Endee's adaptive quantization to reduce memory usage by ~50% with no measurable impact on recall.
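The ~50% figure can be checked with simple arithmetic on raw vector storage. The sketch below assumes the baseline precision is float32 (4 bytes per component), which the page does not state. It counts only the vector payload and ignores index structures and metadata. The helper name `raw_vector_bytes` is hypothetical and is not part of Endee.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_component: int) -> int:
    """Raw vector payload only; ignores index structures and metadata."""
    return num_vectors * dims * bytes_per_component


# Dataset shape taken from the benchmarks on this page (Cohere 10M, 768 dimensions).
NUM_VECTORS = 10_000_000
DIMS = 768

# Assumption: the uncompressed baseline is float32 (4 bytes per component).
float32_bytes = raw_vector_bytes(NUM_VECTORS, DIMS, 4)
# int16 stores each component in 2 bytes.
int16_bytes = raw_vector_bytes(NUM_VECTORS, DIMS, 2)

saving = 1 - int16_bytes / float32_bytes
GIB = 1024 ** 3

print(f"float32: {float32_bytes / GIB:.1f} GiB")
print(f"int16:   {int16_bytes / GIB:.1f} GiB")
print(f"saving:  {saving:.0%}")
```

Comparing the int16 figure with the RAM available on the server gives a rough sense of whether setting VECTOR_CACHE_PERCENTAGE to 100 is realistic for a given dataset.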

Queries Per Second & Cost per Billion Queries

Endee delivers the highest throughput at the lowest cost, making it the most economical choice for production AI workloads.

Endee runs on a very small single-node configuration compared with infra-heavy competitors, yet it still outperforms all of them.

Verified by VectorDBBench · Cohere 10M dataset · 768 dimensions · Pinecone · Milvus · Qdrant · Zilliz Cloud · Vespa

Queries Per Second (Higher is Better)

Cost per Billion Queries (Lower is Better)
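The page does not spell out how cost per billion queries is derived. One common way, shown here as a sketch, is to divide one billion queries by sustained QPS to get the hours of server time required, then multiply by the hourly price. The function name and the example price and QPS are hypothetical placeholders, not measured values.

```python
def cost_per_billion_queries(hourly_price_usd: float, sustained_qps: float) -> float:
    """Cost of serving 1e9 queries at a steady throughput on one server.

    Assumes the server is billed per hour and runs at `sustained_qps`
    for the whole period.
    """
    if sustained_qps <= 0:
        raise ValueError("sustained_qps must be positive")
    seconds_needed = 1_000_000_000 / sustained_qps
    hours_needed = seconds_needed / 3600
    return hourly_price_usd * hours_needed


# Hypothetical inputs for illustration only: a $1.00/hour server at 1,000 QPS.
example = cost_per_billion_queries(hourly_price_usd=1.00, sustained_qps=1000.0)
print(f"${example:,.2f} per billion queries")
```

Under this model, doubling throughput on the same hardware halves the cost, which is why QPS and cost are reported together.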

Recall & Latency Analysis

Endee maintains high recall with low latency, providing the optimal balance for production AI systems.

Verified by VectorDBBench · Cohere 10M dataset · 768 dimensions · Pinecone · Milvus · Qdrant · Zilliz Cloud · Vespa

Recall Score % (Higher is Better)

Latency in ms (Lower is Better)

Head-to-Head

Endee outperforms every vector database we tested

Reproducible, head-to-head benchmarks against each vendor. Same dataset, same client, no cherry-picking. Higher recall, higher QPS, lower latency, lower cost.

Higher Recall · Higher QPS · Lower Latency · Lower Cost

For the detailed report and full benchmarking methodology, read the Endee vs Vertex AI blog post.

Setup

Test environment

Endee runs on a 4× smaller server than Vertex AI, on identical client hardware and dataset.

Dataset: Cohere · 1M vectors · 768D

Client: 16 vCPU · 64 GB · us-central1-a

Vertex AI

Server: n1-standard-16 · 16 vCPU · 60 GB

Index: approx_neighbors=128

Endee OSS

Server: 4 vCPU · 16 GB · us-central1-a

Index: m=32 · ef_con=256 · Precision=int16

Accuracy

Recall vs TopK

At ~800 QPS, concurrency 8. Tuning leaf_node_search_percent (Vertex AI) and ef_search (Endee).

Vertex AI

Tuning leaf_nodes_to_search

leaf_nodes	TopK	Recall
0.05	3	0.8997
0.05	5	0.8932
0.05	10	0.8893
0.04	15	0.8580
0.025	30	0.7776

Endee

Tuning ef_search

ef_search	TopK	Recall
100	3	0.9923
100	5	0.9934
95	10	0.9918
95	15	0.9911
100	30	0.9867

Recall vs TopK

Higher is better · Endee leads at every TopK
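For readers less familiar with the metric, recall at a given TopK is usually the fraction of the true nearest neighbors that the index actually returns, averaged over all queries. The sketch below shows that standard definition. The function names are illustrative, and the exact implementation used by the benchmark client may differ.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbor IDs present in the retrieved top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    hits = len(truth.intersection(retrieved[:k]))
    return hits / len(truth)


def mean_recall(all_retrieved: Sequence[Sequence[int]],
                all_truth: Sequence[Sequence[int]],
                k: int) -> float:
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_truth)]
    if not scores:
        raise ValueError("no queries to score")
    return sum(scores) / len(scores)


# Toy example with made-up IDs: the first query is perfect, the second misses one neighbor.
retrieved = [[1, 2, 3], [4, 5, 9]]
truth = [[1, 2, 3], [4, 5, 6]]
print(f"recall@3 = {mean_recall(retrieved, truth, k=3):.4f}")
```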

Throughput & latency

QPS and p99 latency vs concurrency

Recall held constant at 97.31% (Vertex) / 97.32% (Endee) · topK=30 · Client: 16 vCPU / 64 GB (us-central1-a).

Vertex AI

Recall held constant at 97.31%

Concurrency	QPS	p99 (ms)
2	140.8	59.2
4	279.7	68.7
8	545.0	62.5
16	1,079.5	25.3

Endee

Recall held constant at 97.32%

Concurrency	QPS	p99 (ms)
2	661.1	3.7
4	1,295.0	3.7
8	1,881.2	3.8
16	2,091.5	3.7
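As a rough illustration of how QPS and p99 latency at a fixed concurrency can be measured, the sketch below drives a query function from a thread pool, discards a warm-up pass so caches are hot, and reports throughput and nearest-rank p99. It is not the VectorDBBench client. `run_load`, `percentile`, and `fake_query` are hypothetical names, and `fake_query` is a stand-in that only sleeps.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def run_load(query_fn: Callable[[object], object],
             queries: Sequence[object],
             concurrency: int,
             warmup_passes: int = 1) -> Tuple[float, float]:
    """Return (QPS, p99 latency in ms) for one timed pass over `queries`."""

    def timed(q: object) -> float:
        start = time.perf_counter()
        query_fn(q)
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Warm-up passes: run the queries but throw the timings away.
        for _ in range(warmup_passes):
            list(pool.map(query_fn, queries))
        wall_start = time.perf_counter()
        latencies = sorted(pool.map(timed, queries))
        wall_seconds = time.perf_counter() - wall_start

    return len(queries) / wall_seconds, percentile(latencies, 99)


def fake_query(_q: object) -> None:
    """Stand-in for a real search call; sleeps for about 1 ms."""
    time.sleep(0.001)


qps, p99_ms = run_load(fake_query, list(range(200)), concurrency=8)
print(f"QPS: {qps:.1f}, p99: {p99_ms:.2f} ms")
```

In a real run, `fake_query` would be replaced by a call to the database client, and the pass would be repeated at each concurrency level.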

QPS vs Concurrency

Higher is better

p99 Latency vs Concurrency (ms)

Lower is better

~17× lower p99 latency at concurrency 8

4.7× higher QPS at concurrency 2 (~800 QPS target)

4× smaller server footprint vs Vertex AI
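The headline ratios can be recomputed from the QPS and p99 tables and the server specifications listed in the test environment. The sketch below does that arithmetic using the published, rounded values, so small differences from the headline figures are possible. The variable names are illustrative.

```python
# (QPS, p99 ms) per concurrency level, copied from the tables on this page.
vertex = {2: (140.8, 59.2), 4: (279.7, 68.7), 8: (545.0, 62.5), 16: (1079.5, 25.3)}
endee = {2: (661.1, 3.7), 4: (1295.0, 3.7), 8: (1881.2, 3.8), 16: (2091.5, 3.7)}

qps_ratio = {}
p99_ratio = {}
for concurrency in sorted(vertex):
    v_qps, v_p99 = vertex[concurrency]
    e_qps, e_p99 = endee[concurrency]
    qps_ratio[concurrency] = e_qps / v_qps   # how many times higher Endee's QPS is
    p99_ratio[concurrency] = v_p99 / e_p99   # how many times lower Endee's p99 is
    print(f"concurrency {concurrency:>2}: "
          f"QPS x{qps_ratio[concurrency]:.1f}, p99 x{p99_ratio[concurrency]:.1f}")

# Server footprint from the test environment: 16 vCPU / 60 GB vs 4 vCPU / 16 GB.
vcpu_ratio = 16 / 4
ram_ratio = 60 / 16
print(f"vCPU ratio: {vcpu_ratio:.2f}, RAM ratio: {ram_ratio:.2f}")
```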

A small single-node configuration that outperforms them all

Endee runs on a minimal single-node configuration yet consistently outperforms infra-heavy, multi-node competitors in throughput, recall, latency, and cost.
