Core Technology

Four proprietary innovations, each a purpose-built answer to a real production limitation in existing vector databases.

Proprietary · Production-proven · Benchmarked independently

The Vector Graph Engine

The Engine Behind the Numbers

Multi-Layer Memory Architecture

Hot/warm/cold tiers keep frequently accessed vectors in RAM and the rest on fast NVMe. This maximises throughput without over-provisioning memory.

→ Enables 1B+ vectors on a single node

Efficient Quantization

Scalar quantization (FP32 to INT8) compresses each vector 4x with near-zero recall loss. Endee quantizes at index-build time so queries always hit compressed vectors.

→ 1/10th the RAM, same 99%+ recall

[Diagram: Endee Vector Graph Engine, proprietary architecture]

SIMD Acceleration

CPU vector instructions (AVX-512, NEON) process many vector elements per instruction: a 512-bit AVX-512 register holds 16 FP32 values. Distance calculations run at hardware speed, with no GPU or special infrastructure needed.

→ <5ms P99 latency at 10K+ QPS
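The per-instruction width comes straight from register size divided by element size. A minimal sketch of that arithmetic, assuming the standard register widths for AVX-512 (512 bits) and NEON (128 bits); the helper name `lanes` is hypothetical and not part of any Endee API:

```python
# Register widths in bits for the two instruction sets named above.
REGISTER_BITS = {"AVX-512": 512, "NEON": 128}

def lanes(isa: str, element_bits: int) -> int:
    """Number of elements a single vector register holds."""
    return REGISTER_BITS[isa] // element_bits

avx512_fp32 = lanes("AVX-512", 32)  # FP32 elements per AVX-512 register
avx512_int8 = lanes("AVX-512", 8)   # INT8 elements per AVX-512 register
neon_fp32 = lanes("NEON", 32)       # FP32 elements per NEON register
```

Narrower elements mean more lanes per register, which is one reason quantized vectors and SIMD distance kernels tend to reinforce each other.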

Efficient Graph Construction

Endee builds the HNSW graph incrementally with pruning and degree-limiting heuristics. Index builds 2-4x faster than naive HNSW, enabling real-time catalog updates.

→ Live catalog sync without index rebuilds
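Endee's exact pruning rules are not published here, so the following is only a sketch of the general idea, modeled on the neighbor-selection heuristic from the original HNSW paper: a candidate is linked only if it is closer to the new node than to any neighbor already chosen, and the neighbor list is capped at a maximum degree. The function name and the toy data are assumptions for illustration.

```python
import numpy as np

def select_neighbors(base, candidates, vectors, max_degree):
    """Pick up to max_degree diverse neighbours for node `base`."""
    # Consider the nearest candidates first.
    ordered = sorted(
        candidates,
        key=lambda c: np.linalg.norm(vectors[c] - vectors[base]),
    )
    selected = []
    for c in ordered:
        if len(selected) >= max_degree:  # degree limit
            break
        d_base = np.linalg.norm(vectors[c] - vectors[base])
        # Pruning rule: keep c only if it is closer to base than to
        # every neighbour already kept (avoids redundant edges).
        if all(d_base < np.linalg.norm(vectors[c] - vectors[s]) for s in selected):
            selected.append(c)
    return selected

vectors = np.array([[0.0, 0.0], [1.0, 0.0], [1.1, 0.0], [0.0, 2.0], [-3.0, 0.0]])
neighbors = select_neighbors(0, [1, 2, 3, 4], vectors, max_degree=3)
```

In this toy example node 2 is dropped because it sits almost on top of node 1, so an edge to it would add cost without improving navigation.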

Four compounding innovations. Each one eliminates a bottleneck. Together, they deliver an unfair advantage.

The Engine Underneath

Vector Graph Engine (VGE)

The Vector Graph Engine is Endee's unified search runtime: the layer that combines Layered Memory Architecture, Int8e Quantization, and Progressive Filtering into a single high-performance ANN pipeline. Rather than treating these as independent optimizations, VGE co-designs them so each reinforces the others: quantized vectors let more of the hot graph layers fit in RAM, progressive filtering exploits the graph structure to enforce constraints early, and the memory hierarchy keeps hot nodes in cache for both techniques.

90%

Cost savings

vs. traditional vector databases

1/10th

Infrastructure

vs. equivalent memory footprint

99%+

Recall

at one billion vectors

Higher Recall

Best-in-class accuracy at any scale

Higher QPS

More queries per second per node

Lower Latency

Sub-5ms p99 under load

Core IP #1

Layered Memory Architecture

Hybrid storage for optimal cost-performance at scale

A conventional vector database loads the entire HNSW graph into RAM, which works at small scale but becomes prohibitively expensive at billions of vectors. Endee's Layered Memory Architecture instead splits the vector graph between disk and memory.

Base-layer graph nodes are persisted on disk using memory-mapped files. The OS page cache acts as a warm tier: frequently accessed nodes stay resident in memory without explicit management. The upper layers of the HNSW graph, whose small-world navigational links determine routing quality, are held in a configurable in-memory cache.

The result: RAM consumption drops by up to an order of magnitude compared to a fully in-memory index, while recall remains stable because the high-connectivity upper layers stay hot. Latency stays low because base-layer access via mmap is faster than a network round-trip to a distributed cache.
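The split between a memory-mapped base layer and an in-memory upper layer can be sketched with NumPy's `memmap`. This is an illustration under stated assumptions, not Endee's storage format: the file name, the sizes, and the choice of every 100th node as an "upper-layer" node are all hypothetical.

```python
import os
import tempfile
import numpy as np

DIM, N = 8, 1000
rng = np.random.default_rng(0)
base_vectors = rng.standard_normal((N, DIM)).astype(np.float32)

# Persist base-layer vectors to disk (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "base_layer.f32")
base_vectors.tofile(path)

# Memory-map the file: pages are read on first access and then kept
# resident by the OS page cache, with no explicit cache management.
base_layer = np.memmap(path, dtype=np.float32, mode="r", shape=(N, DIM))

# Upper-layer nodes: a small subset copied into ordinary RAM.
upper_ids = np.arange(0, N, 100)
upper_layer = {int(i): np.array(base_layer[i]) for i in upper_ids}

def get_vector(node_id: int) -> np.ndarray:
    """Serve from the in-memory upper layer if present, else from mmap."""
    cached = upper_layer.get(node_id)
    return cached if cached is not None else base_layer[node_id]
```

Only the small `upper_layer` dictionary is guaranteed to occupy RAM; how much of `base_layer` stays resident is decided by the operating system based on access patterns.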

Key outcomes

Up to 10x less RAM vs. fully in-memory HNSW
Stable recall regardless of dataset size
Consistent latency under mixed cold/warm access patterns
Linear horizontal scaling with predictable memory cost

Core IP #2

Int8e Quantization

4x compression with minimal recall degradation

Float32 vectors take 4 bytes per dimension. At 768 dimensions and 100M vectors, that is roughly 307GB of raw vector storage before indexing overhead. Standard INT8 quantization cuts this to a quarter, but introduces precision loss that degrades recall.
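The footprint arithmetic can be checked directly. The sketch below uses decimal gigabytes (1 GB = 10^9 bytes) and ignores indexing overhead; the function name is hypothetical.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_dim: int) -> int:
    """Raw storage for the vectors alone, excluding any index structures."""
    return num_vectors * dims * bytes_per_dim

fp32_bytes = raw_vector_bytes(100_000_000, 768, 4)  # float32: 4 bytes/dim
int8_bytes = raw_vector_bytes(100_000_000, 768, 1)  # int8: 1 byte/dim

fp32_gb = fp32_bytes / 1e9  # 307.2 GB
int8_gb = int8_bytes / 1e9  # 76.8 GB
```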

Endee's Int8e (Int8 enhanced) format is a proprietary encoding that reduces vector footprint by 4x compared to float32 while preserving retrieval accuracy through three mechanisms: per-vector dynamic scale calibration, selective precision retention for dimensions with high variance, and a correction step applied during distance computation.

Int8e is an enterprise-only format. Open-source builds support standard INT8 and INT16 with dynamic per-vector scaling, which already deliver significant compression at high accuracy.
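Int8e itself is proprietary, so it is not reproduced here. As a reference point, standard INT8 quantization with dynamic per-vector scaling can be sketched as follows; the symmetric scheme and the function names are assumptions, not Endee's implementation.

```python
import numpy as np

def quantize_int8(v: np.ndarray):
    """Symmetric INT8 quantization with a scale computed per vector."""
    scale = float(np.max(np.abs(v))) / 127.0
    if scale == 0.0:  # all-zero vector: any scale works
        scale = 1.0
    q = np.clip(np.round(v / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float32 vector."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
v = rng.standard_normal(768).astype(np.float32)
q, scale = quantize_int8(v)
max_err = float(np.max(np.abs(dequantize(q, scale) - v)))
```

Because the scale is derived from each vector's own largest component, no per-dataset tuning is needed, and the rounding error per dimension is bounded by half the scale.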

Key outcomes

4x storage reduction vs. float32
Less recall degradation than standard INT8
Enables billion-scale indexes on commodity hardware
Dynamic calibration, no manual per-dataset tuning

Core IP #3

Progressive Filtering

Multi-stage retrieval that preserves graph connectivity

Applying a metadata filter to an HNSW search creates a fundamental tension: the graph was built on the full dataset, but the query must return only the filtered subset. Naive approaches prune edges aggressively to enforce filters early, which disconnects the graph and degrades recall dramatically, particularly for selective filters.

Endee's Progressive Filtering applies constraints incrementally across the graph traversal. Rather than blocking graph edges at the outset, the algorithm progressively tightens the candidate set while preserving enough connectivity to navigate toward globally optimal results. A final check enforces hard filter constraints on the output set.

The result is recall that degrades gracefully with filter selectivity, rather than collapsing. Highly selective filters (0.1% matching) still return accurate top-k results that single-pass approaches miss.
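Endee's algorithm is not specified in detail here, so the following toy only illustrates the underlying tension: if non-matching nodes are cut out of the traversal, the search can be stranded, while letting the traversal pass through them and enforcing the filter on the output keeps the matching results reachable. The graph, the `traverse_blocked` flag, and the exhaustive best-first loop are all simplifying assumptions.

```python
import heapq
import numpy as np

def filtered_search(graph, vectors, query, entry, matches, k, traverse_blocked=True):
    """Best-first search; only nodes passing `matches` may be returned."""
    dist = lambda n: float(np.linalg.norm(vectors[n] - query))
    visited = {entry}
    frontier = [(dist(entry), entry)]
    results = []
    # Toy simplification: explores everything reachable; a real ANN
    # search would stop early once the candidates stop improving.
    while frontier:
        d, node = heapq.heappop(frontier)
        if matches(node):  # hard filter check on the output set
            results.append((d, node))
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            # Hard pruning (traverse_blocked=False) drops non-matching
            # nodes from navigation and can disconnect the graph.
            if traverse_blocked or matches(nb):
                heapq.heappush(frontier, (dist(nb), nb))
    return [n for _, n in sorted(results)[:k]]

# A chain 0-1-2-3-4 where only the two end nodes pass the filter.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
vectors = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
query = np.array([4.2])
allowed = {0, 4}
matches = lambda n: n in allowed

soft = filtered_search(graph, vectors, query, 0, matches, k=1)
hard = filtered_search(graph, vectors, query, 0, matches, k=1, traverse_blocked=False)
```

With hard pruning the search never leaves node 0 and returns it as the best match, even though node 4 is far closer to the query; navigating through the non-matching nodes finds node 4.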

Key outcomes

Graceful recall degradation vs. cliff-edge collapse under selective filters
Lower compute cost than post-filtering on the full ANN result
Works on any metadata type: categorical, numeric range, geo, custom
No re-indexing required when filter patterns change

Core IP #4

Queryable Encryption

Similarity search on encrypted vectors, zero decryption

Standard vector databases require decrypting your data to run search. The server receives plaintext vectors, builds the graph on plaintext, and executes queries against plaintext. For regulated industries (finance, healthcare, defense), this is architecturally incompatible with data sovereignty requirements.

Endee's Queryable Encryption enables ANN similarity search directly on encrypted vector representations. Encryption and decryption happen exclusively on the client side. The Endee server stores and processes only ciphertext throughout the entire data lifecycle: ingest, indexing, and query execution.

The encryption scheme is designed to preserve approximate distance ordering in the encrypted space, enabling graph traversal without decryption. The server never sees the plaintext content of the vectors it processes.
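To see why distance ordering can survive a keyed transformation at all, consider a deliberately simple stand-in: multiplying every vector by a secret orthogonal matrix preserves Euclidean distances exactly. This is not Endee's scheme and is not secure encryption; it is a toy with hypothetical names, shown only to make the distance-preservation property concrete.

```python
import numpy as np

def make_key(dim: int, seed: int) -> np.ndarray:
    """Toy 'key': a random orthogonal matrix derived from a seed."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

dim = 16
key = make_key(dim, seed=42)  # would stay on the client
rng = np.random.default_rng(1)
data = rng.standard_normal((100, dim))
query = rng.standard_normal(dim)

# Distances computed on the original vectors.
plain_d = np.linalg.norm(data - query, axis=1)

# Distances computed only on transformed vectors (the "server" view).
enc_data = data @ key.T
enc_query = key @ query
enc_d = np.linalg.norm(enc_data - enc_query, axis=1)

plain_top10 = np.argsort(plain_d)[:10]
enc_top10 = np.argsort(enc_d)[:10]
```

Both rankings agree, so a nearest-neighbor search over the transformed vectors returns the same results. A production scheme has to provide this property while also resisting attacks that a plain rotation does not.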

Key outcomes

Zero plaintext exposure on the server: at rest, in transit, and during search
Client-side key management: Endee never holds decryption keys
Satisfies strict HIPAA, GDPR, and financial data sovereignty requirements
Practical query performance: encryption overhead is sub-5ms

See the technology in production

Independent benchmarks on the Cohere 10M dataset show Endee achieving the lowest cost-per-billion-queries of any tested vector database.