Core Technology
Four proprietary innovations, each a purpose-built answer to a real production limitation in existing vector databases.
The Vector Graph Engine
The Engine Behind the Numbers
Multi-Layer Memory Architecture
Hot/warm/cold tiers keep frequently accessed vectors in RAM and the rest on fast NVMe, maximizing throughput without over-provisioning memory.
→ Enables 1B+ vectors on a single node
Efficient Quantization
Scalar quantization (FP32 to INT8) compresses each vector 4x with near-zero recall loss. Endee quantizes at index-build time so queries always hit compressed vectors.
→ 1/10th the RAM, same 99%+ recall
Endee, powered by the Vector Graph Engine (proprietary architecture)
SIMD Acceleration
CPU SIMD instructions (AVX-512 on x86, NEON on ARM) process 16 or more vector elements per instruction. Distance calculations run at hardware speed, with no GPU or special infrastructure needed.
→ <5ms P99 latency at 10K+ QPS
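To show why SIMD width matters for distance computation, the sketch below models in plain Python how a 16-lane register (AVX-512 holding float32 values) splits a 768-dimension dot product into 48 multiply-add steps plus a final horizontal sum. The names `simd_style_dot` and `fma_steps` are hypothetical, and this only mirrors the data flow; Endee's actual kernels are not shown, and production code would use intrinsics or an auto-vectorizing compiler.

```python
# Hypothetical illustration: a pure-Python model of how a 16-lane SIMD unit
# (e.g. AVX-512 with float32) accumulates a dot product. Real kernels use
# intrinsics or compiler auto-vectorization; this only mirrors the data flow.

LANES = 16  # float32 lanes in one 512-bit register


def simd_style_dot(a: list[float], b: list[float]) -> float:
    """Dot product computed lane by lane, as a SIMD kernel would."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    acc = [0.0] * LANES  # one accumulator per lane (models a vector register)
    full = len(a) - len(a) % LANES
    # Main loop: each iteration models one fused multiply-add over 16 lanes.
    for i in range(0, full, LANES):
        for lane in range(LANES):
            acc[lane] += a[i + lane] * b[i + lane]
    # Horizontal reduction: sum the 16 lane accumulators.
    total = sum(acc)
    # Scalar tail for dimensions that do not fill a whole register.
    for i in range(full, len(a)):
        total += a[i] * b[i]
    return total


def fma_steps(dim: int) -> int:
    """Number of full 16-lane multiply-add steps for one dot product."""
    return dim // LANES
```

For a 768-dimension embedding, `fma_steps(768)` is 48, compared with 768 scalar multiply-adds, which is where the hardware-speed claim comes from.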
Efficient Graph Construction
Endee builds the HNSW graph incrementally, using pruning and degree-limiting heuristics. Index builds run 2-4x faster than naive HNSW construction, enabling real-time catalog updates.
→ Live catalog sync without index rebuilds
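Endee's exact heuristics are not published on this page. As a rough illustration of what "pruning and degree-limiting" means, the sketch below applies the neighbor-selection heuristic from the original HNSW paper during incremental insertion on a single graph layer with a hard degree cap. The class `Graph`, the function `prune`, and the parameters `max_degree` and `ef` are hypothetical; candidate search is brute force for brevity, where a real index would run a greedy graph search.

```python
import math

Point = tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    return math.dist(a, b)


def prune(base: Point, candidates: list[Point], max_degree: int) -> list[Point]:
    """HNSW-style neighbor selection: walk candidates nearest-first and keep one
    only if it is closer to `base` than to every neighbor already kept."""
    selected: list[Point] = []
    for c in sorted(candidates, key=lambda c: dist(base, c)):
        if len(selected) == max_degree:  # degree limit reached
            break
        if all(dist(base, c) < dist(c, s) for s in selected):
            selected.append(c)
    return selected


class Graph:
    """Single-layer proximity graph built one point at a time (hypothetical)."""

    def __init__(self, max_degree: int = 4, ef: int = 16):
        self.max_degree = max_degree
        self.ef = ef
        self.edges: dict[Point, list[Point]] = {}

    def insert(self, point: Point) -> None:
        # Brute-force candidate search for brevity; HNSW uses a greedy graph search.
        candidates = sorted(self.edges, key=lambda n: dist(point, n))[: self.ef]
        self.edges[point] = prune(point, candidates, self.max_degree)
        for n in self.edges[point]:
            # Add the reverse edge, re-pruning if it pushes n over the degree limit.
            neighbors = self.edges[n] + [point]
            if len(neighbors) > self.max_degree:
                neighbors = prune(n, neighbors, self.max_degree)
            self.edges[n] = neighbors
```

Because each insert only touches the new node and its selected neighbors, the graph can absorb new items without a rebuild. On points along a line, the heuristic keeps only the nearest neighbor on each side, since any farther candidate on the same side is closer to an already-kept neighbor.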
Four compounding innovations. Each one eliminates a bottleneck. Together, they deliver an unfair advantage.
The Engine Underneath
Vector Graph Engine (VGE)
The Vector Graph Engine is Endee's unified search runtime, the layer that combines Layered Memory Architecture, Int8e Quantization, and Progressive Filtering into a single high-performance ANN pipeline. Rather than treating these as independent optimizations, VGE co-designs them so each reinforces the others: quantized vectors let more of the graph's hot layers fit in RAM, progressive filtering exploits the graph structure to enforce constraints early, and the memory hierarchy keeps hot nodes in cache for both techniques.
- 90% cost savings vs. traditional vector databases
- 1/10th infrastructure vs. equivalent memory footprint
- 99%+ recall at one billion vectors
- Higher recall: best-in-class accuracy at any scale
- Higher QPS: more queries per second per node
- Lower latency: sub-5ms p99 under load
Core IP #1
Layered Memory Architecture
Hybrid storage for optimal cost-performance at scale
A conventional vector database loads the entire HNSW graph into RAM, which works at small scale but becomes prohibitively expensive at billions of vectors. Endee's Layered Memory Architecture separates the vector graph into two tiers.
Base-layer graph nodes are persisted on disk using memory-mapped files. The OS page cache acts as a warm tier: frequently accessed nodes stay resident in memory naturally, without explicit management. The upper layers of the HNSW graph, the small-world navigational links that determine routing quality, are held in a configurable in-memory cache.
The result: RAM consumption drops substantially (up to 10x) compared to a fully in-memory index, while recall remains stable because the high-connectivity upper layers stay hot. Latency stays low because base-layer access via mmap is faster than a network round-trip to a distributed cache.
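A minimal sketch of this split, assuming NumPy: base-layer vectors live in a memory-mapped file, so the operating system's page cache decides which pages stay warm, while a chosen set of upper-layer node ids is copied explicitly into RAM. `LayeredStore` and `upper_ids` are hypothetical names, and a real index would also persist graph edges, not just vectors.

```python
import numpy as np


class LayeredStore:
    """Hypothetical two-tier vector store: base-layer vectors live in a
    memory-mapped file, upper-layer navigation nodes are pinned in RAM."""

    def __init__(self, path: str, vectors: np.ndarray, upper_ids: list[int]):
        n, dim = vectors.shape
        # Cold tier: write the base layer to disk once.
        writer = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, dim))
        writer[:] = vectors
        writer.flush()
        del writer
        # Re-open read-only; pages the OS keeps cached form the warm tier.
        self.base = np.memmap(path, dtype=np.float32, mode="r", shape=(n, dim))
        # Hot tier: explicit in-RAM copies of the upper-layer nodes.
        self.hot = {i: np.array(self.base[i]) for i in upper_ids}

    def get(self, node_id: int) -> np.ndarray:
        if node_id in self.hot:
            return self.hot[node_id]   # served from the in-memory cache
        return self.base[node_id]      # page-cache-backed read via mmap
```

Only the nodes in `upper_ids` cost explicit RAM; how much of `base` stays resident is left to the operating system, which is the "without explicit management" behavior described above.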
Key outcomes
- Up to 10x less RAM vs. fully in-memory HNSW
- Stable recall regardless of dataset size
- Consistent latency under mixed cold/warm access patterns
- Linear horizontal scaling with predictable memory cost
Core IP #2
Int8e Quantization
4x compression with minimal recall degradation
Float32 vectors use 4 bytes per dimension. At 768 dimensions and 100M vectors, that's roughly 300GB of raw vector storage before indexing overhead. Standard INT8 quantization cuts this by 4x (INT16 by 2x), but introduces precision loss that degrades recall.
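The storage figures in this paragraph come from simple bytes-per-dimension arithmetic. The sketch below counts only raw vector bytes, in decimal gigabytes; the constant and function names are illustrative.

```python
# Raw vector storage for 100M vectors at 768 dimensions (vectors only,
# no graph edges, metadata, or per-vector scale factors).
NUM_VECTORS = 100_000_000
DIM = 768


def raw_gb(bytes_per_dim: int) -> float:
    """Decimal gigabytes needed at a given number of bytes per dimension."""
    return NUM_VECTORS * DIM * bytes_per_dim / 1e9


float32_gb = raw_gb(4)  # 307.2
int16_gb = raw_gb(2)    # 153.6, a 2x reduction
int8_gb = raw_gb(1)     # 76.8, a 4x reduction
```

Float32 comes to about 307 GB, which INT16 halves and INT8 quarters.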
Endee's Int8e (Int8 enhanced) format is a proprietary encoding that reduces vector footprint by 4x compared to float32 while preserving retrieval accuracy through three mechanisms: per-vector dynamic scale calibration, selective precision retention for dimensions with high variance, and a correction step applied during distance computation.
Int8e is an enterprise-only format. Open-source builds support standard INT8 and INT16 with dynamic per-vector scaling, which already deliver significant compression at high accuracy.
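For reference, standard INT8 with dynamic per-vector scaling usually works like the sketch below: each vector gets its own scale so that its largest-magnitude component maps to ±127, and dot products are computed on the integers before rescaling. This is a generic symmetric (absmax) scheme assumed for illustration, not Endee's code; the function names are hypothetical, and Int8e's variance-aware precision retention and correction step are proprietary and not shown.

```python
import numpy as np


def quantize_int8(v: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric INT8 quantization with one dynamic scale per vector."""
    max_abs = float(np.max(np.abs(v)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(v / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale


def approx_dot(qa: np.ndarray, sa: float, qb: np.ndarray, sb: float) -> float:
    # Multiply-accumulate in int32 so int8 products cannot overflow, then rescale.
    return float(np.dot(qa.astype(np.int32), qb.astype(np.int32))) * sa * sb
```

Each component is reconstructed to within half a quantization step, and the stored vector is a quarter of its float32 size (plus one scale value per vector).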
Key outcomes
- 4x storage reduction vs. float32
- Less recall degradation than standard INT8
- Enables billion-scale indexes on commodity hardware
- Dynamic calibration, no manual per-dataset tuning
Core IP #3
Progressive Filtering
Multi-stage retrieval that preserves graph connectivity
Applying a metadata filter to an HNSW search creates a fundamental tension: the graph was built on the full dataset, but the query must return only the filtered subset. Naive approaches prune edges aggressively to enforce filters early, which disconnects the graph and degrades recall dramatically, particularly for selective filters.
Endee's Progressive Filtering applies constraints incrementally across the graph traversal. Rather than blocking graph edges at the outset, the algorithm progressively tightens the candidate set while preserving enough connectivity to navigate toward globally optimal results. A final check enforces hard filter constraints on the output set.
The result is recall that degrades gracefully with filter selectivity rather than collapsing. Even highly selective filters, where only 0.1% of vectors match, still return accurate top-k results that single-pass approaches miss.
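The Progressive Filtering algorithm itself is not described in detail here. To illustrate the core idea that non-matching nodes should stay usable for navigation, the sketch below is a generic best-first (beam) search over a single-layer graph: every neighbor within the beam bound is explored whether or not it passes the filter, but only nodes satisfying the hypothetical `allowed` predicate enter the result set. It is a simplified stand-in assumed for illustration, not Endee's implementation; `filtered_search` and its parameters are hypothetical.

```python
import heapq
import math
from typing import Callable


def filtered_search(
    graph: dict[int, list[int]],
    vectors: dict[int, tuple[float, ...]],
    query: tuple[float, ...],
    entry: int,
    k: int,
    ef: int,
    allowed: Callable[[int], bool],
) -> list[int]:
    """Beam search that navigates through every node but returns only nodes
    passing `allowed` (hypothetical filter predicate)."""

    def d(n: int) -> float:
        return math.dist(query, vectors[n])

    visited = {entry}
    frontier = [(d(entry), entry)]          # min-heap: nodes still to expand
    results: list[tuple[float, int]] = []   # max-heap via negated distances
    if allowed(entry):
        results.append((-d(entry), entry))

    while frontier:
        dist_c, c = heapq.heappop(frontier)
        worst = -results[0][0] if len(results) >= ef else math.inf
        if dist_c > worst:
            break  # no unexpanded node can improve the result set
        for n in graph[c]:
            if n in visited:
                continue
            visited.add(n)
            dn = d(n)
            worst = -results[0][0] if len(results) >= ef else math.inf
            if dn < worst:
                # Non-matching nodes still enter the frontier as stepping stones.
                heapq.heappush(frontier, (dn, n))
                if allowed(n):
                    heapq.heappush(results, (-dn, n))
                    if len(results) > ef:
                        heapq.heappop(results)  # evict the farthest match
    # Results hold only matching nodes; return the k nearest, nearest first.
    return [n for _, n in sorted(results, reverse=True)][:k]
```

On a path graph 0-1-2-3-4 where only even-numbered nodes pass the filter, a search from node 0 toward a query at node 4 still reaches node 4 by stepping through nodes 1 and 3; a search that blocked non-matching nodes would be stuck at node 0.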
Key outcomes
- Graceful recall degradation vs. cliff-edge collapse under selective filters
- Lower compute cost than post-filtering on the full ANN result
- Works on any metadata type: categorical, numeric range, geo, custom
- No re-indexing required when filter patterns change
Core IP #4
Queryable Encryption
Similarity search on encrypted vectors, zero decryption
Standard vector databases require decrypting your data to run search: the server receives plaintext vectors, builds the graph on plaintext, and executes queries against plaintext. For regulated industries such as finance, healthcare, and defense, this is architecturally incompatible with data sovereignty requirements.
Endee's Queryable Encryption enables ANN similarity search directly on encrypted vector representations. Encryption and decryption happen exclusively on the client side. The Endee server stores and processes only ciphertext throughout the entire data lifecycle: ingest, indexing, and query execution.
The encryption scheme is designed to preserve approximate distance ordering in the encrypted space, enabling graph traversal without decryption. The server never sees the plaintext content of the vectors it processes.
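The distance-preservation property can be illustrated with the simplest possible keyed transform: a secret random orthogonal matrix held by the client. Rotating every vector with it leaves all Euclidean distances unchanged, so a server can rank transformed vectors without seeing the originals. This is a toy example assumed for illustration, not Endee's scheme, and it is not secure encryption: a plain rotation can be recovered by linear algebra from enough known plaintext/ciphertext pairs. `make_key`, `encode`, and `server_rank` are hypothetical names.

```python
import numpy as np


def make_key(dim: int, seed: int) -> np.ndarray:
    """Client-side secret: a random orthogonal matrix derived from `seed`.
    WARNING: toy transform for illustration only, not secure encryption."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # column sign fix: uniformly distributed orthogonal matrix


def encode(key: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Runs on the client; works for one vector (d,) or a batch (n, d)."""
    return v @ key.T


def server_rank(encoded_db: np.ndarray, encoded_query: np.ndarray) -> np.ndarray:
    # The server sees only transformed vectors. Orthogonal maps preserve
    # Euclidean distance, so this ranking matches the plaintext ranking.
    dists = np.linalg.norm(encoded_db - encoded_query, axis=1)
    return np.argsort(dists)
```

The ranking comes out identical to plaintext search. The catch is that exact distance preservation is also what makes such a simple transform easy to attack, which is why production schemes need stronger constructions.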
Key outcomes
- Zero plaintext exposure on the server: at rest, in transit, and during search
- Client-side key management: Endee never holds decryption keys
- Satisfies strict HIPAA, GDPR, and financial data sovereignty requirements
- Practical query performance: encryption overhead is under 5ms
See the technology in production
Independent benchmarks on the Cohere 10M dataset show Endee achieving the lowest cost-per-billion-queries of any tested vector database.