The highway and side street idea
HNSW stands for Hierarchical Navigable Small World. The name is technical, but the idea is intuitive. Imagine navigating a city. You start on a highway that covers large distances quickly, getting you to the right part of town. Then you switch to smaller roads to find the exact street, then the exact house.
HNSW works the same way with data. It builds a multi-layer map of all the vectors in the database. The top layer has a few well-connected "express" paths that let you jump to the right neighborhood of the data quickly. Lower layers have many more connections that narrow the search to the closest results. To find the nearest neighbors of a new query, the system starts at the top layer, follows the fastest paths to the general vicinity, then refines the search in lower layers until it has the best matches.
The settings that control speed and accuracy
HNSW has two important settings. The first, called ef_construction, controls how carefully the map is built: a higher value means more connections are evaluated when adding each item to the index, producing a more accurate map at the cost of longer build time. The second, called ef_search, controls how hard the algorithm works during each search: a higher value checks more candidates and returns more accurate results, but takes slightly longer.
In practice, these settings are tuned to hit a target like "find the correct results at least 95% of the time within 10 milliseconds per query." Once tuned, HNSW consistently delivers fast, high-quality results without any further manual intervention.
When HNSW works well and when it does not
HNSW is the go-to choice for most production deployments because it offers the best combination of speed and accuracy among in-memory index types. A typical HNSW index finds the nearest 10 results from a collection of 10 million items in under 5 milliseconds, with over 99% accuracy.
Its main limitation is memory. Every item in an HNSW index requires extra memory for the graph connections, on top of the space needed for the vector itself. This is fine for tens of millions of items but can become expensive for collections of hundreds of millions or billions. In those cases, compression techniques like scalar quantization can reduce memory usage significantly, or a disk-based alternative called DiskANN can be used instead.