Grouping data into neighborhoods
IVF stands for Inverted File Index. The idea is similar to how a city is organized into neighborhoods. Instead of searching every house in the city to find a match, you first identify the right neighborhood, then only search within it.
IVF divides all the vectors in the database into groups called clusters, each represented by a central point called a centroid. The centroids are typically learned ahead of time with a clustering algorithm such as k-means. Every vector is assigned to the cluster whose centroid it is closest to. At search time, the system first compares the query to all the cluster centroids. It identifies the most promising clusters and searches only within those, skipping the rest. Because only a small fraction of the database is scanned, far fewer comparisons are needed, and most of the relevant results are still found.
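The build and search steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not how any particular library implements IVF: the function names (train_centroids, build_ivf, ivf_search), the simple k-means loop, the squared Euclidean distance, and the random test data are all choices made for this example.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, v):
    # Index of the centroid closest to v.
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))

def train_centroids(vectors, n_clusters, n_iters=10, seed=0):
    # A few rounds of plain k-means, started from randomly chosen data points.
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(n_iters):
        groups = [[] for _ in range(n_clusters)]
        for v in vectors:
            groups[nearest(centroids, v)].append(v)
        for i, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_ivf(vectors, centroids):
    # The "inverted file": for each cluster, the IDs of the vectors assigned to it.
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        lists[nearest(centroids, v)].append(idx)
    return lists

def ivf_search(query, vectors, centroids, lists, k=5, nprobe=1):
    # Step 1: rank clusters by how close their centroid is to the query.
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(centroids[i], query))
    # Step 2: scan only the vectors stored in the nprobe closest clusters.
    candidates = [idx for c in order[:nprobe] for idx in lists[c]]
    candidates.sort(key=lambda idx: sq_dist(vectors[idx], query))
    return candidates[:k]

# Small synthetic dataset (assumed for illustration): 500 vectors of dimension 8.
rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
centroids = train_centroids(vectors, n_clusters=10)
lists = build_ivf(vectors, centroids)
query = vectors[0]
print(ivf_search(query, vectors, centroids, lists, k=3, nprobe=2))
```

With 10 clusters and nprobe set to 2, the search scans roughly a fifth of the vectors instead of all 500. Setting nprobe to the number of clusters scans everything, which makes the search exact again.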
The nprobe setting: how many neighborhoods to check
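The key setting in an IVF index is nprobe, which controls how many clusters are searched for each query. If nprobe is set to 1, only the single most likely cluster is searched. This is very fast but risks missing results that ended up in a nearby cluster, which happens most often when the query lies near the boundary between two clusters. If nprobe is set to 32 or 64, more clusters are searched, which finds more of the correct results. The cost is that search time grows roughly in proportion to nprobe, because each extra cluster adds more vectors to compare against.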
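In production, the right nprobe value is found by testing against a sample of queries whose true nearest neighbors are known from an exact search. Start low for speed, then increase nprobe until the accuracy target is reached. Accuracy here is usually measured as recall, for example "97% of the true nearest neighbors appear in the results". The ideal value balances how fast results need to arrive with how accurate they need to be.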
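The tuning loop described above might look like the following sketch. Several things here are assumptions for illustration only: the helper names (build_index, search, recall_at_k, tune_nprobe), the doubling schedule for nprobe, the synthetic data, and the shortcut of using randomly chosen data points as centroids instead of running k-means.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_index(vectors, n_clusters, seed=0):
    # Simplification: random data points serve as centroids (no k-means training).
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_clusters)
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        c = min(range(n_clusters), key=lambda i: sq_dist(centroids[i], v))
        lists[c].append(idx)
    return centroids, lists

def search(query, vectors, centroids, lists, k, nprobe):
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(centroids[i], query))
    cand = [idx for c in order[:nprobe] for idx in lists[c]]
    return sorted(cand, key=lambda idx: sq_dist(vectors[idx], query))[:k]

def recall_at_k(queries, truth, vectors, centroids, lists, k, nprobe):
    # Fraction of the true top-k neighbors that the approximate search returns.
    hits = 0
    for q, true_ids in zip(queries, truth):
        found = search(q, vectors, centroids, lists, k, nprobe)
        hits += len(set(found) & set(true_ids))
    return hits / (k * len(queries))

def tune_nprobe(queries, vectors, centroids, lists, k=5, target=0.97):
    # Ground truth comes from an exact brute-force search over all vectors.
    truth = [sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i], q))[:k]
             for q in queries]
    nprobe = 1
    while True:
        recall = recall_at_k(queries, truth, vectors, centroids, lists, k, nprobe)
        # Stop at the first value that meets the target, or when every cluster is probed.
        if recall >= target or nprobe >= len(centroids):
            return nprobe, recall
        nprobe = min(nprobe * 2, len(centroids))

rng = random.Random(7)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(400)]
queries = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(20)]
centroids, lists = build_index(vectors, n_clusters=16)
best_nprobe, best_recall = tune_nprobe(queries, vectors, centroids, lists)
print(f"nprobe={best_nprobe} reaches recall {best_recall:.2f}")
```

Doubling nprobe keeps the number of test runs small. A real evaluation would also record query latency at each step, since the goal is the lowest nprobe that meets the recall target within the latency budget.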
IVF-PQ: handling billions of items
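IVF becomes especially powerful when combined with a compression technique called Product Quantization (the combination is often written as IVF-PQ). Product Quantization splits each vector into short sub-vectors and replaces each one with the ID of its nearest entry in a small learned codebook. The result is a compact code, often tens of bytes instead of the hundreds or thousands of bytes the original vector needs, so many times more vectors fit in the same amount of memory.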
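A quick back-of-the-envelope calculation shows the scale of the savings. The numbers below are assumptions chosen for illustration (one billion 128-dimensional float32 vectors, 16 sub-vectors per vector, 8-bit codes), and the estimate ignores the extra memory an IVF index needs for vector IDs and cluster centroids.

```python
def raw_bytes(n_vectors, dim, bytes_per_float=4):
    # Uncompressed storage: one float per dimension.
    return n_vectors * dim * bytes_per_float

def pq_bytes(n_vectors, dim, m, bits=8, bytes_per_float=4):
    # Each vector becomes m codes of `bits` bits each.
    code_bytes_per_vector = (m * bits + 7) // 8
    # Codebooks: m tables, each with 2**bits entries of dim // m floats.
    codebook_bytes = m * (2 ** bits) * (dim // m) * bytes_per_float
    return n_vectors * code_bytes_per_vector + codebook_bytes

n, dim, m = 1_000_000_000, 128, 16
raw = raw_bytes(n, dim)
pq = pq_bytes(n, dim, m)
print(f"uncompressed: {raw / 1e9:.1f} GB")   # 512.0 GB
print(f"PQ codes:     {pq / 1e9:.1f} GB")    # 16.0 GB
print(f"compression:  about {raw / pq:.0f}x")
```

Under these assumptions each vector shrinks from 512 bytes to 16 bytes. Using fewer sub-vectors compresses further but loses more detail, which is the accuracy trade-off that comes with compression.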
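IVF-PQ is a common approach for datasets with hundreds of millions to billions of items. It is available in FAISS (an open-source library from Meta), and Google's ScaNN uses a related design that also partitions the data and searches over quantized vectors. The trade-off is some loss of accuracy compared to uncompressed search, because distances are computed on approximate codes rather than the original vectors. For many applications this loss is acceptable, and it can be reduced by re-ranking the top candidates with their original vectors.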