What is product quantization?

Compressing a vector into a short code

Product Quantization (PQ) is a technique for compressing vectors so they take up far less memory. Here is the intuition: instead of storing the full, precise description of a vector, you split it into smaller pieces (sub-vectors) and describe each piece approximately by pointing to the closest entry in a pre-built dictionary, called a codebook. Each piece has its own codebook, and the stored code is simply the list of entry numbers, one per piece.
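To make the split-and-look-up idea concrete, here is a minimal sketch in plain Python. The function names (split, encode, decode) and the toy sizes are illustrative assumptions, not part of any particular library, and the codebooks are filled with random numbers purely for demonstration. Real systems learn them from the data, typically with k-means.

```python
import random

def split(vector, m):
    """Split a vector into m equal-length sub-vectors."""
    step = len(vector) // m
    return [vector[j * step:(j + 1) * step] for j in range(m)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(vector, codebooks):
    """Replace each sub-vector with the index of its nearest codebook entry."""
    code = []
    for sub, codebook in zip(split(vector, len(codebooks)), codebooks):
        nearest = min(range(len(codebook)), key=lambda k: sq_dist(sub, codebook[k]))
        code.append(nearest)
    return code

def decode(code, codebooks):
    """Rebuild an approximate vector by concatenating the chosen entries."""
    approx = []
    for index, codebook in zip(code, codebooks):
        approx.extend(codebook[index])
    return approx

# Toy setup (assumed sizes): 8-dim vectors, 4 pieces of 2 dims, 4 entries per codebook.
# The random codebooks are a stand-in for codebooks learned from real data.
rng = random.Random(0)
codebooks = [[[rng.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
             for _ in range(4)]
vector = [rng.uniform(-1, 1) for _ in range(8)]

code = encode(vector, codebooks)    # four small integers, one per piece
approx = decode(code, codebooks)    # lossy reconstruction of the original
```

The code is all that needs to be stored per vector. The codebooks are shared by the whole dataset, so their cost is paid once.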

For example, imagine describing a color precisely as "red: 214, green: 78, blue: 42." Product Quantization would instead store a short code like "warm-red-3" that refers to a pre-agreed color in a codebook. The exact shade is lost, but you still capture enough information to tell warm reds from cool blues. A vector that originally requires thousands of bytes of storage might be compressed to just 8 or 16 bytes, reducing memory usage by 100x or more.
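The arithmetic behind a figure like "100x or more" can be sketched as follows. The numbers here are assumptions chosen for illustration (768 dimensions stored as 32-bit floats, 16 pieces, 256 entries per codebook so that each piece fits in one byte); other settings give different ratios.

```python
# Assumed, illustrative settings.
dims = 768               # dimensions per vector
bytes_per_float = 4      # 32-bit floats
m = 16                   # number of pieces (sub-vectors)
entries = 256            # entries per codebook, so one index fits in 1 byte

raw_bytes = dims * bytes_per_float   # size of one uncompressed vector
code_bytes = m * 1                   # one byte per piece
ratio = raw_bytes / code_bytes

# The codebooks are a fixed, one-time cost shared by every vector.
codebook_bytes = m * entries * (dims // m) * bytes_per_float

print(raw_bytes, code_bytes, ratio, codebook_bytes)
```

Under these assumptions each vector shrinks from 3,072 bytes to 16 bytes, a 192x reduction, while the shared codebooks add well under a megabyte in total.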

Searching on compressed data

The clever part of product quantization is that you can measure similarity directly on the compressed codes without fully decompressing the vectors first. Before running a search, the system splits the query into the same pieces and precomputes a small lookup table of distances between each query piece and every entry in that piece's codebook. Scoring a compressed vector then takes just one table lookup per piece plus a sum, rather than a comparison across all the original numbers.
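The lookup-table trick can be sketched in a few lines of plain Python. The helper names (build_table, table_distance) and the toy sizes are assumptions made for this example, and the codebooks and stored codes are random placeholders rather than real data.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_table(query, codebooks):
    """table[j][k] = distance from query piece j to entry k of codebook j."""
    step = len(query) // len(codebooks)
    return [[sq_dist(query[j * step:(j + 1) * step], entry) for entry in codebook]
            for j, codebook in enumerate(codebooks)]

def table_distance(code, table):
    """Score one compressed vector: one lookup per piece, then a sum."""
    return sum(table[j][k] for j, k in enumerate(code))

# Toy setup (assumed sizes): 4 pieces of 2 dims each, 4 entries per codebook.
rng = random.Random(1)
m, sub_dim, entries = 4, 2, 4
codebooks = [[[rng.uniform(-1, 1) for _ in range(sub_dim)] for _ in range(entries)]
             for _ in range(m)]
query = [rng.uniform(-1, 1) for _ in range(m * sub_dim)]
codes = [[rng.randrange(entries) for _ in range(m)] for _ in range(5)]  # stored codes

table = build_table(query, codebooks)              # computed once per query
scores = [table_distance(code, table) for code in codes]
best = min(range(len(codes)), key=lambda i: scores[i])
```

The table has only m × entries values and is built once per query, so its cost is spread over every compressed vector that gets scored. Each score equals the squared distance from the query to the vector's reconstruction, which is why no decompression is needed.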

This is why IVF-PQ (the combination of IVF clustering and product quantization) can serve datasets of billions of vectors from a single server. IVF limits each query to a few clusters, the compressed codes are small enough to keep in memory, and distance calculations on compressed codes are fast.

The accuracy trade-off

Compression comes at a cost. Because PQ replaces exact values with codebook approximations, some accuracy is lost in the similarity ranking. How much depends on how aggressively you compress. With moderate compression settings, product quantization typically achieves 90 to 95% recall@10: on average, 90 to 95% of the true top 10 neighbors (those found by searching the uncompressed vectors) still appear in the top 10 returned from the compressed index.
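For readers who want to measure this themselves, recall@10 for a single query can be computed as in the sketch below. The function name recall_at_k and the ID lists are made up for illustration. In practice you would average the value over many queries.

```python
def recall_at_k(exact_ids, approx_ids, k=10):
    """Fraction of the true top-k results that also appear in the approximate top-k."""
    truth = set(exact_ids[:k])
    found = set(approx_ids[:k])
    return len(truth & found) / len(truth)

# Hypothetical result IDs for one query.
exact = [3, 17, 42, 8, 99, 5, 61, 23, 70, 11]    # from the uncompressed vectors
approx = [3, 42, 17, 8, 5, 99, 61, 23, 12, 64]   # from the compressed index

recall = recall_at_k(exact, approx)
print(recall)  # 0.8: eight of the ten true neighbors were returned
```

Note that the order within the top 10 does not matter for this metric, only membership.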

For most applications at very large scale, this trade-off is worthwhile: a small accuracy reduction is acceptable when the alternative is needing hardware you cannot afford. For smaller datasets where memory is not a bottleneck, scalar quantization (a simpler technique that stores each number in fewer bits, for example 8 bits instead of 32) is usually preferable: it compresses only about 4x, but causes far less accuracy loss.
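For comparison, here is a minimal sketch of 8-bit scalar quantization in plain Python. The function names and the fixed value range of -1 to 1 are assumptions for this example. Real systems usually estimate the range from the data, and often per dimension.

```python
def sq_encode(vector, lo, hi):
    """Map each float in [lo, hi] to one byte (0..255). Requires hi > lo."""
    scale = (hi - lo) / 255
    return bytes(min(255, max(0, round((x - lo) / scale))) for x in vector)

def sq_decode(code, lo, hi):
    """Map each byte back to an approximate float."""
    scale = (hi - lo) / 255
    return [lo + b * scale for b in code]

vector = [-0.82, 0.10, 0.47, 0.99, -0.33]
lo, hi = -1.0, 1.0                       # assumed value range

code = sq_encode(vector, lo, hi)         # 1 byte per number instead of 4
restored = sq_decode(code, lo, hi)
max_error = max(abs(a - b) for a, b in zip(vector, restored))
```

Every dimension keeps its own stored value, so the error per number is at most half a quantization step. That is why the accuracy loss is small, and also why the compression stops at roughly 4x for 32-bit floats.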

Related concepts

Scalar Quantization

IVF

Index Types

ANN

Recall & Precision
