Storing numbers with less precision
Every vector is a list of floating-point numbers. By default, each number is stored as a 32-bit float (float32), which can represent fractional values very precisely. Scalar quantization reduces this precision. Int8 quantization maps each number onto one of the 256 integer levels from -128 to 127: the range of values in the data is first scaled to fit that integer range, and each scaled value is then rounded to the nearest level. This is like rounding 3.14159 to just 3. The stored value is slightly off, but it is close enough for the purposes of similarity comparison.
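A minimal sketch of this in plain Python, assuming a per-vector min/max scheme: the smallest value in the vector maps to -128, the largest maps to 127, and everything in between is scaled linearly and rounded. This is one common variant, not necessarily the one Endee uses, and `quantize_int8` and `dequantize_int8` are hypothetical names.

```python
def quantize_int8(vector):
    """Map each float onto one of the 256 Int8 levels (-128..127).

    Assumes a per-vector min/max scheme: the vector's minimum maps to -128
    and its maximum maps to 127.
    """
    lo, hi = min(vector), max(vector)
    # Width of one Int8 step in the original units (guard against a constant vector).
    scale = (hi - lo) / 255 if hi > lo else 1.0
    # Shift into 0..255, round to the nearest level, then offset to -128..127.
    codes = [min(127, max(-128, round((x - lo) / scale) - 128)) for x in vector]
    return codes, lo, scale


def dequantize_int8(codes, lo, scale):
    """Map Int8 codes back to approximate floats."""
    return [(c + 128) * scale + lo for c in codes]


vec = [0.12, -0.53, 0.98, 0.004, -0.77]
codes, lo, scale = quantize_int8(vec)
print(codes)                              # [2, -93, 127, -15, -128]
print(dequantize_int8(codes, lo, scale))  # each value within scale / 2 of the original
```

Each restored value differs from the original by at most half a step (`scale / 2`, about 0.0034 for this vector), which is the "slightly off" error described above.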
The memory saving is significant. A 32-bit float takes 4 bytes; an Int8 integer takes 1 byte. So a vector of 768 numbers shrinks from 3,072 bytes to 768 bytes, a 4x reduction (plus a small amount for the scaling parameters needed to map values back). The vectors in a database that would need 40 GB of memory at full precision fit in about 10 GB.
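The same arithmetic written out. The collection size is a hypothetical figure chosen so the float32 vectors come to roughly 40 GB. The last line assumes a per-vector scheme that stores a float32 offset and a float32 scale alongside each vector; schemes that keep their scaling parameters per dimension or per index add negligible overhead instead.

```python
DIMENSIONS = 768
FLOAT32_BYTES = 4
INT8_BYTES = 1

float32_size = DIMENSIONS * FLOAT32_BYTES  # 3,072 bytes per vector
int8_size = DIMENSIONS * INT8_BYTES        # 768 bytes per vector
print(float32_size / int8_size)            # 4.0

# Hypothetical collection size, chosen so the float32 vectors take about 40 GB.
num_vectors = 13_000_000
print(num_vectors * float32_size / 1e9)    # 39.936 (GB)
print(num_vectors * int8_size / 1e9)       # 9.984 (GB)

# Assumed per-vector overhead: one float32 offset + one float32 scale = 8 bytes.
print(round(float32_size / (int8_size + 8), 2))  # 3.96
```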
Why it is the first choice for memory optimization
Scalar quantization is the most practical first step when you need to reduce memory usage because it costs very little accuracy. In most cases, search quality (for example, recall) drops by less than 0.5%, which end users are unlikely to notice. Product quantization, by comparison, achieves far higher compression but typically at the cost of 5 to 10% accuracy loss.
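One way to check the accuracy claim on your own data is to compare the top-k results of full-precision search with the top-k results over quantized vectors and compute recall@k: the fraction of the true top k that the quantized search also returns. The sketch below uses random Gaussian vectors and a hypothetical per-vector min/max quantizer, and it only simulates quantization by rounding the values to 256 levels and mapping them back. The number it prints depends entirely on the data and is not evidence for the figures quoted above.

```python
import random


def quantize_roundtrip(vector):
    """Round a vector to 256 levels and map it back (assumed per-vector min/max)."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    return [round((x - lo) / scale) * scale + lo for x in vector]


def top_k(query, vectors, k):
    """Indices of the k vectors with the highest dot product with the query."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def recall_at_k(query, vectors, quantized, k=10):
    """Fraction of the exact top-k that the quantized search also finds."""
    return len(top_k(query, vectors, k) & top_k(query, quantized, k)) / k


rng = random.Random(42)
dim, n = 64, 500
vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
quantized = [quantize_roundtrip(v) for v in vectors]  # queries stay at full precision
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(20)]

avg_recall = sum(recall_at_k(q, vectors, quantized) for q in queries) / len(queries)
print(f"average recall@10: {avg_recall:.3f}")
```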
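Scalar quantization also requires little preparation. There is no codebook to train or tune; at most, the range of the values is measured so they can be scaled, which takes a single pass over the data, and then each number is converted to lower precision. Building and searching an Int8 index is straightforward and fast.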
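To make the "no codebook" point concrete, here is a sketch assuming per-dimension min/max calibration (one common choice; `fit_ranges` and `encode` are hypothetical names). The only preparation is a single scan that records each dimension's smallest and largest value. Product quantization, by contrast, typically runs k-means clustering to learn its codebooks.

```python
def fit_ranges(vectors):
    """Single pass over the data: record the min and max of every dimension."""
    lows, highs = list(vectors[0]), list(vectors[0])
    for vec in vectors[1:]:
        for d, x in enumerate(vec):
            lows[d] = min(lows[d], x)
            highs[d] = max(highs[d], x)
    return lows, highs


def encode(vector, lows, highs):
    """Convert one float vector to Int8 codes using the fitted ranges."""
    codes = []
    for x, lo, hi in zip(vector, lows, highs):
        scale = (hi - lo) / 255 if hi > lo else 1.0
        # Clamp, since vectors added later may fall outside the fitted range.
        level = min(255, max(0, round((x - lo) / scale)))
        codes.append(level - 128)
    return codes


data = [[0.1, -1.0], [0.5, 0.0], [0.3, 1.0]]
lows, highs = fit_ranges(data)
print(lows, highs)                       # [0.1, -1.0] [0.5, 1.0]
print(encode([0.5, 1.0], lows, highs))   # [127, 127]
print(encode([0.9, -2.0], lows, highs))  # [127, -128] (clamped)
```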
A speed benefit, not just a memory benefit
An unexpected advantage of Int8 quantization is that searching Int8 vectors can actually be faster than searching the original full-precision vectors. Modern processors have SIMD (single instruction, multiple data) instructions, such as AVX2 and AVX-512 on x86, that apply one operation to many values at once. A 256-bit or 512-bit register holds 32 or 64 Int8 values, compared to 8 or 16 float32 values, so each instruction does more work. Each vector is also a quarter of the size, so less data has to be moved from memory. As a result, searches on Int8 vectors can run faster on the same hardware, in addition to using less memory.
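The sketch below illustrates why Int8 search can stay in integer arithmetic, assuming symmetric quantization (each vector scaled so its largest magnitude maps to 127; this is an assumption, and min/max schemes need extra correction terms). The dot product of two quantized vectors is an integer sum of products, which SIMD instructions can compute across many lanes at once, and one multiplication by the two scales at the end brings the result back to the original scale. Plain Python does not use SIMD, so this shows the arithmetic, not the speedup. The final loop reproduces the lane counts quoted above from the register widths.

```python
def quantize_symmetric(vector):
    """Scale so the largest magnitude maps to 127, then round to integers."""
    max_abs = max(abs(x) for x in vector)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(x / scale) for x in vector], scale


def int8_dot(a_codes, b_codes):
    # Each product of two Int8 values fits in 16 bits; optimized kernels
    # accumulate the sum in wider (typically 32-bit) integers to avoid overflow.
    return sum(x * y for x, y in zip(a_codes, b_codes))


a = [0.3, -0.2, 1.0, 0.7]
b = [0.1, 0.4, 0.6, 0.9]
qa, sa = quantize_symmetric(a)
qb, sb = quantize_symmetric(b)

exact = sum(x * y for x, y in zip(a, b))
approx = int8_dot(qa, qb) * sa * sb  # one float multiply restores the scale
print(qa, qb)         # [38, -25, 127, 89] [14, 56, 85, 127]
print(exact, approx)  # approx is within about 0.005 of exact (about 1.18)

# Values processed per instruction for 256-bit (AVX2) and 512-bit (AVX-512) registers.
for bits in (256, 512):
    print(f"{bits}-bit register: {bits // 8} Int8 values vs {bits // 32} float32 values")
```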
Endee's Int8e format builds on standard Int8 quantization with additional calibration to further minimize accuracy loss, achieving near-identical results to full-precision search at a fraction of the memory cost.