Incremental IVF Index Maintenance for Streaming Vector Search
Main Authors | , , , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 01.11.2024 |
Subjects | |
Summary: | The prevalence of vector similarity search in modern machine learning applications and the continuously changing nature of data processed by these applications necessitate efficient and effective index maintenance techniques for vector search indexes. Designed primarily for static workloads, existing vector search indexes degrade in search quality and performance as the underlying data is updated unless costly index reconstruction is performed. To address this, we introduce Ada-IVF, an incremental indexing methodology for Inverted File (IVF) indexes. Ada-IVF consists of 1) an adaptive maintenance policy that decides which index partitions are problematic for performance and should be repartitioned and 2) a local re-clustering mechanism that determines how to repartition them. Compared with state-of-the-art dynamic IVF index maintenance strategies, Ada-IVF achieves an average of 2x and up to 5x higher update throughput across a range of benchmark workloads. |
---|---|
DOI: | 10.48550/arxiv.2411.00970 |
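
The summary describes two parts: a policy that picks which IVF partitions to repartition, and a local re-clustering step that repartitions only those. The sketch below illustrates that general shape in plain Python. It is not Ada-IVF's actual algorithm: the size-threshold policy, the 2-means split, and every name (`TinyIVF`, `flagged`, `split`, `maintain`, `max_size`) are assumptions made up for illustration, and the record above does not say how the paper's policy or re-clustering is defined.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


class TinyIVF:
    """Toy inverted-file index: one list of vectors per centroid."""

    def __init__(self, centroids):
        self.centroids = [list(c) for c in centroids]
        self.lists = [[] for _ in centroids]

    def insert(self, v):
        # Streaming insert: append to the partition with the nearest centroid.
        i = min(range(len(self.centroids)),
                key=lambda j: sqdist(v, self.centroids[j]))
        self.lists[i].append(list(v))

    def flagged(self, max_size):
        # Hypothetical policy (an assumption, not the paper's): a partition
        # is "problematic" once it holds more than max_size vectors.
        return [i for i, part in enumerate(self.lists) if len(part) > max_size]

    def split(self, i, iters=10, seed=0):
        # Local re-clustering stand-in: 2-means over partition i only,
        # leaving every other partition untouched.
        pts = self.lists[i]
        if len(pts) < 2:
            return False
        c = [list(p) for p in random.Random(seed).sample(pts, 2)]
        groups = None
        for _ in range(iters):
            new = [[], []]
            for p in pts:
                new[0 if sqdist(p, c[0]) <= sqdist(p, c[1]) else 1].append(p)
            if not new[0] or not new[1]:
                return False  # degenerate split, keep the partition as it was
            groups = new
            c = [mean(g) for g in groups]  # centroids are means of members
        self.centroids[i], self.lists[i] = c[0], groups[0]
        self.centroids.append(c[1])
        self.lists.append(groups[1])
        return True

    def maintain(self, max_size):
        # Incremental maintenance: touch only flagged partitions.
        for i in self.flagged(max_size):
            self.split(i)

    def search(self, q, nprobe=1, k=1):
        # Scan the nprobe partitions whose centroids are closest to q.
        order = sorted(range(len(self.centroids)),
                       key=lambda j: sqdist(q, self.centroids[j]))
        candidates = [p for j in order[:nprobe] for p in self.lists[j]]
        return sorted(candidates, key=lambda p: sqdist(q, p))[:k]


index = TinyIVF([[0.0, 0.0], [10.0, 10.0]])
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
          [4.0, 0.0], [4.0, 1.0], [5.0, 0.0], [5.0, 1.0],  # drifted inserts
          [10.0, 10.0], [9.0, 10.0]]
for p in points:
    index.insert(p)
index.maintain(max_size=6)
print([len(part) for part in index.lists])
```

In this toy run the drifted inserts all land in the first partition, which is the only one the threshold flags, so only that partition is re-clustered and the second is never scanned during maintenance. A full rebuild would instead re-cluster every vector in the index.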