Incremental IVF Index Maintenance for Streaming Vector Search
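Main Authors | Jason Mohoney, Anil Pacaci, Shihabur Rahman Chowdhury, Umar Farooq Minhas, Jeffery Pound, Cedric Renggli, Nima Reyhani, Ihab F Ilyas, Theodoros Rekatsinas, Shivaram Venkataraman |
---|---|
Format | Journal Article |
Language | English |
Published | 01.11.2024 |
Subjects | Computer Science - Artificial Intelligence; Computer Science - Databases; Computer Science - Learning |
Online Access | https://arxiv.org/abs/2411.00970 |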
Abstract | The prevalence of vector similarity search in modern machine learning
applications and the continuously changing nature of data processed by these
applications necessitate efficient and effective index maintenance techniques
for vector search indexes. Designed primarily for static workloads, existing
vector search indexes degrade in search quality and performance as the
underlying data is updated unless costly index reconstruction is performed. To
address this, we introduce Ada-IVF, an incremental indexing methodology for
Inverted File (IVF) indexes. Ada-IVF consists of 1) an adaptive maintenance
policy that decides which index partitions are problematic for performance and
should be repartitioned and 2) a local re-clustering mechanism that determines
how to repartition them. Compared with state-of-the-art dynamic IVF index
maintenance strategies, Ada-IVF achieves an average of 2x and up to 5x higher
update throughput across a range of benchmark workloads. |
---|---|
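The abstract describes Ada-IVF only at a high level. The following is a minimal, hypothetical sketch of the general setting it addresses, not the paper's method: an IVF index that assigns each inserted vector to its nearest centroid, plus a maintenance check that re-clusters a single partition locally instead of rebuilding the whole index. The trigger (a fixed `max_partition_size`) and the re-clustering step (2-means over one partition's vectors) are assumptions chosen for brevity; Ada-IVF's adaptive policy decides which partitions are problematic for performance, which this sketch does not model. All names (`IVFIndex`, `two_means`, `max_partition_size`, `nprobe`) are illustrative only.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def two_means(vectors: List[Vector], iters: int = 10, seed: int = 0) -> Tuple[Vector, Vector]:
    """Lloyd's k-means with k=2, used here as an assumed local re-clustering step."""
    c0, c1 = random.Random(seed).sample(vectors, 2)
    for _ in range(iters):
        g0 = [v for v in vectors if sq_dist(v, c0) <= sq_dist(v, c1)]
        g1 = [v for v in vectors if sq_dist(v, c0) > sq_dist(v, c1)]
        if not g0 or not g1:  # degenerate split, e.g. duplicate vectors
            break
        c0, c1 = mean(g0), mean(g1)
    return c0, c1


class IVFIndex:
    """Toy IVF index: centroids plus one inverted list (partition) per centroid."""

    def __init__(self, centroids: List[Vector], max_partition_size: int = 64):
        if max_partition_size < 1:
            raise ValueError("max_partition_size must be >= 1")
        self.centroids: Dict[int, Vector] = dict(enumerate(centroids))
        self.partitions: Dict[int, Dict[int, Vector]] = {p: {} for p in self.centroids}
        self.location: Dict[int, int] = {}  # vector id -> partition id
        self.max_partition_size = max_partition_size  # hypothetical split trigger
        self._next_pid = len(centroids)

    def _nearest(self, v: Vector, n: int = 1) -> List[int]:
        """Ids of the n centroids closest to v."""
        return sorted(self.centroids, key=lambda p: sq_dist(v, self.centroids[p]))[:n]

    def insert(self, vid: int, v: Vector) -> None:
        pid = self._nearest(v)[0]
        self.partitions[pid][vid] = v
        self.location[vid] = pid
        # The maintenance check only looks at the partition that just changed.
        if len(self.partitions[pid]) > self.max_partition_size:
            self._split(pid)

    def delete(self, vid: int) -> None:
        """Remove a vector; raises KeyError if the id is unknown."""
        pid = self.location.pop(vid)
        del self.partitions[pid][vid]

    def _split(self, pid: int) -> None:
        """Local re-clustering: replace one oversized partition with two new ones."""
        members = self.partitions[pid]
        c0, c1 = two_means(list(members.values()))
        if c0 == c1:  # nothing to separate; keep the partition as-is
            return
        del self.partitions[pid], self.centroids[pid]
        new_ids = []
        for c in (c0, c1):
            self.centroids[self._next_pid] = c
            self.partitions[self._next_pid] = {}
            new_ids.append(self._next_pid)
            self._next_pid += 1
        for vid, v in members.items():
            target = min(new_ids, key=lambda p: sq_dist(v, self.centroids[p]))
            self.partitions[target][vid] = v
            self.location[vid] = target

    def search(self, q: Vector, k: int = 1, nprobe: int = 2) -> List[int]:
        """Scan the nprobe closest partitions and return ids of the k nearest vectors."""
        candidates = [
            (sq_dist(q, v), vid)
            for pid in self._nearest(q, nprobe)
            for vid, v in self.partitions[pid].items()
        ]
        return [vid for _, vid in sorted(candidates)[:k]]
```

For example, starting from centroids `[[0.0, 0.0], [10.0, 10.0]]` with `max_partition_size=4`, inserting the points `[i, 0.0]` for `i` in `0..5` overflows the first partition once, so it is split into two and the index ends up with three centroids; `search([5.1, 0.0], k=1)` then returns `[5]`. Centroids are never moved on insert, so without some maintenance step the partitions drift away from the data they were built on, one plausible source of the degradation under updates that the abstract describes.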
Author | Mohoney, Jason; Pacaci, Anil; Chowdhury, Shihabur Rahman; Minhas, Umar Farooq; Pound, Jeffery; Renggli, Cedric; Reyhani, Nima; Ilyas, Ihab F; Rekatsinas, Theodoros; Venkataraman, Shivaram |
BackLink | https://doi.org/10.48550/arXiv.2411.00970 (View paper in arXiv) |
ContentType | Journal Article |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DBID | AKY GOX |
DOI | 10.48550/arxiv.2411.00970 |
DatabaseName | arXiv Computer Science arXiv.org |
Database_xml | – sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository |
DeliveryMethod | fulltext_linktorsrc |
ExternalDocumentID | 2411_00970 |
GroupedDBID | AKY GOX |
ID | FETCH-arxiv_primary_2411_009703 |
IEDL.DBID | GOX |
IngestDate | Wed Nov 06 12:21:13 EST 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
LinkModel | DirectLink |
MergedId | FETCHMERGED-arxiv_primary_2411_009703 |
OpenAccessLink | https://arxiv.org/abs/2411.00970 |
ParticipantIDs | arxiv_primary_2411_00970 |
PublicationCentury | 2000 |
PublicationDate | 2024-11-01 |
PublicationDateYYYYMMDD | 2024-11-01 |
PublicationDecade | 2020 |
PublicationYear | 2024 |
Score | 3.8773367 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Artificial Intelligence Computer Science - Databases Computer Science - Learning |
Title | Incremental IVF Index Maintenance for Streaming Vector Search |
URI | https://arxiv.org/abs/2411.00970 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | Cornell University |