Incremental IVF Index Maintenance for Streaming Vector Search

Bibliographic Details
Main Authors Mohoney, Jason; Pacaci, Anil; Chowdhury, Shihabur Rahman; Minhas, Umar Farooq; Pound, Jeffery; Renggli, Cedric; Reyhani, Nima; Ilyas, Ihab F; Rekatsinas, Theodoros; Venkataraman, Shivaram
Format Journal Article
Language English
Published 01.11.2024
Abstract The prevalence of vector similarity search in modern machine learning applications and the continuously changing nature of data processed by these applications necessitate efficient and effective index maintenance techniques for vector search indexes. Designed primarily for static workloads, existing vector search indexes degrade in search quality and performance as the underlying data is updated unless costly index reconstruction is performed. To address this, we introduce Ada-IVF, an incremental indexing methodology for Inverted File (IVF) indexes. Ada-IVF consists of 1) an adaptive maintenance policy that decides which index partitions are problematic for performance and should be repartitioned and 2) a local re-clustering mechanism that determines how to repartition them. Compared with state-of-the-art dynamic IVF index maintenance strategies, Ada-IVF achieves an average of 2x and up to 5x higher update throughput across a range of benchmark workloads.
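The two components named in the abstract (a policy that picks partitions to repartition, and a local re-clustering step) can be illustrated with a toy IVF index. This is a minimal sketch under stated assumptions, not Ada-IVF itself: the abstract does not specify the adaptive policy, so a fixed size threshold (`max_size`) stands in for it, and plain 2-means stands in for the re-clustering mechanism. The names `ToyIVF`, `maintain`, and `max_size` are hypothetical.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def assign(vectors, centroids):
    """Group vectors by nearest centroid (ties go to the lower index)."""
    groups = [[] for _ in centroids]
    for v in vectors:
        j = min(range(len(centroids)), key=lambda c: dist2(v, centroids[c]))
        groups[j].append(v)
    return groups


def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means; an empty cluster keeps its old centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = assign(vectors, centroids)
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, assign(vectors, centroids)


class ToyIVF:
    """Toy inverted-file index: one list of vectors per centroid."""

    def __init__(self, centroids):
        self.centroids = [list(c) for c in centroids]
        self.partitions = [[] for _ in centroids]

    def insert(self, v):
        # Append the vector to the partition of its nearest centroid.
        j = min(range(len(self.centroids)),
                key=lambda c: dist2(v, self.centroids[c]))
        self.partitions[j].append(list(v))

    def search(self, q, nprobe=1):
        # Scan only the nprobe partitions whose centroids are closest to q.
        order = sorted(range(len(self.centroids)),
                       key=lambda c: dist2(q, self.centroids[c]))
        cands = [v for c in order[:nprobe] for v in self.partitions[c]]
        return min(cands, key=lambda v: dist2(q, v)) if cands else None

    def maintain(self, max_size):
        # ASSUMPTION: "partition is larger than max_size" is a placeholder
        # for the paper's adaptive policy, which the abstract does not detail.
        new_centroids, new_partitions = [], []
        for c, p in zip(self.centroids, self.partitions):
            if len(p) > max_size:
                # Local re-clustering: only this partition's vectors move.
                cs, gs = kmeans(p, 2)
                for cc, g in zip(cs, gs):
                    if g:
                        new_centroids.append(cc)
                        new_partitions.append(g)
            else:
                new_centroids.append(c)
                new_partitions.append(p)
        self.centroids, self.partitions = new_centroids, new_partitions


# Streamed inserts pile two distinct groups into the first partition.
index = ToyIVF([[0.0, 0.0], [100.0, 100.0]])
for i in range(6):
    index.insert([float(i), 0.0])
    index.insert([float(i + 20), 0.0])
index.maintain(max_size=8)
print([len(p) for p in index.partitions])
```

The point of the sketch is that maintenance touches only the flagged partition rather than rebuilding the whole index; how partitions are actually selected and re-clustered in Ada-IVF is described in the full paper.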
BackLink https://doi.org/10.48550/arXiv.2411.00970 (View paper in arXiv)
ContentType Journal Article
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI 10.48550/arxiv.2411.00970
DatabaseName arXiv Computer Science
arXiv.org
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
OpenAccessLink https://arxiv.org/abs/2411.00970
PublicationDate 2024-11-01
SecondaryResourceType preprint
SourceType Open Access Repository
SubjectTerms Computer Science - Artificial Intelligence
Computer Science - Databases
Computer Science - Learning
linkProvider Cornell University