Quilt-1M: One Million Image-Text Pairs for Histopathology

Bibliographic Details
Published in: Advances in neural information processing systems, Vol. 36, no. DB1, p. 37995
Main Authors: Ikezogwo, Wisdom O; Seyfioglu, Mehmet S; Ghezloo, Fatemeh; Geva, Dylan; Mohammed, Fatwir S; Anand, Pavan K; Krishna, Ranjay; Shapiro, Linda G
Format: Journal Article
Language: English
Published: United States, 1 December 2023
ISSN: 1049-5258

Abstract
Recent accelerations in multi-modal applications have been made possible by the plethora of image and text data available online. However, the scarcity of analogous data in the medical field, specifically in histopathology, has halted comparable progress. To enable similar representation learning for histopathology, we turn to YouTube, an untapped resource of videos, offering 1,087 hours of valuable educational histopathology videos from expert clinicians. From YouTube, we curate Quilt: a large-scale vision-language dataset consisting of 768,826 image and text pairs. Quilt was automatically curated using a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition. In comparison, the most comprehensive datasets curated for histopathology amass only around 200K samples. We combine Quilt with datasets from other sources, including Twitter, research papers, and the internet in general, to create an even larger dataset: Quilt-1M, with 1M paired image-text samples, marking it as the largest vision-language histopathology dataset to date. We demonstrate the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model outperforms state-of-the-art models on both zero-shot and linear probing tasks for classifying new histopathology images across 13 diverse patch-level datasets of 8 different sub-pathologies, as well as on cross-modal retrieval tasks.
Affiliation: University of Washington (all authors)
PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/38742142
PMID: 38742142
Journal abbreviation: Adv Neural Inf Process Syst