Momentum Contrast for Unsupervised Visual Representation Learning

Bibliographic Details
Published in Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 9726 - 9735
Main Authors He, Kaiming; Fan, Haoqi; Wu, Yuxin; Xie, Saining; Girshick, Ross
Format Conference Proceeding
Language English
Published IEEE 01.06.2020
ISSN 1063-6919
DOI 10.1109/CVPR42600.2020.00975

Abstract We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.
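The abstract names two mechanisms: a queue that stores the dictionary of encoded keys, and a moving-averaged (momentum) key encoder, with contrastive learning framed as looking up the matching key. The block below is a minimal toy sketch of those ideas in plain Python lists, not the authors' implementation. The names `momentum_update`, `KeyQueue`, and `info_nce`, the InfoNCE-style softmax loss, and the values m = 0.999 and tau = 0.07 are illustrative assumptions; real encoders are neural networks, whereas here parameters and features are short float lists.

```python
import math
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


def momentum_update(key_params: Vector, query_params: Vector, m: float = 0.999) -> Vector:
    """Hypothetical helper: moving-average key encoder, theta_k <- m*theta_k + (1-m)*theta_q.

    The query encoder is trained by backprop; the key encoder only follows it slowly.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class KeyQueue:
    """Hypothetical fixed-size FIFO dictionary of encoded keys.

    Enqueuing a minibatch of keys evicts the oldest ones once the queue is full,
    so the dictionary size is independent of the minibatch size.
    """

    def __init__(self, size: int) -> None:
        self.keys: Deque[Vector] = deque(maxlen=size)

    def enqueue(self, batch: Sequence[Vector]) -> None:
        for key in batch:
            self.keys.append(list(key))  # deque(maxlen=...) drops from the left when full


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(query: Vector, positive_key: Vector,
             negative_keys: Sequence[Vector], tau: float = 0.07) -> float:
    """Assumed InfoNCE-style loss for one query: -log softmax probability of the positive key.

    Logits are dot products divided by temperature tau; the positive key is logit 0.
    """
    logits = [dot(query, positive_key) / tau] + [dot(query, k) / tau for k in negative_keys]
    max_logit = max(logits)  # subtract the max before exp() for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    # Toy 2-D usage: queued keys act as negatives for the current query.
    queue = KeyQueue(size=3)
    queue.enqueue([[0.0, 1.0], [-1.0, 0.0]])
    queue.enqueue([[0.0, -1.0], [0.6, 0.8]])  # the oldest key [0.0, 1.0] is evicted
    print(len(queue.keys), info_nce([1.0, 0.0], [1.0, 0.0], list(queue.keys)))
    print(momentum_update([0.0, 1.0], [1.0, 1.0], m=0.9))
```

Because the queue's `maxlen` is set independently of the batch size, the dictionary can hold many more keys than one minibatch, and the small (1 - m) step keeps keys encoded at different iterations roughly consistent, which is the "large and consistent dictionary" the abstract describes.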
Affiliation Facebook AI Research (FAIR), all authors
CODEN IEEPAD
Discipline Applied Sciences
EISBN 9781728171685
1728171687
EISSN 1063-6919
Genre orig-research
PageCount 10
PublicationTitleAbbrev CVPR
SubjectTerms Buildings
Dictionaries
Loss measurement
Task analysis
Training
Unsupervised learning
Visualization
URI https://ieeexplore.ieee.org/document/9157636