Momentum Contrast for Unsupervised Visual Representation Learning

Bibliographic Details
Published in	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 9726-9735
Main Authors	He, Kaiming; Fan, Haoqi; Wu, Yuxin; Xie, Saining; Girshick, Ross
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2020
Subjects	Buildings; Dictionaries; Loss measurement; Task analysis; Training; Unsupervised learning; Visualization
ISSN	1063-6919
DOI	10.1109/CVPR42600.2020.00975

Abstract	We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.
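The two mechanisms named in the abstract, a queue of keys and a moving-averaged encoder, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `momentum_update` and `KeyQueue` are hypothetical, parameters are plain floats rather than network tensors, and the momentum value 0.999 is an illustrative assumption that is not stated in this record.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Moving-average update: key <- m * key + (1 - m) * query.

    m=0.999 is an assumed illustrative value, not taken from this record.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class KeyQueue:
    """Hypothetical fixed-size FIFO dictionary of encoded keys."""

    def __init__(self, max_size):
        # A deque with maxlen drops the oldest entries once it is full.
        self.keys = deque(maxlen=max_size)

    def enqueue(self, batch_keys):
        # Append the newest mini-batch of keys; the oldest fall off the front.
        self.keys.extend(batch_keys)

    def __len__(self):
        return len(self.keys)


# Usage: the key encoder drifts slowly toward the query encoder,
# while the queue keeps only the most recent keys.
key_params = momentum_update([1.0, 2.0], [0.0, 0.0], m=0.9)
queue = KeyQueue(max_size=4)
queue.enqueue([1, 2, 3])
queue.enqueue([4, 5])
```

Under these assumptions, the slowly changing key encoder is what keeps queued keys comparable across mini-batches, and the queue decouples dictionary size from batch size.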
Author Affiliations	Facebook AI Research (FAIR) (all five authors)
EISBN	9781728171685 1728171687