The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Bibliographic Details
Published in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 586-595
Main Authors Zhang, Richard; Isola, Phillip; Efros, Alexei A.; Shechtman, Eli; Wang, Oliver
Format Conference Proceeding
Language English
Published IEEE 01.06.2018
ISSN 1063-6919
DOI 10.1109/CVPR.2018.00068

Abstract While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
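The abstract calls PSNR a simple, shallow function. A minimal sketch of what that means, assuming grayscale images stored as nested lists of floats in the range 0 to 255 (the helper name `psnr` and the toy images are illustrative and are not taken from the paper):

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized grayscale images.

    Images are nested lists of pixel rows. The score depends only on the
    per-pixel squared error, with no notion of structure or texture.
    """
    same_shape = len(reference) == len(distorted) and all(
        len(r) == len(d) for r, d in zip(reference, distorted)
    )
    if not same_shape:
        raise ValueError("images must have the same shape")
    squared_errors = [
        (a - b) ** 2
        for r, d in zip(reference, distorted)
        for a, b in zip(r, d)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy images (hypothetical): one-pixel-wide vertical stripes.
WIDTH, HEIGHT = 8, 4
reference = [[255.0 if x % 2 == 0 else 0.0 for x in range(WIDTH)] for _ in range(HEIGHT)]
# Same stripes shifted right by one pixel: the texture is unchanged.
shifted = [[255.0 if x % 2 == 1 else 0.0 for x in range(WIDTH)] for _ in range(HEIGHT)]
# Flat mid-gray: the stripes are erased entirely.
flat = [[127.5] * WIDTH for _ in range(HEIGHT)]

psnr_shifted = psnr(reference, shifted)  # 0 dB: every pixel is off by 255
psnr_flat = psnr(reference, flat)        # about 6.02 dB: every pixel is off by 127.5
print(f"shifted: {psnr_shifted:.2f} dB, flat: {psnr_flat:.2f} dB")
```

In this toy case the shifted image, which keeps the stripe texture, scores lower than the flat gray image, which removes it. A viewer would arguably rank the two the other way round, which is the kind of mismatch with human perception that the abstract refers to.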
CODEN IEEPAD
ContentType Conference Proceeding
Discipline Applied Sciences
EISBN 9781538664209
1538664208
EISSN 1063-6919
EndPage 595
ExternalDocumentID 8578166
Genre orig-research
IngestDate Wed Aug 27 02:52:15 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
ParticipantIDs ieee_primary_8578166
PublicationCentury 2000
PublicationDate 2018-06
PublicationDateYYYYMMDD 2018-06-01
PublicationDate_xml – month: 06
  year: 2018
  text: 2018-06
PublicationDecade 2010
PublicationTitle 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
PublicationTitleAbbrev CVPR
PublicationYear 2018
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 586
SubjectTerms Computer architecture
Distortion
Measurement
Network architecture
Task analysis
Training
Visualization
URI https://ieeexplore.ieee.org/document/8578166