Logarithmic Lenses: Exploring Log RGB Data for Image Classification

The design of deep network architectures and training methods in computer vision has been well-explored. However, in almost all cases the images have been used as provided, with little exploration of pre-processing steps beyond normalization and data augmentation. Virtually all images posted on the...

Bibliographic Details
Published in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17470-17479
Main Authors Maxwell, Bruce A.; Singhania, Sumegha; Patel, Avnish; Kumar, Rahul; Fryling, Heather; Li, Sihan; Sun, Haonan; He, Ping; Li, Zewen
Format Conference Proceeding
Language English
Published IEEE 16.06.2024

Abstract The design of deep network architectures and training methods in computer vision has been well explored. However, in almost all cases the images have been used as provided, with little exploration of pre-processing steps beyond normalization and data augmentation. Virtually all images posted on the web or captured by devices are processed for viewing by humans. Is the pipeline used for humans also best for use by computers and deep networks? The human visual system uses logarithmic sensors; differences and sums correspond to ratios and products. Features in log space will be invariant to intensity changes and robust to color balance changes. Log RGB space also reveals structure that is corrupted by typical pre-processing. We explore using linear and log RGB data for training standard backbone architectures on an image classification task using data derived directly from RAW images to guarantee its integrity. We found that networks trained on log RGB data exhibit improved performance on an unmodified test set and invariance to intensity and color balance modifications without additional training or data augmentation. Furthermore, we found that the gains from using high-quality log data could also be partially or fully realized from data in 8-bit sRGB-JPG format by inverting the sRGB transform and taking the log. These results imply that existing databases may benefit from this type of pre-processing. While working with log data, we found it was critical to retain the integrity of the log relationships and that networks using log data train best with meta-parameters different from those used for sRGB or linear data. Finally, we introduce a new 10-category, 10k-image RAW data set (RAW10) for image classification and other purposes to enable further exploration of log RGB as an input format for deep networks in computer vision.
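The abstract's step of "inverting the sRGB transform and taking the log" can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the eps offset, and the sample pixel values are assumptions made for illustration. Only the piecewise sRGB transfer function constants come from the sRGB standard. The final lines check the invariance claim: scaling linear intensity by a factor k adds log(k) to every value, so differences between log values do not change.

```python
import math


def srgb_to_linear(v8):
    """Invert the standard sRGB transfer function for one 8-bit channel value."""
    c = v8 / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def log_rgb(pixel, eps=1e-4):
    """Map an 8-bit sRGB pixel (r, g, b) to log-linear RGB.

    eps is an assumed small offset that keeps log() finite for black pixels;
    the record does not say how the paper handles zeros.
    """
    return tuple(math.log(srgb_to_linear(v) + eps) for v in pixel)


# Two hypothetical linear-RGB pixels and a global intensity change k.
a_lin = (0.20, 0.40, 0.10)
b_lin = (0.05, 0.10, 0.30)
k = 0.5

# Per-channel log differences before and after scaling both pixels by k.
diff_before = [math.log(a) - math.log(b) for a, b in zip(a_lin, b_lin)]
diff_after = [math.log(k * a) - math.log(k * b) for a, b in zip(a_lin, b_lin)]

# The two lists agree up to floating-point rounding.
intensity_invariant = all(
    abs(x - y) < 1e-12 for x, y in zip(diff_before, diff_after)
)
```

A per-channel gain, as in a color balance change, behaves the same way within each channel: it becomes an additive offset in log space. That is consistent with the abstract's description of log features as robust to color balance changes.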
Author Patel, Avnish
Kumar, Rahul
Li, Sihan
Singhania, Sumegha
Sun, Haonan
Li, Zewen
Maxwell, Bruce A.
He, Ping
Fryling, Heather
Author_xml – sequence: 1
  givenname: Bruce A.
  surname: Maxwell
  fullname: Maxwell, Bruce A.
  email: b.maxwell@northeastern.edu
  organization: Northeastern University,Boston,USA
– sequence: 2
  givenname: Sumegha
  surname: Singhania
  fullname: Singhania, Sumegha
  email: singhania.s@northeastern.edu
  organization: Northeastern University,Boston,USA
– sequence: 3
  givenname: Avnish
  surname: Patel
  fullname: Patel, Avnish
  email: patel.avni@northeastern.edu
  organization: Northeastern University,Boston,USA
– sequence: 4
  givenname: Rahul
  surname: Kumar
  fullname: Kumar, Rahul
  email: kumar.rahul4@northeastern.edu
  organization: Northeastern University,Boston,USA
– sequence: 5
  givenname: Heather
  surname: Fryling
  fullname: Fryling, Heather
  email: fryling.h@northeastern.edu
  organization: Northeastern University,Boston,USA
– sequence: 6
  givenname: Sihan
  surname: Li
  fullname: Li, Sihan
  email: li.siha@northeastern.edu
  organization: Northeastern University,Boston,USA
– sequence: 7
  givenname: Haonan
  surname: Sun
  fullname: Sun, Haonan
  email: sun.haon@northeastern.edu
  organization: Northeastern University,Boston,USA
– sequence: 8
  givenname: Ping
  surname: He
  fullname: He, Ping
  email: hi.pin@northeastern.edu
  organization: Northeastern University,Boston,USA
– sequence: 9
  givenname: Zewen
  surname: Li
  fullname: Li, Zewen
  email: li.zewen@northeastern.edu
  organization: Northeastern University,Boston,USA
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR52733.2024.01654
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE/IET Electronic Library
IEEE Proceedings Order Plans (POP) 1998-present
Discipline Applied Sciences
EISBN 9798350353006
EISSN 2575-7075
EndPage 17479
ExternalDocumentID 10658031
Genre orig-research
IngestDate Wed Sep 25 09:21:55 EDT 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-ieee_primary_106580313
ParticipantIDs ieee_primary_10658031
PublicationCentury 2000
PublicationDate 2024-June-16
PublicationDateYYYYMMDD 2024-06-16
PublicationDate_xml – month: 06
  year: 2024
  text: 2024-June-16
  day: 16
PublicationDecade 2020
PublicationTitle 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
PublicationTitleAbbrev CVPR
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0003211698
Score 3.83865
SourceID ieee
SourceType Publisher
StartPage 17470
SubjectTerms Computer vision
image classification
Image color analysis
physics-based vision
Pipelines
Sensor systems
Training
Transforms
Visual systems
Title Logarithmic Lenses: Exploring Log RGB Data for Image Classification
URI https://ieeexplore.ieee.org/document/10658031