Logarithmic Lenses: Exploring Log RGB Data for Image Classification

Bibliographic Details
Published in	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 17470 - 17479
Main Authors	Maxwell, Bruce A.; Singhania, Sumegha; Patel, Avnish; Kumar, Rahul; Fryling, Heather; Li, Sihan; Sun, Haonan; He, Ping; Li, Zewen
Format	Conference Proceeding
Language	English
Published	IEEE 16.06.2024
Subjects	Computer vision; image classification; Image color analysis; physics-based vision; Pipelines; Sensor systems; Training; Transforms; Visual systems

Abstract	The design of deep network architectures and training methods in computer vision has been well-explored. However, in almost all cases the images have been used as provided, with little exploration of pre-processing steps beyond normalization and data augmentation. Virtually all images posted on the web or captured by devices are processed for viewing by humans. Is the pipeline used for humans also best for use by computers and deep networks? The human visual system uses logarithmic sensors; in log space, differences and sums correspond to ratios and products. Features in log space will be invariant to intensity changes and robust to color balance changes. Log RGB space also reveals structure that is corrupted by typical pre-processing. We explore using linear and log RGB data for training standard backbone architectures on an image classification task using data derived directly from RAW images to guarantee its integrity. We found that networks trained on log RGB data exhibit improved performance on an unmodified test set and invariance to intensity and color balance modifications without additional training or data augmentation. Furthermore, we found that the gains from using high quality log data could also be partially or fully realized from data in 8-bit sRGB-JPG format by inverting the sRGB transform and taking the log. These results imply existing databases may benefit from this type of pre-processing. While working with log data, we found it was critical to retain the integrity of the log relationships and that networks using log data train best with meta-parameters different from those used for sRGB or linear data. Finally, we introduce a new 10-category 10k RAW image data set (RAW10) for image classification and other purposes to enable further exploration of log RGB as an input format for deep networks in computer vision.
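
The abstract's statement that log data can be recovered from 8-bit sRGB images by inverting the sRGB transform and taking the log can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `eps` floor used to avoid log(0), and the sample pixel values are assumptions made here for illustration; only the piecewise sRGB decoding curve is the standard one. The sketch also checks the invariance argument: a uniform intensity change shifts every log channel by the same constant, so differences between log channels are unchanged.

```python
import numpy as np


def srgb_to_linear(srgb8):
    """Invert the standard sRGB transfer curve for 8-bit values (0..255)."""
    c = np.asarray(srgb8, dtype=np.float64) / 255.0
    # Piecewise inverse: linear segment near black, power law elsewhere.
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)


def linear_to_log(linear, eps=1e-6):
    """Natural log of linear RGB; eps is an assumed floor, not from the paper."""
    return np.log(np.maximum(linear, eps))


# Two hypothetical sRGB pixels (rows) with R, G, B channels (columns).
pixels = np.array([[200, 120, 60], [100, 60, 30]], dtype=np.uint8)

linear = srgb_to_linear(pixels)
log_rgb = linear_to_log(linear)

# Simulate an intensity change by scaling the linear data by k.
k = 0.5
log_dim = linear_to_log(k * linear)

# Every log channel shifts by the same constant, log(k).
shift = log_dim - log_rgb

# Differences between log channels (log of channel ratios) are unchanged.
diff_bright = log_rgb[:, 0] - log_rgb[:, 1]
diff_dim = log_dim[:, 0] - log_dim[:, 1]

print(np.allclose(shift, np.log(k)))       # uniform shift in log space
print(np.allclose(diff_bright, diff_dim))  # channel differences preserved
```

The invariance holds only while values stay above the `eps` floor; pixels clipped to the floor lose the log relationship, which is one way to read the abstract's remark about retaining the integrity of the log relationships.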
Author	Patel, Avnish; Kumar, Rahul; Li, Sihan; Singhania, Sumegha; Sun, Haonan; Li, Zewen; Maxwell, Bruce A.; He, Ping; Fryling, Heather
Author_xml
– sequence: 1 fullname: Maxwell, Bruce A. email: b.maxwell@northeastern.edu organization: Northeastern University, Boston, USA
– sequence: 2 fullname: Singhania, Sumegha email: singhania.s@northeastern.edu organization: Northeastern University, Boston, USA
– sequence: 3 fullname: Patel, Avnish email: patel.avni@northeastern.edu organization: Northeastern University, Boston, USA
– sequence: 4 fullname: Kumar, Rahul email: kumar.rahul4@northeastern.edu organization: Northeastern University, Boston, USA
– sequence: 5 fullname: Fryling, Heather email: fryling.h@northeastern.edu organization: Northeastern University, Boston, USA
– sequence: 6 fullname: Li, Sihan email: li.siha@northeastern.edu organization: Northeastern University, Boston, USA
– sequence: 7 fullname: Sun, Haonan email: sun.haon@northeastern.edu organization: Northeastern University, Boston, USA
– sequence: 8 fullname: He, Ping email: hi.pin@northeastern.edu organization: Northeastern University, Boston, USA
– sequence: 9 fullname: Li, Zewen email: li.zewen@northeastern.edu organization: Northeastern University, Boston, USA
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IH CBEJK RIE RIO
DOI	10.1109/CVPR52733.2024.01654
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE/IET Electronic Library IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Applied Sciences
EISBN	9798350353006
EISSN	2575-7075
EndPage	17479
ExternalDocumentID	10658031
Genre	orig-research
GroupedDBID	6IE 6IH 6IL 6IN ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP OCL RIE RIL RIO
ID	FETCH-ieee_primary_106580313
IEDL.DBID	RIE
IngestDate	Wed Sep 25 09:21:55 EDT 2024
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-ieee_primary_106580313
ParticipantIDs	ieee_primary_10658031
PublicationCentury	2000
PublicationDate	2024-June-16
PublicationDateYYYYMMDD	2024-06-16
PublicationDate_xml	– month: 06 year: 2024 text: 2024-June-16 day: 16
PublicationDecade	2020
PublicationTitle	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
PublicationTitleAbbrev	CVPR
PublicationYear	2024
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssj0003211698
Score	3.83865
SourceID	ieee
SourceType	Publisher
StartPage	17470
URI	https://ieeexplore.ieee.org/document/10658031
hasFullText	1
inHoldings	1