Logarithmic Lenses: Exploring Log RGB Data for Image Classification
Published in | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 17470 - 17479 |
---|---|
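Main Authors | Bruce A. Maxwell, Sumegha Singhania, Avnish Patel, Rahul Kumar, Heather Fryling, Sihan Li, Haonan Sun, Ping He, Zewen Li |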
Format | Conference Proceeding |
Language | English |
Published | IEEE 16.06.2024 |
Abstract | The design of deep network architectures and training methods in computer vision has been well-explored. However, in almost all cases the images have been used as provided, with little exploration of pre-processing steps beyond normalization and data augmentation. Virtually all images posted on the web or captured by devices are processed for viewing by humans. Is the pipeline used for humans also best for use by computers and deep networks? The human visual system uses logarithmic sensors; differences and sums correspond to ratios and products. Features in log space will be invariant to intensity changes and robust to color balance changes. Log RGB space also reveals structure that is corrupted by typical pre-processing. We explore using linear and log RGB data for training standard backbone architectures on an image classification task using data derived directly from RAW images to guarantee its integrity. We found that networks trained on log RGB data exhibit improved performance on an unmodified test set and invariance to intensity and color balance modifications without additional training or data augmentation. Furthermore, we found that the gains from using high quality log data could also be partially or fully realized from data in 8-bit sRGB-JPG format by inverting the sRGB transform and taking the log. These results imply existing databases may benefit from this type of pre-processing. While working with log data, we found it was critical to retain the integrity of the log relationships and that networks using log data train best with meta-parameters different than those used for sRGB or linear data. Finally, we introduce a new 10-category 10k RAW image data set (RAW10) for image classification and other purposes to enable further the exploration of log RGB as an input format for deep networks in computer vision. |
---|---|
Author | Patel, Avnish; Kumar, Rahul; Li, Sihan; Singhania, Sumegha; Sun, Haonan; Li, Zewen; Maxwell, Bruce A.; He, Ping; Fryling, Heather |
Author_xml | 1. Bruce A. Maxwell (b.maxwell@northeastern.edu), Northeastern University, Boston, USA; 2. Sumegha Singhania (singhania.s@northeastern.edu), Northeastern University, Boston, USA; 3. Avnish Patel (patel.avni@northeastern.edu), Northeastern University, Boston, USA; 4. Rahul Kumar (kumar.rahul4@northeastern.edu), Northeastern University, Boston, USA; 5. Heather Fryling (fryling.h@northeastern.edu), Northeastern University, Boston, USA; 6. Sihan Li (li.siha@northeastern.edu), Northeastern University, Boston, USA; 7. Haonan Sun (sun.haon@northeastern.edu), Northeastern University, Boston, USA; 8. Ping He (hi.pin@northeastern.edu), Northeastern University, Boston, USA; 9. Zewen Li (li.zewen@northeastern.edu), Northeastern University, Boston, USA |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/CVPR52733.2024.01654 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE/IET Electronic Library IEEE Proceedings Order Plans (POP) 1998-present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Applied Sciences |
EISBN | 9798350353006 |
EISSN | 2575-7075 |
EndPage | 17479 |
ExternalDocumentID | 10658031 |
Genre | orig-research |
GroupedDBID | 6IE 6IH 6IL 6IN ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP OCL RIE RIL RIO |
ID | FETCH-ieee_primary_106580313 |
IEDL.DBID | RIE |
IngestDate | Wed Sep 25 09:21:55 EDT 2024 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-ieee_primary_106580313 |
ParticipantIDs | ieee_primary_10658031 |
PublicationCentury | 2000 |
PublicationDate | 2024-June-16 |
PublicationDateYYYYMMDD | 2024-06-16 |
PublicationDate_xml | – month: 06 year: 2024 text: 2024-June-16 day: 16 |
PublicationDecade | 2020 |
PublicationTitle | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
PublicationTitleAbbrev | CVPR |
PublicationYear | 2024 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0003211698 |
Score | 3.83865 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 17470 |
SubjectTerms | Computer vision; image classification; Image color analysis; physics-based vision; Pipelines; Sensor systems; Training; Transforms; Visual systems |
Title | Logarithmic Lenses: Exploring Log RGB Data for Image Classification |
URI | https://ieeexplore.ieee.org/document/10658031 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | IEEE |
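
The abstract describes recovering log RGB data from 8-bit sRGB images "by inverting the sRGB transform and taking the log", and states that differences in log space correspond to ratios. The following is a minimal sketch of that idea, not the authors' pipeline. It assumes the standard piecewise sRGB transfer curve; the function names, the `EPS` floor used to avoid log(0), and the sample pixel values and gains are all hypothetical and chosen only for illustration.

```python
import math

EPS = 1e-4  # hypothetical floor to avoid log(0); not a value taken from the paper


def srgb_to_linear(v8):
    """Invert the standard sRGB transfer curve for one 8-bit channel value."""
    c = v8 / 255.0
    if c <= 0.04045:
        return c / 12.92  # linear segment near black
    return ((c + 0.055) / 1.055) ** 2.4  # power-law segment


def linear_to_log(pixel, eps=EPS):
    """Take the natural log of each linear channel, clamped at eps."""
    return tuple(math.log(max(c, eps)) for c in pixel)


def srgb_to_log(pixel8):
    """8-bit sRGB pixel -> linear RGB -> log RGB."""
    return linear_to_log(tuple(srgb_to_linear(v) for v in pixel8))


def log_difference(p, q):
    """Per-channel difference of two log RGB pixels (the log of a ratio)."""
    return tuple(a - b for a, b in zip(p, q))


# Two neighbouring linear RGB pixels, then the same pair under a dimmer,
# differently coloured light modelled as per-channel gains (illustrative numbers).
a, b = (0.20, 0.40, 0.10), (0.05, 0.30, 0.60)
gains = (0.5, 0.4, 0.3)
a_lit = tuple(g * c for g, c in zip(gains, a))
b_lit = tuple(g * c for g, c in zip(gains, b))

before = log_difference(linear_to_log(a), linear_to_log(b))
after = log_difference(linear_to_log(a_lit), linear_to_log(b_lit))
print(before)
print(after)  # matches `before` up to floating-point rounding
```

Because a per-channel gain becomes an additive offset after the log, it cancels in the difference between two pixels. This is one way to read the abstract's claim that log-space features are invariant to intensity changes and robust to color balance changes. The clamp at `EPS` breaks that relationship for very dark pixels, which is one reason a real pipeline would need to choose its floor carefully.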