Do End-to-end Stereo Algorithms Under-utilize Information?

Bibliographic Details
Main Authors: Cai, Changjiang; Mordohai, Philippos
Format: Journal Article
Language: English
Published: 14.10.2020

Abstract: Deep networks for stereo matching typically leverage 2D or 3D convolutional encoder-decoder architectures to aggregate cost and regularize the cost volume for accurate disparity estimation. Due to content-insensitive convolutions and down-sampling and up-sampling operations, these cost aggregation mechanisms do not take full advantage of the information available in the images. Disparity maps suffer from over-smoothing near occlusion boundaries, and erroneous predictions in thin structures. In this paper, we show how deep adaptive filtering and differentiable semi-global aggregation can be integrated in existing 2D and 3D convolutional networks for end-to-end stereo matching, leading to improved accuracy. The improvements are due to utilizing RGB information from the images as a signal to dynamically guide the matching process, in addition to being the signal we attempt to match across the images. We show extensive experimental results on the KITTI 2015 and Virtual KITTI 2 datasets comparing four stereo networks (DispNetC, GCNet, PSMNet and GANet) after integrating four adaptive filters (segmentation-aware bilateral filtering, dynamic filtering networks, pixel adaptive convolution and semi-global aggregation) into their architectures. Our code is available at https://github.com/ccj5351/DAFStereoNets.
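The contrast the abstract draws between content-insensitive aggregation and image-guided aggregation can be illustrated with a small sketch. It is not taken from the authors' repository and does not reproduce any of the four filters they evaluate. It assumes a toy 1D scanline, a single disparity hypothesis, and bilateral-style Gaussian weights as a stand-in for an adaptive filter. The function names `adaptive_aggregate` and `box_aggregate` and all parameter values are hypothetical.

```python
import math


def adaptive_aggregate(cost, guide, radius=2, sigma_color=10.0, sigma_space=2.0):
    """Smooth one scanline of matching costs with weights taken from the image.

    cost  : matching cost per pixel for a single disparity hypothesis
    guide : image intensity per pixel (same length as cost)
    All names and defaults here are illustrative assumptions.
    """
    if len(cost) != len(guide):
        raise ValueError("cost and guide must have the same length")
    n = len(cost)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial term: depends only on distance (content-insensitive).
            w_space = math.exp(-((i - j) ** 2) / (2.0 * sigma_space ** 2))
            # Range term: depends on how similar the two pixels look
            # (content-sensitive, this is where the image guides the filter).
            diff = guide[i] - guide[j]
            w_color = math.exp(-(diff ** 2) / (2.0 * sigma_color ** 2))
            num += w_space * w_color * cost[j]
            den += w_space * w_color
        # den >= 1 because the centre pixel always contributes weight 1.
        out.append(num / den)
    return out


def box_aggregate(cost, radius=2):
    """Content-insensitive baseline: plain window average of the costs."""
    n = len(cost)
    out = []
    for i in range(n):
        window = cost[max(0, i - radius): min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


# Toy scanline: a dark surface (intensity 20) next to a bright one (200).
guide = [20.0] * 4 + [200.0] * 4
# Costs for one disparity: low on the dark surface, high on the bright one.
cost = [1.0] * 4 + [9.0] * 4

adaptive = adaptive_aggregate(cost, guide)
box = box_aggregate(cost)
```

At the last dark pixel (index 3), the box average mixes costs from both surfaces, while the guided version stays essentially at the dark surface's cost because pixels across the intensity edge receive near-zero weight. This is the over-smoothing behaviour at boundaries that the abstract attributes to content-insensitive aggregation, shown here only on synthetic numbers.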
Copyright: http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI: 10.48550/arxiv.2010.07350
Open Access Link: https://arxiv.org/abs/2010.07350
Resource Type: preprint
Subjects: Computer Science - Computer Vision and Pattern Recognition