Do End-to-end Stereo Algorithms Under-utilize Information?

Bibliographic Details
Main Authors	Cai, Changjiang; Mordohai, Philippos
Format	Journal Article (arXiv preprint)
Language	English
Published	14.10.2020
Subjects	Computer Science - Computer Vision and Pattern Recognition
Abstract	Deep networks for stereo matching typically leverage 2D or 3D convolutional encoder-decoder architectures to aggregate cost and regularize the cost volume for accurate disparity estimation. Due to content-insensitive convolutions and down-sampling and up-sampling operations, these cost aggregation mechanisms do not take full advantage of the information available in the images. Disparity maps suffer from over-smoothing near occlusion boundaries, and erroneous predictions in thin structures. In this paper, we show how deep adaptive filtering and differentiable semi-global aggregation can be integrated in existing 2D and 3D convolutional networks for end-to-end stereo matching, leading to improved accuracy. The improvements are due to utilizing RGB information from the images as a signal to dynamically guide the matching process, in addition to being the signal we attempt to match across the images. We show extensive experimental results on the KITTI 2015 and Virtual KITTI 2 datasets comparing four stereo networks (DispNetC, GCNet, PSMNet and GANet) after integrating four adaptive filters (segmentation-aware bilateral filtering, dynamic filtering networks, pixel adaptive convolution and semi-global aggregation) into their architectures. Our code is available at https://github.com/ccj5351/DAFStereoNets.
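The abstract's central idea, using image content to steer cost aggregation so that costs are not averaged across object boundaries, can be illustrated with a small sketch. This is not the authors' implementation and not any of the four filters named above; it is a simplified bilateral-style filter applied to a single cost-volume slice, written with NumPy. The function name `guided_cost_filter` and the parameters `radius` and `sigma` are hypothetical, chosen only for illustration.

```python
import numpy as np


def guided_cost_filter(cost, guide, radius=1, sigma=0.1):
    """Smooth one cost-volume slice with weights derived from a guide image.

    Hypothetical helper for illustration only (not from the paper's code).

    cost:  (H, W) matching cost for a single disparity hypothesis.
    guide: (H, W, C) guidance features, e.g. RGB values scaled to [0, 1].

    Each output pixel is a weighted mean of its (2*radius+1)^2 neighbours.
    A neighbour's weight decays with its feature distance to the centre
    pixel, so costs are not averaged across strong colour edges.
    """
    cost = np.asarray(cost, dtype=np.float64)
    guide = np.asarray(guide, dtype=np.float64)
    h, w = cost.shape

    # Edge-replicate padding so border pixels have full neighbourhoods.
    cost_p = np.pad(cost, radius, mode="edge")
    guide_p = np.pad(
        guide, ((radius, radius), (radius, radius), (0, 0)), mode="edge"
    )

    num = np.zeros((h, w), dtype=np.float64)  # weighted sum of costs
    den = np.zeros((h, w), dtype=np.float64)  # sum of weights

    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            g = guide_p[dy:dy + h, dx:dx + w]  # shifted guide features
            c = cost_p[dy:dy + h, dx:dx + w]   # shifted costs
            dist2 = np.sum((g - guide) ** 2, axis=-1)
            wgt = np.exp(-dist2 / (2.0 * sigma ** 2))
            num += wgt * c
            den += wgt

    # The centre pixel always contributes weight 1, so den >= 1.
    return num / den
```

With a constant guide image every weight equals 1 and the function reduces to a plain box filter, which is the content-insensitive behaviour the abstract criticizes. With a guide that has a sharp colour edge, neighbours on the far side of the edge receive near-zero weight, so a cost discontinuity aligned with that edge is preserved.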
Copyright	http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI	10.48550/arxiv.2010.07350
OpenAccessLink	https://arxiv.org/abs/2010.07350