Geometry and uncertainty in deep learning for computer vision

Bibliographic Details
Main Author	Kendall, Alex Guy
Format	Dissertation
Language	English
Published	University of Cambridge 2019
Subjects	Computer Vision Deep Learning Machine Learning Robotics
Online Access	Get full text

Abstract	Deep learning and convolutional neural networks have become the dominant tool for computer vision. These techniques excel at learning complicated representations from data using supervised learning. In particular, image recognition models now outperform human baselines under constrained settings. However, the science of computer vision aims to build machines which can see. This requires models which can extract richer information than recognition from images and video. In general, applying these deep learning models from recognition to other problems in computer vision is significantly more challenging. This thesis presents end-to-end deep learning architectures for a number of core computer vision problems: scene understanding, camera pose estimation, stereo vision and video semantic segmentation. Our models outperform traditional approaches and advance the state of the art on a number of challenging computer vision benchmarks. However, these end-to-end models are often not interpretable and require enormous quantities of training data. To address this, we make two observations: (i) we do not need to learn everything from scratch, because we know a lot about the physical world, and (ii) we cannot know everything from data, so our models should be aware of what they do not know. This thesis explores these ideas using concepts from geometry and uncertainty. First, we show how to improve end-to-end deep learning models by leveraging the underlying geometry of the problem. We explicitly model concepts such as epipolar geometry to learn with unsupervised learning, which improves performance. Second, we introduce ideas from probabilistic modelling and Bayesian deep learning to understand uncertainty in computer vision models. We show how to quantify different types of uncertainty, improving safety for real-world applications.
Author	Kendall, Alex Guy
Author_xml	– sequence: 1 fullname: Kendall, Alex Guy
ContentType	Dissertation
DBID	ABQQS LLH
DEWEY	006.3
DOI	10.17863/CAM.35260
DatabaseName	EThOS: Electronic Theses Online Service (Full Text) EThOS: Electronic Theses Online Service
DatabaseTitleList
Database_xml	– sequence: 1 dbid: LLH name: EThOS: Electronic Theses Online Service url: http://ethos.bl.uk/ sourceTypes: Open Access Repository
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
DissertationAdvisor	Cipolla, Roberto
DissertationAdvisor_xml	– sequence: 1 fullname: Cipolla, Roberto
DissertationDegree	Thesis (Ph.D.)
DissertationSchool	University of Cambridge
ExternalDocumentID	oai_ethos_bl_uk_763850
GroupedDBID	ABQQS LLH
ID	FETCH-britishlibrary_ethos_oai_ethos_bl_uk_7638503
IEDL.DBID	LLH
IngestDate	Tue Apr 04 21:56:14 EDT 2023
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-britishlibrary_ethos_oai_ethos_bl_uk_7638503
Notes	0000000476534754 Woolf Fisher Trust
ORCID	0000000319045885
OpenAccessLink	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.763850
ParticipantIDs	britishlibrary_ethos_oai_ethos_bl_uk_763850
PublicationCentury	2000
PublicationDate	2019
PublicationDateYYYYMMDD	2019-01-01
PublicationDate_xml	– year: 2019 text: 2019
PublicationDecade	2010
PublicationYear	2019
Publisher	University of Cambridge
Publisher_xml	– name: University of Cambridge
Score	3.7269306
SourceID	britishlibrary
SourceType	Open Access Repository
SubjectTerms	Computer Vision Deep Learning Machine Learning Robotics
Title	Geometry and uncertainty in deep learning for computer vision
URI	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.763850
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
link	http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3dS8MwED9kvoiI3_hNHnyTVl2Spn0YIpuziNMXhb2VpLvq2Gxlax_233tpOpiPewskHLnA5Xd3yf0O4DpMkVDOoBeFqfREZring0B6d6iRy1Dp7N7WDg_egvhTvAzlsGn1ZXO65Xcx983UVlG9W-7JXv2Rcu6PiodqnHeqiZ10q8gyQhutb7YtJ57lkXyNGwJSFQb8tvs48C37O922O8YxBDWZkRUE6e_Bdm_l5XsfNjA_gN1lTwXWmNghdJ6x-MFytmAU4jMCHfdkXy7YOGcjxF_W9Hn4YuRusnQpwNWIH0G7__TRjb3_O0lqVRLL8exGZppUk8Spxo-hlRc5ngDLyKtvK0MnJ5QIONdCEbjrzGgpVGTkKdysIfhsrdXnsEVeQuTyDhfQKmcVXhISl-aqPvM_Co2Wig
link.rule.ids	230,312,786,891,4071,26595
linkProvider	British Library Board
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adissertation&rft.genre=dissertation&rft.title=Geometry+and+uncertainty+in+deep+learning+for+computer+vision&rft.DBID=ABQQS%3BLLH&rft.au=Kendall%2C+Alex+Guy&rft.date=2019&rft.pub=University+of+Cambridge&rft.advisor=Cipolla%2C+Roberto&rft.inst=University+of+Cambridge&rft_id=info:doi/10.17863%2FCAM.35260&rft.externalDBID=n%2Fa&rft.externalDocID=oai_ethos_bl_uk_763850