Geometry and uncertainty in deep learning for computer vision

Bibliographic Details
Main Author Kendall, Alex Guy
Format Dissertation
Language English
Published University of Cambridge 2019
Abstract Deep learning and convolutional neural networks have become the dominant tool for computer vision. These techniques excel at learning complicated representations from data using supervised learning. In particular, image recognition models now outperform human baselines under constrained settings. However, the science of computer vision aims to build machines which can see. This requires models which can extract richer information than recognition from images and video. In general, applying these deep learning models from recognition to other problems in computer vision is significantly more challenging. This thesis presents end-to-end deep learning architectures for a number of core computer vision problems: scene understanding, camera pose estimation, stereo vision and video semantic segmentation. Our models outperform traditional approaches and advance the state of the art on a number of challenging computer vision benchmarks. However, these end-to-end models are often not interpretable and require enormous quantities of training data. To address this, we make two observations: (i) we do not need to learn everything from scratch, because we know a lot about the physical world, and (ii) we cannot know everything from data, so our models should be aware of what they do not know. This thesis explores these ideas using concepts from geometry and uncertainty. Specifically, we show how to improve end-to-end deep learning models by leveraging the underlying geometry of the problem. We explicitly model concepts such as epipolar geometry to learn with unsupervised learning, which improves performance. Secondly, we introduce ideas from probabilistic modelling and Bayesian deep learning to understand uncertainty in computer vision models. We show how to quantify different types of uncertainty, improving safety for real-world applications.
DEWEY 006.3
DOI 10.17863/CAM.35260
DatabaseName EThOS: Electronic Theses Online Service (Full Text)
Discipline Computer Science
DissertationAdvisor Cipolla, Roberto
DissertationDegree Thesis (Ph.D.)
DissertationSchool University of Cambridge
Notes 0000000476534754
Woolf Fisher Trust
ORCID 0000000319045885
OpenAccessLink http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.763850
Subjects Computer Vision; Deep Learning; Machine Learning; Robotics