Deep multimodal learning for residential building energy prediction

Bibliographic Details
Published in IOP Conference Series: Earth and Environmental Science, Vol. 1078, no. 1, pp. 12038 - 12048
Main Authors Sheng, Y; Ward, W OC; Arbabi, H; Álvarez, M; Mayfield, M
Format Journal Article
Language English
Published Bristol IOP Publishing 01.09.2022
Summary: In the UK, the residential sector has been the second-largest energy consumer since 1987. Approximately 24 million existing dwellings in England accounted for over 32% of overall energy consumption in 2020. A robust understanding of the energy performance of existing buildings is therefore critical for guiding appropriate home retrofit measures and accelerating progress towards the UK's climate targets. Many city-scale predictions rely on available data, e.g., Energy Performance Certificates (EPCs) and GIS products, to develop statistical and machine learning models that estimate energy consumption. However, the issues with existing data are not negligible. This work adopts deep multimodal learning to study the potential of Google Street View (GSV) images as an additional input for residential building energy prediction. 20,031 GSV images of 5,933 residential buildings in central Barnsley, UK, were selected for a case study. All images were pre-processed using a state-of-the-art object detection algorithm to minimise the noise caused by other elements that may appear nearby. Building specifications that cannot easily be determined from appearance were extracted from existing EPC information as text-based inputs for prediction. A multimodal model was designed to take images and text jointly as inputs. The images are first propagated through a convolutional neural network and the text-based inputs through a multi-layer perceptron, before the two are combined in a connected network for the final energy prediction. The multi-input model was trained and tested on the case study area and predicted annual energy consumption with a mean absolute difference of 0.01 kWh/m² per annum compared with the values recorded in the EPCs. The difference between the predicted results and the EPCs may also give some indication of the bias the certificates potentially contain.
ISSN: 1755-1307, 1755-1315
DOI: 10.1088/1755-1315/1078/1/012038
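
The two-branch design described in the Summary (a convolutional branch for images, a multi-layer perceptron for EPC-derived inputs, and a joint network on the combined features) can be illustrated with a small forward-pass sketch. This is not the authors' model: the layer sizes, the random weights, the single-channel toy image, and every function name below are assumptions made for illustration, and the EPC inputs are assumed to be already encoded as numbers. A real implementation would use a deep learning framework and train the weights.

```python
import random


def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def image_branch(image, kernels):
    """Toy CNN: one conv layer, ReLU, global average pooling per kernel."""
    features = []
    for kernel in kernels:
        activated = [relu(row) for row in conv2d_valid(image, kernel)]
        count = len(activated) * len(activated[0])
        features.append(sum(sum(row) for row in activated) / count)
    return features


def tabular_branch(x, weights, bias):
    """Toy MLP branch: one hidden layer with ReLU."""
    return relu(dense(x, weights, bias))


def predict(image, tabular, params):
    """Fuse both branches by concatenation, then apply a linear head."""
    img_feat = image_branch(image, params["kernels"])
    tab_feat = tabular_branch(tabular, params["tab_w"], params["tab_b"])
    fused = img_feat + tab_feat  # list concatenation acts as feature fusion
    return dense(fused, params["head_w"], params["head_b"])[0]


def mean_absolute_error(predicted, recorded):
    """Mean absolute difference between predictions and recorded values."""
    return sum(abs(p - r) for p, r in zip(predicted, recorded)) / len(predicted)


def init_params(n_kernels=2, ksize=3, n_tabular=4, n_hidden=3, seed=0):
    """Random, untrained weights; all sizes are arbitrary assumptions."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "kernels": [[rand(ksize) for _ in range(ksize)] for _ in range(n_kernels)],
        "tab_w": [rand(n_tabular) for _ in range(n_hidden)],
        "tab_b": [0.0] * n_hidden,
        "head_w": [rand(n_kernels + n_hidden)],
        "head_b": [0.0],
    }


# Usage with made-up inputs: an 8x8 grayscale "image" and four encoded features.
params = init_params()
image = [[((i + j) % 5) / 4 for j in range(8)] for i in range(8)]
tabular = [0.2, 1.0, 0.0, 0.5]
y = predict(image, tabular, params)
```

Concatenating the branch outputs before a shared head is one common way to combine modalities. The record does not state how the authors' network joins its branches, so the fusion step here should be read as a placeholder.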