Spatiotemporal Fusion Network for Land Surface Temperature Based on a Conditional Variational Autoencoder
Published in | IEEE transactions on geoscience and remote sensing Vol. 60; pp. 1 - 13 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | New York, IEEE, 2022. The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Summary: | High spatiotemporal resolution land surface temperature (LST) data are essential for dynamic monitoring and prediction in climate change research. Due to the limitations of remote sensing instruments, the current platforms have difficulty in achieving a compromise between high spatial and temporal resolutions for LST products. In this study, we propose a spatiotemporal fusion network for LST based on a conditional variational autoencoder (CVAE-LSTFM). First, an improved network is designed based on the CVAE by reconstructing an encoder and a decoder. To generate fine LST images based on dense time series, a variational inference model is formulated to integrate coarse and fine LST image pairs in variational autoencoded latent space. In addition, a new compound loss function for the proposed training method is designed to reduce the effects of noise and outliers. Then, a pretraining mechanism is adopted to optimize the network training process, and the parameters can be transferred to the training network of the CVAE to accelerate network convergence. Finally, a novel weighting strategy that considers spatiotemporal variations in LST (LST consistency weighting) is employed to solve the spatiotemporal heterogeneity problem caused by the rapid changes in LST. The method is quantitatively tested and evaluated in the Heihe River Basin using FY-4A LST and MODIS LST from September 2019. Compared with two traditional models and two deep learning-based models, CVAE-LSTFM yields lower root-mean-square error (RMSE, average < 1.26 K) and learned perceptual image patch similarity (LPIPS, average < 0.13) values and higher structural similarity (SSIM, average > 0.96). In practice, CVAE-LSTFM can generate high spatiotemporal resolution LST values (hourly LST with a 1-km spatial resolution) with high accuracy, quality, and robustness. |
---|---|
ISSN: | 0196-2892 1558-0644 |
DOI: | 10.1109/TGRS.2022.3183114 |
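
The summary reports accuracy as RMSE in kelvin between fused and reference LST images. As a minimal sketch of how such a figure could be computed, the Python below evaluates RMSE over two small grids. The function name `rmse_kelvin`, the use of `None` for masked (for example, cloud-covered) pixels, and the sample values are all assumptions made for illustration; none of them comes from the article, and this is not the authors' evaluation code.

```python
import math


def rmse_kelvin(predicted, reference):
    """Root-mean-square error between two equally sized LST grids (in K).

    Pixel pairs where either value is None (assumed here to mark
    masked pixels) are skipped.
    """
    squared_errors = []
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            if p is None or r is None:
                continue
            squared_errors.append((p - r) ** 2)
    if not squared_errors:
        raise ValueError("no valid pixel pairs to compare")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical 2x2 grids of LST in kelvin (illustrative values only).
fused = [[290.0, 291.0], [289.0, None]]
reference = [[291.0, 291.0], [287.0, 288.0]]
print(rmse_kelvin(fused, reference))
```

Skipping masked pixels keeps gaps in either product from distorting the score; a real evaluation would also need the two grids resampled to a common 1-km resolution first.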