FAIRification, Quality Assessment, and Missingness Pattern Discovery for Spatiotemporal Photovoltaic Data

Bibliographic Details
Published in 2022 IEEE 49th Photovoltaics Specialists Conference (PVSC), pp. 0796-0801
Main Authors Oltjen, William C.; Fan, Yangxin; Liu, Jiqi; Huang, Liangyi; Yu, Xuanji; Li, Mengjie; Seigneur, Hubert; Xiao, Xusheng; Davis, Kristopher O.; Bruckman, Laura S.; Wu, Yinghui; French, Roger H.
Format Conference Proceeding
Language English
Published IEEE 05.06.2022
Summary: Due to the fast growth of the photovoltaic (PV) market, more power plants have become available with data accessible for power forecasting and long-term reliability assessment. The accuracy of the modeling on this data is influenced heavily by the quality of the data and can be improved through data imputation to fill missing gaps. In this study, we introduce a FAIRification framework for ingesting data from PV power plants. This process improves the efficiency of modeling on time series data provided by different labs and companies through an automated ingestion process. We take this analysis further by investigating the use of different imputation methods for filling in large chunks of missing data. Specifically, mean interpolation, linear interpolation, and k-nearest neighbors (KNN) were used in this report to fill in missing data for module temperature and power in a PV time series. It was found that the KNN algorithm outperforms the other methods due to its ability to leverage spatial coherence from nearby systems. These results point towards the potential use of a spatio-temporal graph neural network (st-GNN) in order to impute data using spatial coherence between systems in a large data set with time series data from many PV power plants.
DOI: 10.1109/PVSC48317.2022.9938523
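
The three imputation strategies named in the summary can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation: the summary does not say how neighbors are chosen or how their values are combined, so the KNN variant below assumes that donor systems are ranked by RMSE over timestamps observed in both series and that the k closest donors are averaged. All function names and the toy data are hypothetical.

```python
import math


def is_missing(x):
    """Treat None and NaN as missing readings."""
    return x is None or math.isnan(x)


def mean_impute(series):
    """Fill every gap with the mean of the observed values."""
    observed = [v for v in series if not is_missing(v)]
    if not observed:
        return list(series)  # nothing to learn from
    fill = sum(observed) / len(observed)
    return [fill if is_missing(v) else v for v in series]


def linear_impute(series):
    """Fill gaps on a straight line between the nearest observed points.

    Gaps at either end are held at the nearest observed value.
    """
    out = list(series)
    idx = [i for i, v in enumerate(series) if not is_missing(v)]
    if not idx:
        return out
    for i, v in enumerate(series):
        if not is_missing(v):
            continue
        left = max((j for j in idx if j < i), default=None)
        right = min((j for j in idx if j > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out


def knn_impute(target, neighbors, k=2):
    """Fill gaps from the k most similar neighboring systems.

    Assumption: similarity is RMSE over co-observed timestamps, and the
    fill value is the unweighted mean of the k closest donors.
    """
    def distance(other):
        pairs = [(a, b) for a, b in zip(target, other)
                 if not is_missing(a) and not is_missing(b)]
        if not pairs:
            return math.inf
        return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))

    ranked = sorted(neighbors, key=distance)
    out = list(target)
    for i, v in enumerate(target):
        if is_missing(v):
            donors = [n[i] for n in ranked if not is_missing(n[i])][:k]
            if donors:
                out[i] = sum(donors) / len(donors)
    return out


# Toy data: a midday gap in one system's power curve (arbitrary units).
nan = math.nan
target = [0.0, 1.0, nan, nan, nan, 1.0, 0.0]
nearby_a = [0.0, 1.1, 2.1, 3.1, 2.1, 1.1, 0.0]   # similar nearby system
nearby_b = [0.0, 0.9, 1.9, 2.9, 1.9, 0.9, 0.0]   # similar nearby system
distant = [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]    # dissimilar system

print("mean:  ", mean_impute(target))
print("linear:", linear_impute(target))
print("knn:   ", knn_impute(target, [nearby_a, nearby_b, distant], k=2))
```

In this toy case the gap spans the daily peak, so the mean and linear fills stay flat across it, while the neighbor-based fill follows the shape seen at the two similar systems. That is only an illustration of the spatial-coherence argument in the summary, not a reproduction of the reported results.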