Classification of multiwavelength transients with machine learning



Bibliographic Details
Published in: Monthly Notices of the Royal Astronomical Society, Vol. 502, no. 1, pp. 206-224
Main Authors: Sooknunan, K; Lochner, M; Bassett, Bruce A; Peiris, H V; Fender, R; Stewart, A J; Pietka, M; Woudt, P A; McEwen, J D; Lahav, O
Format: Journal Article
Language: English
Published: Oxford University Press, 01.03.2021

Summary: With the advent of powerful telescopes such as the Square Kilometre Array and the Vera C. Rubin Observatory, we are entering an era of multiwavelength transient astronomy that will lead to a dramatic increase in data volume. Machine learning techniques are well suited to address this data challenge and rapidly classify newly detected transients. We present a multiwavelength classification algorithm consisting of three steps: (1) interpolation and augmentation of the data using Gaussian processes; (2) feature extraction using wavelets; and (3) classification with random forests. Augmentation improves performance at test time by balancing the classes and adding diversity to the training set. In the first application of machine learning to the classification of real radio transient data, we apply our technique to the Green Bank Interferometer and other radio light curves. We find we are able to accurately classify most of the 11 classes of radio variables and transients after just eight hours of observations, achieving an overall test accuracy of 78 per cent. We fully investigate the impact of the small sample size of 82 publicly available light curves and use data augmentation techniques to mitigate the effect. We also show that, on a significantly larger simulated representative training set, the algorithm achieves an overall accuracy of 97 per cent, illustrating that the method is likely to provide excellent performance on future surveys. Finally, we demonstrate the effectiveness of simultaneous multiwavelength observations by showing how incorporating just one optical data point into the analysis improves the accuracy of the worst-performing class by 19 per cent.
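The first two steps of the three-step pipeline in the summary (Gaussian-process interpolation, then wavelet feature extraction) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code, and several choices are assumptions made only for illustration: a zero-mean GP with a fixed squared-exponential (RBF) kernel and hand-picked hyperparameters, use of the posterior mean only (augmentation would instead draw samples from the GP posterior), and a plain multi-level Haar transform as a stand-in for whichever wavelet the paper uses. All function names (`rbf_kernel`, `solve`, `gp_interpolate`, `haar_features`) and the toy light curve are hypothetical. The random-forest step is omitted; the resulting feature vectors could be passed to an off-the-shelf classifier such as scikit-learn's `RandomForestClassifier`.

```python
import math

# Illustrative sketch only: hypothetical helper names; the fixed RBF kernel,
# its hyperparameters and the Haar wavelet are assumptions, not the paper's choices.


def rbf_kernel(t1, t2, amplitude=1.0, length_scale=1.0):
    """Squared-exponential (RBF) covariance between two observation times."""
    return amplitude ** 2 * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def gp_interpolate(times, fluxes, grid, noise=0.1, amplitude=1.0, length_scale=1.0):
    """Zero-mean GP posterior mean of an irregular light curve, evaluated on `grid`.

    `noise` is the observational noise standard deviation (assumed constant).
    """
    k = [[rbf_kernel(ti, tj, amplitude, length_scale) + (noise ** 2 if i == j else 0.0)
          for j, tj in enumerate(times)]
         for i, ti in enumerate(times)]
    alpha = solve(k, fluxes)  # alpha = (K + noise^2 I)^-1 y
    return [sum(rbf_kernel(g, t, amplitude, length_scale) * w for t, w in zip(times, alpha))
            for g in grid]


def haar_features(signal):
    """Full multi-level Haar wavelet transform; len(signal) must be a power of two.

    Returns [final approximation, coarsest details, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        details = detail + details  # prepend so coarser levels come first
    return approx + details


# Made-up toy light curve: irregular observation times and fluxes.
times = [0.0, 1.0, 2.5, 4.0, 6.0, 7.0]
fluxes = [1.0, 1.8, 2.4, 1.6, 0.9, 0.7]
grid = [float(i) for i in range(8)]  # 8 evenly spaced times from 0 to 7
curve = gp_interpolate(times, fluxes, grid, noise=0.1, length_scale=1.5)
features = haar_features(curve)  # one fixed-length feature vector per light curve
print(len(features), [round(f, 3) for f in features])
```

Interpolating onto a common regular grid is what lets every light curve, however irregularly sampled, yield a feature vector of the same length. In this simple sketch the grid length must be a power of two, and because the Haar transform is orthonormal the feature vector has the same sum of squares as the interpolated curve.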
ISSN: 0035-8711, 1365-2966
DOI: 10.1093/mnras/staa3873