StratLearn-z: Improved photo-z estimation from spectroscopic data subject to selection effects

Bibliographic Details
Published in	The open journal of astrophysics Vol. 8
Main Authors	Moretti, Chiara, Autenrieth, Maximilian, Serra, Riccardo, Trotta, Roberto, van Dyk, David A., Mesinger, Andrei
Format	Journal Article
Language	English
Published	Maynooth Academic Publishing 01.05.2025

More Information
Summary:	A precise measurement of photometric redshifts (photo-z) is crucial for the success of modern photometric galaxy surveys. Machine learning (ML) methods show great promise in this context, but they suffer from covariate shift in training sets. This shift arises from selection bias, whereby interesting sources (e.g., high-redshift objects) are underrepresented, so the corresponding ML models generalise poorly. We present an application of the StratLearn method to the estimation of photo-z (StratLearn-z), validating it against simulations in which we enforce covariate shift of varying strength. StratLearn is a statistically principled approach that splits the combined source and target datasets into strata based on estimated propensity scores. The propensity score is the probability that an object belongs to the source set, given its observed covariates. After stratification, two conditional density estimators are fit separately within each stratum and then combined via a weighted average. We benchmark our results against the GPz algorithm, quantifying the performance of the two algorithms with a set of metrics. Our results show that the StratLearn-z metrics are only marginally affected by the presence of covariate shift. GPz, by contrast, shows a significant degradation in performance, specifically in the photo-z predictions for fainter objects, for which there is little training data. In particular, for the strongest covariate shift scenario considered, StratLearn-z yields a reduced fraction of catastrophic errors, a factor of 2 improvement in the RMSE, and an order-of-magnitude improvement in the bias. We also assess the quality of the predicted conditional redshift estimates using the probability integral transform (PIT) and the continuous ranked probability score (CRPS).
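The stratification step summarised above can be illustrated with a minimal Python sketch, which rests on several assumptions. First, the propensity scores are taken as already estimated, for instance by a classifier of source membership on the photometric covariates. Second, strata are defined by quantiles of the pooled scores, with five strata as a default, which is a common convention for propensity-score stratification. Third, the per-stratum conditional density estimator is a placeholder callback, and the weight `w` used to average two estimators is simply given. The function names (`assign_strata`, `stratified_predict`, `combine_densities`) are hypothetical, and this is not the authors' Julia implementation.

```python
from statistics import quantiles
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")  # type of a per-object prediction, e.g. a density on a z-grid

# Hypothetical helpers illustrating the idea; NOT the StratLearn-z (Julia) API.


def assign_strata(propensity: Sequence[float], n_strata: int = 5) -> List[int]:
    """Assign each object (source and target pooled) to a stratum using the
    quantiles of the estimated propensity scores of the combined sample."""
    cuts = quantiles(propensity, n=n_strata)  # n_strata - 1 cut points
    # Stratum index = number of cut points strictly below the object's score.
    return [sum(e > c for c in cuts) for e in propensity]


def stratified_predict(
    strata: Sequence[int],
    is_source: Sequence[bool],
    fit_predict: Callable[[List[int], List[int]], List[T]],
) -> Dict[int, T]:
    """Within each stratum, train only on that stratum's source (labelled)
    objects and predict that stratum's target objects. This sketch simply
    skips any stratum lacking source or target objects."""
    predictions: Dict[int, T] = {}
    for k in sorted(set(strata)):
        members = [i for i, s in enumerate(strata) if s == k]
        src = [i for i in members if is_source[i]]
        tgt = [i for i in members if not is_source[i]]
        if not src or not tgt:
            continue
        for i, pred in zip(tgt, fit_predict(src, tgt)):
            predictions[i] = pred
    return predictions


def combine_densities(p1: Sequence[float], p2: Sequence[float], w: float) -> List[float]:
    """Weighted average w*p1 + (1 - w)*p2 of two conditional densities on the
    same redshift grid; 0 <= w <= 1 keeps the result a valid density."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]


if __name__ == "__main__":
    scores = [0.1, 0.15, 0.3, 0.35, 0.6, 0.65, 0.8, 0.85]
    labelled = [False, False, True, False, True, False, True, True]
    z_spec = [None, None, 0.9, None, 0.4, None, 0.3, 0.5]

    def mean_source_z(src: List[int], tgt: List[int]) -> List[float]:
        # Placeholder "estimator": mean spectroscopic z of the stratum's sources.
        return [sum(z_spec[i] for i in src) / len(src)] * len(tgt)

    strata = assign_strata(scores, n_strata=2)
    print(strata, stratified_predict(strata, labelled, mean_source_z))
```

Training only on source objects within a stratum means that the labelled and unlabelled objects being compared have similar propensity scores. This is the mechanism by which stratification mitigates covariate shift. A real application would replace the placeholder callback with a conditional density estimator.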
The PIT for StratLearn-z indicates that predictions are well centered around the true redshift value, albeit conservative in their variance; the CRPS shows marked improvement at high redshift compared with GPz. Our Julia implementation of the method, StratLearn-z, is publicly available at .
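As a hedged illustration of the two probabilistic metrics, the following Python sketch evaluates them for a single object whose predicted redshift density is tabulated on a grid. The PIT is the predicted CDF evaluated at the true redshift. The CRPS is the integral of the squared difference between the predicted CDF and a unit step at the true redshift. The grid representation, the trapezoidal integration, and the function names are assumptions for illustration, not the StratLearn-z implementation. The closed-form CRPS of a Gaussian predictive distribution is included only as a numerical check.

```python
import bisect
import math
from typing import List, Sequence

# Hypothetical helpers for illustration; not the StratLearn-z (Julia) API.
# Assumes the grid is increasing and covers both the density's support and z_true.


def grid_cdf(grid: Sequence[float], pdf: Sequence[float]) -> List[float]:
    """CDF on the grid (trapezoidal rule), renormalised to end at exactly 1."""
    cdf = [0.0]
    for i in range(1, len(grid)):
        cdf.append(cdf[-1] + 0.5 * (pdf[i] + pdf[i - 1]) * (grid[i] - grid[i - 1]))
    total = cdf[-1]
    return [c / total for c in cdf]


def pit(grid: Sequence[float], pdf: Sequence[float], z_true: float) -> float:
    """PIT value: predicted CDF at the true redshift, linearly interpolated."""
    cdf = grid_cdf(grid, pdf)
    if z_true <= grid[0]:
        return 0.0
    if z_true >= grid[-1]:
        return 1.0
    i = bisect.bisect_left(grid, z_true)  # first index with grid[i] >= z_true
    t = (z_true - grid[i - 1]) / (grid[i] - grid[i - 1])
    return cdf[i - 1] + t * (cdf[i] - cdf[i - 1])


def crps(grid: Sequence[float], pdf: Sequence[float], z_true: float) -> float:
    """CRPS = integral of (F(z) - 1{z >= z_true})^2 dz over the grid, with the
    interval containing z_true split there so the step is integrated exactly."""
    cdf = grid_cdf(grid, pdf)
    total = 0.0
    for i in range(1, len(grid)):
        a, b, fa, fb = grid[i - 1], grid[i], cdf[i - 1], cdf[i]
        if b <= z_true:  # interval entirely below the true value
            total += 0.5 * (fa ** 2 + fb ** 2) * (b - a)
        elif a >= z_true:  # interval entirely above the true value
            total += 0.5 * ((1 - fa) ** 2 + (1 - fb) ** 2) * (b - a)
        else:  # interval straddles z_true: interpolate F there and split
            fz = fa + (z_true - a) / (b - a) * (fb - fa)
            total += 0.5 * (fa ** 2 + fz ** 2) * (z_true - a)
            total += 0.5 * ((1 - fz) ** 2 + (1 - fb) ** 2) * (b - z_true)
    return total


def gaussian_crps(mu: float, sigma: float, z_true: float) -> float:
    """Closed-form CRPS of N(mu, sigma^2), used here only as a check."""
    x = (z_true - mu) / sigma
    cdf_x = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf_x = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return sigma * (x * (2.0 * cdf_x - 1.0) + 2.0 * pdf_x - 1.0 / math.sqrt(math.pi))


if __name__ == "__main__":
    grid = [i / 1000 for i in range(2001)]  # z from 0 to 2
    pdf = [math.exp(-0.5 * ((z - 1.0) / 0.1) ** 2) for z in grid]  # toy N(1, 0.1^2)
    print(pit(grid, pdf, 1.1), crps(grid, pdf, 1.1), gaussian_crps(1.0, 0.1, 1.1))
```

Across many objects, a flat histogram of PIT values indicates calibrated predictions. A histogram peaked around 0.5 indicates predictive densities that are too wide (conservative), while a U shape indicates densities that are too narrow. Lower CRPS is better.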
ISSN:	2565-6120
DOI:	10.33232/001c.137525