Peter Hatfield

Career Stage
Postdoctoral Researcher
Poster Abstract

Wide-area imaging surveys are one of the key ways of advancing our understanding of cosmology, galaxy formation physics, and the large-scale structure of the Universe in the coming years. These surveys typically require calculating redshifts for huge numbers (hundreds of millions to billions) of galaxies, almost all of which must be derived from photometry rather than spectroscopy. In this poster (and the corresponding paper, Hatfield+2020, arXiv:2009.01952) we investigate how statistical models of the populations that make up the colour-magnitude distribution of galaxies can be combined with machine learning photometric redshift codes to improve redshift estimates. In particular, we combine Gaussian Mixture Models with the high-performing machine learning photo-z algorithm GPz and show that modelling and accounting for the different colour-magnitude distributions of training and test data separately can give improved redshift estimates, reduce the bias on estimates by up to a half, and speed up the run-time of the algorithm. These methods are illustrated using deep optical and near-infrared data in two separate deep fields, where training and test data of different colour-magnitude distributions are constructed from the galaxies with known spectroscopic redshifts, derived from several heterogeneous surveys.

Plain text summary

This poster highlights some of the key results from our paper “Augmenting machine learning photometric redshifts with Gaussian mixture models”, recently accepted for publication in MNRAS.

Page 1

Wide-area imaging surveys are one of the key ways of advancing our understanding of cosmology, galaxy formation physics, and the large-scale structure of the Universe in the coming years. These surveys typically require calculating redshifts for huge numbers (hundreds of millions to billions) of galaxies, almost all of which must be derived from photometry rather than spectroscopy. In this poster (and the corresponding paper) we investigate how statistical models of the populations that make up the colour-magnitude distribution of galaxies can be combined with machine learning photometric redshift codes to improve redshift estimates. In particular, we combine Gaussian Mixture Models with the high-performing machine learning photo-z algorithm GPz (developed in Almosallam+2016 and applied in Gomes+2018 and Duncan+2018) and show that modelling and accounting for the different colour-magnitude distributions of training and test data separately can give improved redshift estimates, reduce the bias on estimates by up to a half, and speed up the run-time of the algorithm. These methods are illustrated using deep optical and near-infrared data in two separate deep fields (uGRIZYJHK data over COSMOS and XMM-LSS), where training and test data of different colour-magnitude distributions are constructed from the galaxies with known spectroscopic redshifts, derived from several heterogeneous surveys.

Page 2


The methods considered to improve the ML-based photo-z predictions were:
- Normal: Base use of GPz
- GCSL: Upweighting parts of colour space that are common in the test data but rare in the training data
- GMM-Divide: Using a GMM to divide parameter space into smaller segments in an unsupervised way, and then training on each segment separately
- Weigh Validation: Making the validation data look more like the test data
- Resample: Retraining the algorithm multiple times, each time resampling new photometry values based on the photometry uncertainty
- Log: Modelling log(z) rather than z
- All: Using Weigh Validation, Resample and GMM-Divide simultaneously
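
As a rough illustration of the GMM-Divide idea in the list above, the sketch below fits a two-component Gaussian mixture to a single colour, assigns each galaxy to its most probable component, and trains a separate regressor on each segment. Everything in it is an assumption made for illustration only: the toy data, the one-dimensional mixture, the function names, and in particular the straight-line fit, which is just a stand-in for GPz and not the method used in the paper.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1D normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, n_components=2, n_iter=50):
    """Fit a 1D Gaussian mixture with plain EM; returns (weights, means, variances)."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [ordered[int((k + 0.5) / n_components * n)] for k in range(n_components)]
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    variances = [max(overall_var / n_components ** 2, 1e-6)] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: update each component from its responsibilities.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(spread, 1e-6)
    return weights, means, variances


def assign(x, gmm):
    """Index of the mixture component with the highest weighted density at x."""
    scores = [w * gaussian_pdf(x, m, v) for w, m, v in zip(*gmm)]
    return scores.index(max(scores))


def fit_line(xs, ys):
    """Least-squares line (intercept, slope); a stand-in for the real photo-z model."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope


def rms(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def make_sample(rng, n_per_population):
    """Toy data: two populations with different colour-redshift relations."""
    colours, redshifts = [], []
    for centre, intercept, slope in [(0.5, 0.2, 0.5), (2.0, 3.0, -0.8)]:
        for _ in range(n_per_population):
            c = rng.gauss(centre, 0.1)
            colours.append(c)
            redshifts.append(intercept + slope * c + rng.gauss(0.0, 0.01))
    return colours, redshifts


rng = random.Random(0)
train_c, train_z = make_sample(rng, 200)
test_c, test_z = make_sample(rng, 200)

# Baseline: one model trained on everything.
global_fit = fit_line(train_c, train_z)

# GMM-Divide: segment the training data, then train one model per segment.
gmm = fit_gmm_1d(train_c)
segments = {}
for c, z in zip(train_c, train_z):
    segments.setdefault(assign(c, gmm), []).append((c, z))
fits = {k: fit_line([c for c, _ in s], [z for _, z in s]) for k, s in segments.items()}


def predict_divided(c):
    # Fall back to the global model if a segment received no training data.
    a, b = fits.get(assign(c, gmm), global_fit)
    return a + b * c


rms_global = rms([global_fit[0] + global_fit[1] * c - z for c, z in zip(test_c, test_z)])
rms_divided = rms([predict_divided(c) - z for c, z in zip(test_c, test_z)])
print("RMS error, single model:", rms_global)
print("RMS error, GMM-Divide:  ", rms_divided)
```

On this toy sample the divided models should give a smaller error simply because each population follows its own relation; how much is gained on real photometry depends on the data and on the underlying photo-z code.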

I show a plot of bias as a function of photometric redshift, where bias is the average value of photometric redshift minus true spectroscopic redshift. A value of zero would be unbiased (although the predictions might still have scatter). I also show a plot of improvement relative to Normal, i.e. which method gives the most improvement relative to the base application of GPz. In general we found that 'All' gave the greatest improvement across all metrics (both bias and other metrics not shown here but considered in the paper).
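
The bias definition above can be written out as a short calculation. The sketch below is a minimal illustration assuming simple half-open bins in photometric redshift; the function name, the bin edges, and the example values are hypothetical and are not taken from the paper.

```python
def binned_bias(z_phot, z_spec, bin_edges):
    """Mean of (z_phot - z_spec) in bins of z_phot; None where a bin is empty."""
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for zp, zs in zip(z_phot, z_spec):
        for i in range(n_bins):
            # Half-open bins: lower edge included, upper edge excluded.
            if bin_edges[i] <= zp < bin_edges[i + 1]:
                sums[i] += zp - zs
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]


# Made-up example values: overestimates at low z, underestimates at mid z.
z_phot = [0.2, 0.4, 0.6, 0.8, 1.2]
z_spec = [0.1, 0.3, 0.7, 0.9, 1.2]
bias = binned_bias(z_phot, z_spec, [0.0, 0.5, 1.0, 1.5, 2.0])
print(bias)
```

A positive entry means the photometric redshifts in that bin are too high on average, a negative entry means they are too low, and an empty bin is reported as None rather than as zero so that it is not mistaken for an unbiased bin.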


Page 3

I show the improved predictions that can be achieved by optimally combining ML-based photo-z and template-based photo-z. The hybrid prediction is calculated by using the ML result in the interpolative regime and the template result in the extrapolative regime, giving a set of predictions that outperforms both ML and template fitting used individually.
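
The selection rule described above can be sketched in a few lines. The summary does not say how the interpolative and extrapolative regimes are identified, so the criterion used here (whether a galaxy's magnitude lies inside the range spanned by the training set) is purely an assumption for illustration, as are the function names and the example numbers.

```python
def within_training_range(test_mags, train_mags):
    """Hypothetical regime flag: True where a test magnitude is covered by the training set."""
    lo, hi = min(train_mags), max(train_mags)
    return [lo <= m <= hi for m in test_mags]


def hybrid_photoz(z_ml, z_template, interpolative):
    """Use the ML estimate where interpolating, the template estimate otherwise."""
    return [ml if ok else tmpl for ml, tmpl, ok in zip(z_ml, z_template, interpolative)]


# Made-up example: the second galaxy is fainter than anything in the training set.
train_mags = [20.0, 21.5, 23.0]
test_mags = [21.0, 24.5]
z_ml = [0.8, 1.1]
z_template = [0.7, 2.3]

flags = within_training_range(test_mags, train_mags)
z_hybrid = hybrid_photoz(z_ml, z_template, flags)
print(z_hybrid)
```

In practice the regime flag would more plausibly come from a measure of how well the training data cover a galaxy's position in colour-magnitude space; the simple range check only shows where such a criterion plugs in.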

I also highlight the relevance to two scientific applications. The first is the use of non-linear clustering to make inferences about the galaxy-halo relation. I show a plot from “Comparing Galaxy Clustering in Horizon-AGN Simulated Lightcone Mocks and VIDEO Observations” (Hatfield+2019). In that work we compared clustering measurements in the Horizon-AGN hydrodynamic cosmological simulation made with perfect knowledge of redshift and stellar mass to those made using photometric redshifts and stellar masses. The second is the application of GPz to the Rubin DESC Tomography Binning Challenge.

Poster Title
AUGMENTING MACHINE LEARNING PHOTOMETRIC REDSHIFTS WITH GAUSSIAN MIXTURE MODELS
Tags
Astronomy
Astrophysics
Cosmology
Data Science
Url
https://twitter.com/peterhatfield