Mirko Curti

Gather.town id
MLA21
Poster Title
What drives the scatter in the BPT diagrams? A Machine Learning based analysis
Institution
University of Cambridge
Abstract (short summary)
I will present a data-driven Machine Learning analysis aimed at identifying which physical properties are most closely connected with the position of local star-forming galaxies in the classical 'BPT' diagnostic diagrams.

Exploiting the large samples available from spectroscopic surveys of the local Universe such as SDSS and MaNGA, I have defined a framework in which the dispersion of galaxies in the BPT diagrams and, in particular, their deviation from the best-fit local sequence can be described by the relative variation of different observational properties with respect to the average value for the bulk of galaxies along the sequence. Artificial Neural Networks and Random Forests are used both to classify whether galaxies lie above or below the sequence and to predict their distance (offset) from the sequence itself. We achieve high accuracy on the test sample in both the classification and regression tasks (AUC>95%, RMSE~0.025), with no clear overfitting. Moreover, different approaches are implemented to rank the parameters by how informative they are for the models. We show that the nitrogen-over-oxygen abundance ratio (N/O) and the ionisation parameter (U) are the most predictive parameters in the [N II]-BPT, whereas features related to the star-forming state of galaxies perform better in the [S II]-BPT. However, we also show that both the performance and the relative importance of each feature change as we consider different regions within the diagrams.

These models also represent a valuable benchmark for high-redshift galaxy samples, making it possible to assess to what extent the physics that shapes the local BPT diagrams is the same as that causing the offset seen in high-z sources or whether, instead, a different framework or even different physical mechanisms need to be invoked.
Plain text (extended) Summary
In this work, I implement machine learning techniques aimed at identifying which physical properties are most closely connected with the position of local star-forming galaxies in the classical 'BPT' diagnostic diagrams.

Exploiting the large samples available from local spectroscopic surveys such as SDSS, I have built a framework in which the dispersion of galaxies in the BPT diagrams and, in particular, their deviation from the best-fit local sequence can be described by the relative variation of different observational properties with respect to the average for galaxies along the sequence, once the gas-phase metallicity is fixed. Artificial Neural Networks and Random Forests are used both to classify whether galaxies lie above or below the sequence and to predict their distance from the sequence itself.
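
As an illustration of the two targets described above, the following minimal Python sketch computes a signed offset from a sequence and the corresponding above/below label. It is a sketch under stated assumptions, not the procedure used in this work: the quadratic coefficients in SEQ_COEFFS, the x-range of the grid, and the function names are hypothetical, and the offset is approximated as the shortest distance to a densely sampled curve.

```python
import math

# Hypothetical coefficients of the sequence y_seq(x) = c0 + c1*x + c2*x^2,
# with x = log([N II]/Halpha) and y = log([O III]/Hbeta).
# Illustrative values only, not the best fit used in this work.
SEQ_COEFFS = (-0.6, -1.0, -0.2)


def sequence_y(x, coeffs=SEQ_COEFFS):
    """Evaluate the assumed sequence at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))


def signed_offset(x, y, x_min=-2.0, x_max=0.0, n_grid=2001):
    """Shortest distance from (x, y) to the sampled sequence.

    The sign is positive when the point lies above the curve at the same x,
    negative when it lies below.
    """
    best = math.inf
    for i in range(n_grid):
        xs = x_min + (x_max - x_min) * i / (n_grid - 1)
        best = min(best, math.hypot(x - xs, y - sequence_y(xs)))
    sign = 1.0 if y >= sequence_y(x) else -1.0
    return sign * best


def label_and_offset(x, y):
    """Return (class label, offset): label 1 = above the sequence, 0 = below."""
    d = signed_offset(x, y)
    return (1 if d >= 0 else 0), d
```

The label would serve as the classification target and the signed offset as the regression target; a vertical offset at fixed x would be a simpler alternative definition.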

By feeding the network with a set of parameters related to different physical quantities (i.e., M*, SFR, sSFR, N/O, ionisation parameter, density, dust extinction), we achieve high accuracy on the test sample in both the classification and regression tasks (AUC>90%, RMSE~0.035), with no clear overfitting. Moreover, different approaches are implemented to rank the parameters by how informative they are for the network. We show that the nitrogen-over-oxygen abundance ratio (N/O) is the parameter most strongly correlated with the scatter in the [N II]-BPT, whereas features related to the star-formation state of galaxies (e.g., SFR, EW(Hα)) perform better in the [S II]-BPT diagram.
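
The quoted metrics and one possible way of ranking parameters can be sketched in a few lines of Python. This is a toy example under stated assumptions: permutation importance is only one common ranking approach and is not necessarily among those used here, toy_model is a hypothetical stand-in for a trained regressor, and the three synthetic features do not correspond to any of the physical quantities listed above.

```python
import math
import random


def roc_auc(labels, scores):
    """AUC: probability that a positive outranks a negative (ties count 1/2)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted offsets."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when a single feature column is shuffled."""
    rng = random.Random(seed)
    base = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(rmse(y, [predict(r) for r in X_perm]) - base)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic data: the offset depends strongly on feature 0, weakly on
# feature 1, and not at all on feature 2.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
y = [0.05 * row[0] + 0.01 * row[1] for row in X]


def toy_model(row):
    """Hypothetical stand-in for a trained regressor."""
    return 0.05 * row[0] + 0.01 * row[1]


importances = permutation_importance(toy_model, X, y)
ranking = sorted(range(3), key=lambda j: importances[j], reverse=True)
```

In this toy setup, shuffling a feature the model relies on degrades the RMSE, while shuffling an unused feature leaves it unchanged, which is what makes the resulting ordering interpretable as a ranking.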

However, we also show that both the performance and the relative importance of the various parameters change as we separately consider different regions within the diagrams.

These models represent a valuable benchmark for high-redshift galaxy samples, making it possible to assess to what extent the physics that shapes the local BPT diagrams is the same as that causing the offset from the local sequence seen in high-z sources or whether, instead, a different framework or different physical mechanisms need to be invoked.
URL
mc2041@cam.ac.uk