Ting-Yun Cheng

Career Stage
Student (postgraduate)
Poster Abstract

Galaxy morphology is strongly connected with the stellar properties and the formation history of galaxies. For a century, galaxy morphologies have been categorised based on their visual appearance. However, visual classification systems such as the Hubble types [ellipticals (E), lenticulars (S0), spirals (S), and irregulars (Irr)] are intrinsically biased, because their definitions rest on subjective visual assessment. In this work, we apply unsupervised machine learning (UML) techniques, comprising a vector-quantised variational autoencoder for feature learning and a hierarchical clustering algorithm, to Sloan Digital Sky Survey (SDSS) imaging data, aiming at an objective morphological classification scheme suggested by machine learning without human involvement.
With the strategies carried out in this work, our unsupervised machine successfully separates a variety of galaxies into 27 classes based on galaxy shape and structure. The 27 machine-defined morphological classes show a clear transition in stellar properties such as colour, absolute magnitude, stellar mass, and physical size, which are strongly connected with galaxy evolution. Each class is distinct from the others in these galaxy properties. Moreover, when we compare the machine classes with the Hubble types, we find that a mix of galaxy structures can exist within one visual morphology type. This reveals an intrinsic uncertainty in visual classification schemes such as the Hubble sequence when it comes to classifying galaxies precisely. With a novel classification scheme proposed by machine learning, we can re-approach studies of galaxy evolution and formation from a different perspective.

Plain text summary
In this work, we make a machine “sensibly see” images by applying an unsupervised machine learning technique, which includes:
(1) feature learning phase by Vector-Quantised Variational AutoEncoder (VQ-VAE);
(2) clustering phase by Hierarchical Clustering (HC).
We test this unsupervised machine on the Sloan Digital Sky Survey (SDSS) imaging data to approach an objective morphological classification scheme without human involvement.

An autoencoder uses its encoder to learn the distribution (representative features) of the input images, and its decoder then reproduces the input images from the learnt distribution (extracted features). Each image is thereby represented by a set of extracted features. Hierarchical clustering then gradually merges the two nearest datapoints (those with the most similar features) into groups in the feature space. The vector quantisation process used in this work accelerates the feature learning phase from 4-5 days to a few hours on 100k images.
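
The vector quantisation step can be pictured as a nearest-neighbour lookup: each continuous encoder output is replaced by the closest entry in a learnt codebook. The following is only a minimal sketch in plain Python. The function name `quantise`, the toy codebook, and the toy encoder outputs are illustrative assumptions, not the network or values used in this work.

```python
import math

def quantise(vector, codebook):
    """Return (index, entry) of the codebook entry nearest to `vector`."""
    distances = [math.dist(vector, entry) for entry in codebook]
    index = distances.index(min(distances))
    return index, codebook[index]

# Hypothetical 2-D codebook with three entries (a real codebook is learnt).
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]

# Hypothetical encoder outputs for three images.
encoder_outputs = [(0.2, -0.1), (0.9, 1.3), (3.5, 0.4)]

# Each output is represented by the index of its nearest codebook entry.
indices = [quantise(v, codebook)[0] for v in encoder_outputs]
print(indices)
```

After this step each image is described by discrete codebook indices rather than arbitrary continuous values.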


To make the machine sensible, that is, close to human opinion, we propose three strategies in this work:
(1) to consider clustering performance simultaneously while learning representative features from images in the VQ-VAE;
(2) to use different distance thresholds in the HC depending on the complexity of the galaxy images, instead of a single distance;
(3) to build the galaxy-orientation feature in the dataset into a distance cut that determines the optimal number of groups obtained from the HC.
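
To illustrate why the choice of distance cut matters in strategies (2) and (3), the sketch below merges the two nearest groups repeatedly and stops once the nearest pair is farther apart than a threshold. It is a simplified illustration under stated assumptions: single linkage, the toy feature vectors, and the three thresholds are all hypothetical, and it does not reproduce the complexity-dependent thresholds or the orientation-based cut used in this work.

```python
import math

def cluster_distance(a, b):
    """Single-linkage distance (an assumption): closest pair across two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points, threshold):
    """Merge the two nearest clusters until the nearest pair exceeds `threshold`."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        # Distance between every pair of current clusters.
        pairs = [(cluster_distance(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        distance, i, j = min(pairs)
        if distance > threshold:
            break  # the distance cut: stop merging here
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical feature vectors for five images.
features = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (5.0, 0.0)]

# A looser distance cut yields fewer, broader groups.
for threshold in (0.5, 2.0, 10.0):
    print(threshold, len(agglomerate(features, threshold)))
```

On this toy data the three thresholds give three, two, and one group respectively, which is why the distance cut has to be chosen with care.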

The methodology applied in this work provides 27 classes for galaxy morphology. These machine-defined classes are separated based on galaxy shape and structure. Barred galaxies form a distinctive and important visual class in galaxy morphology, as well as in galaxy evolution and formation. This structural feature can also be distinguished by our unsupervised machine: among the 27 machine-defined classes, some are significantly dominated by barred galaxies while others are not.

To further analyse the visual features recognised by our unsupervised machine, we associate the 27 machine-defined classes with visual morphology types such as the Hubble types. No cluster is cleanly dominated by ellipticals or by early spirals alone, because ellipticals, lenticulars, and early spirals share a great structural similarity. Additionally, most groups contain a mixture of different Hubble types, which indicates that galaxies with similar features in appearance can be visually classified into a variety of morphology types. This result reveals an intrinsic vagueness in visual classification systems: their classes are not always precisely defined.

We then examine the correlation of the machine classification with the physical properties of galaxies. On the mass-size diagram, the five clusters with larger sizes, larger stellar masses, and redder colours are shown to be dominated by barred galaxies. In particular, the cluster with the largest average galaxy size consists of ~80% barred galaxies. We notice that each galaxy cluster defined by the machine has distinctive physical properties in galaxy colour, absolute magnitude, stellar mass, and physical size. This indicates that the machine-defined morphological classes show a strong connection with galaxy evolution. Additionally, our machine classes show a clear transition between galaxy morphology and galaxy properties on the colour-magnitude diagram and the mass-size relation. They also fill in the gaps on these diagrams alongside the Hubble types. This indicates that the machine classification scheme can supply morphologies missing from the visual classification systems without introducing potential human bias.

To summarise, with these machine-defined morphological classes, we reveal an intrinsic uncertainty in visual classification schemes such as the Hubble sequence when it comes to classifying galaxies precisely. With a novel classification scheme proposed by machine learning, we can re-approach studies of galaxy evolution and formation from a different perspective.
Poster Title
What Does a Machine See?
 — Exploring Galaxy Morphology with Unsupervised Machine Learning
Tags
Astronomy
Astrophysics
Data Science
Url
two2sunny@gmail.com/ Twitter: @AstroSunnyC