Mayur R. Bakrania
Collisionless space plasma environments are characterised by distinct particle populations that typically do not mix. Although moments of their velocity distribution functions help in distinguishing different plasma regimes, the distribution functions themselves provide more comprehensive information about the plasma state. Unlike moments, however, distribution functions are not easily classified by a small number of parameters, making their classification more difficult to achieve. To perform this classification, we distinguish between the different plasma regions by applying dimensionality reduction and clustering methods to electron distributions in pitch angle and energy space. We utilise four algorithms to achieve our classifications: autoencoders, principal component analysis, mean shift, and agglomerative clustering.
We test our classification algorithms by applying our scheme to data from the Cluster-PEACE instrument measured in the Earth’s magnetotail. Traditionally, the Earth’s magnetotail is thought to be split into three regions that are primarily defined by their plasma characteristics. Starting with the ECLAT database and its associated classifications based on the plasma parameters, we identify 8 distinct groups of distributions that depend on significantly more complex plasma and field dynamics. By comparing the average distributions and the plasma and magnetic field parameters for each region, we relate several of the groups to different plasma sheet populations, and the rest to the plasma sheet boundary layer and the lobes. We find clear distinctions between each of our classified regions and the ECLAT results.
The automated classification of different regions in space plasma environments provides a useful tool to identify the physical processes governing particle populations in near-Earth space. These tools are model independent, providing reproducible results without requiring expert judgement or the placement of arbitrary thresholds. Similar methods could be used onboard spacecraft to reduce the dimensionality of distributions in order to optimise data collection and downlink resources in future missions.
Distribution functions are not easily classified by a small number of parameters. We therefore propose to apply dimensionality reduction and clustering methods to particle distributions in pitch angle and energy space as a new method to distinguish between the different plasma regions.
Dimensionality reduction is a specific type of unsupervised learning in which data in a high-dimensional space are transformed to a meaningful representation in a lower-dimensional space. This transformation allows complex datasets to be characterised by analysis techniques at a much lower computational cost.
We use the autoencoder to compress the data by a factor of 10 from a high-dimensional representation. We subsequently apply the PCA algorithm to further compress the data to a three-dimensional representation. After compressing the data, we apply the mean shift algorithm to this three-dimensional representation to determine how many populations are present in the data. Finally, we use an agglomerative clustering algorithm to assign each data-point to one of the populations.
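The two compression stages can be sketched as follows. This is a minimal illustration and not the authors' implementation: scikit-learn's MLPRegressor, trained to reproduce its own input, stands in for an autoencoder built in a dedicated deep learning framework. The ReLU activation, the iteration count, the function name compress, and the random input data are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor


def compress(flux, encoded_dim=None, n_components=3, seed=0):
    """Autoencoder-style compression followed by PCA.

    flux: array of shape (n_samples, n_features), assumed already scaled to [0, 1].
    Returns the encoded-layer output and its PCA projection.
    """
    n_features = flux.shape[1]
    if encoded_dim is None:
        # Encoded layer roughly a factor of 10 smaller than the input.
        encoded_dim = max(n_features // 10, n_components)

    # One hidden layer trained to reconstruct the input (assumed ReLU activation).
    autoencoder = MLPRegressor(
        hidden_layer_sizes=(encoded_dim,),
        activation="relu",
        max_iter=200,
        random_state=seed,
    )
    autoencoder.fit(flux, flux)

    # Output of the encoded layer: ReLU(X W + b) using the first-layer weights.
    hidden = flux @ autoencoder.coefs_[0] + autoencoder.intercepts_[0]
    encoded = np.maximum(hidden, 0.0)

    # Second stage: project the encoded data onto three principal components.
    reduced = PCA(n_components=n_components).fit_transform(encoded)
    return encoded, reduced


# Synthetic stand-in for 312-dimensional samples (not real PEACE data).
rng = np.random.default_rng(0)
demo_flux = rng.random((200, 312))
encoded, reduced = compress(demo_flux)
```

Here the encoded layer has 312 // 10 = 31 neurons. The exact width, activation, and training settings are tunable choices that the text does not fix.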
We use electron data from the magnetotail to test the effectiveness of our method. The magnetotail is traditionally divided into three different regions: the plasma sheet (PS), the plasma sheet boundary layer (PSBL), and the lobes. We obtain Cluster-PEACE data from times when the C4 spacecraft spent at least 1 hour in each region, according to the Cluster-ECLAT dataset. The dimensionality of each of our distribution samples is 312 (12 pitch angle bins times 26 energy bins).
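As an illustration of how a 12 x 26 distribution becomes a 312-dimensional sample, the sketch below flattens each 2D array and scales the flux linearly to the range 0 to 1. The helper name flatten_and_scale is hypothetical, the input is synthetic, and scaling with a single global minimum and maximum (not per sample) is an assumption.

```python
import numpy as np

N_PITCH, N_ENERGY = 12, 26  # bin counts quoted in the text


def flatten_and_scale(distributions):
    """Flatten (n_samples, 12, 26) flux arrays to (n_samples, 312) and scale to [0, 1].

    Assumption: one global minimum and maximum are used for the linear scaling.
    """
    flux = np.asarray(distributions, dtype=float)
    flat = flux.reshape(flux.shape[0], -1)
    lo, hi = flat.min(), flat.max()
    if hi == lo:
        # Constant input: avoid dividing by zero.
        return np.zeros_like(flat)
    return (flat - lo) / (hi - lo)


# Synthetic stand-in for measured flux (not real PEACE data).
rng = np.random.default_rng(0)
demo = rng.lognormal(size=(5, N_PITCH, N_ENERGY))
scaled = flatten_and_scale(demo)
```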
Our method is as follows:
1. Download 2D flux data as a function of pitch angle and energy. Normalise the flux linearly between 0 and 1.
2. Build an autoencoder with one input layer, one encoded layer, and one output layer. The number of neurons in the input and output layers is equal to the dimensionality of the data and the number of neurons in the encoded layer is approximately a factor of 10 smaller.
3. Extract the compressed data from the encoded layer and apply a PCA algorithm to reduce its dimensionality to 3.
4. Apply the mean shift algorithm to the compressed data.
5. Apply an agglomerative clustering algorithm to the compressed data.
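Steps 4 and 5 can be sketched with scikit-learn as below. Mean shift supplies the number of clusters, and agglomerative clustering then assigns each point to a cluster. The bandwidth quantile, the Ward linkage, the function name cluster_compressed, and the synthetic three-dimensional points are assumptions for the example and are not settings taken from the text.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, MeanShift, estimate_bandwidth


def cluster_compressed(points, quantile=0.3):
    """Use mean shift to count clusters, then agglomerative clustering to label points.

    points: array of shape (n_samples, 3), e.g. the PCA-compressed data.
    """
    # Mean shift needs a kernel bandwidth; here it is estimated from the data.
    bandwidth = estimate_bandwidth(points, quantile=quantile)
    mean_shift = MeanShift(bandwidth=bandwidth).fit(points)
    n_clusters = len(mean_shift.cluster_centers_)

    # Agglomerative clustering with the cluster count found by mean shift
    # (Ward linkage is an assumed choice).
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    labels = model.fit_predict(points)
    return n_clusters, labels


# Synthetic stand-in: three well-separated groups of points in 3D.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [-5.0, 5.0, 0.0]])
points = np.vstack([c + rng.uniform(-0.5, 0.5, size=(100, 3)) for c in centres])
n_clusters, labels = cluster_compressed(points)
```

Splitting the task this way means the number of populations does not have to be chosen by hand, while the final labels come from a method that can follow non-spherical structures.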
Fig. 3 shows the result of applying the agglomerative clustering algorithm to the compressed magnetotail electron data. The plot shows that the clustering algorithm is able to assign data-points of varying PCA values to the same cluster if they belong to the same complex non-spherical structure. The clustering algorithm is able to form clear boundaries between clusters with adjacent PCA values, with no mixing of cluster labels on either side of the boundaries.
Fig. 5 shows the average electron differential energy flux distributions for each cluster. We see large differences in the average pitch angle/energy distributions: each differs from the others in its peak flux energy, its peak flux value, or its pitch angle anisotropy. The lack of identical distributions indicates that mean shift has not overestimated the number of clusters.
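A per-cluster average of this kind could be computed as in the sketch below. The helper names cluster_means and summarise are hypothetical, the toy arrays are not real data, and taking the peak flux energy from the pitch-angle-averaged spectrum is an assumption made for the example.

```python
import numpy as np


def cluster_means(distributions, labels):
    """Return {label: mean 2D distribution} for arrays shaped (n_samples, n_pitch, n_energy)."""
    distributions = np.asarray(distributions, dtype=float)
    labels = np.asarray(labels)
    return {int(k): distributions[labels == k].mean(axis=0) for k in np.unique(labels)}


def summarise(mean_distribution):
    """Peak flux value and the energy bin index at which the spectrum peaks."""
    spectrum = mean_distribution.mean(axis=0)  # average over pitch angle
    return {
        "peak_flux": float(mean_distribution.max()),
        "peak_energy_bin": int(spectrum.argmax()),
    }


# Toy data: 4 samples, 2 pitch angle bins, 3 energy bins.
toy = np.zeros((4, 2, 3))
toy[0, :, 0] = 1.0
toy[1, :, 0] = 3.0
toy[2, :, 2] = 5.0
toy[3, :, 2] = 7.0
means = cluster_means(toy, [0, 0, 1, 1])
summary = {k: summarise(v) for k, v in means.items()}
```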
Table 1 shows a contingency table comparing our classifications to the original ECLAT labels. The majority of clustering labels are in agreement with the ECLAT regions. For AC labels 0, 1, 2, 4, and 6, which represent various populations within the plasma sheet, there is 100% agreement with the ECLAT label 0. By using this method to characterise pitch angle and energy distributions, instead of using the derived moments, we successfully distinguish between multiple populations within what has historically been considered one region because of the lack of variation in the plasma moments.
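A contingency table of this form can be built by counting how often each cluster label co-occurs with each reference label. The sketch below uses made-up labels, not the values in Table 1, and the function names are hypothetical.

```python
import numpy as np


def contingency_table(reference_labels, cluster_labels):
    """Count samples for every (cluster label, reference label) pair.

    Rows follow the sorted cluster labels; columns follow the sorted reference labels.
    """
    ref_values, ref_idx = np.unique(np.asarray(reference_labels).ravel(), return_inverse=True)
    clu_values, clu_idx = np.unique(np.asarray(cluster_labels).ravel(), return_inverse=True)
    table = np.zeros((len(clu_values), len(ref_values)), dtype=int)
    np.add.at(table, (clu_idx, ref_idx), 1)  # accumulate counts, including repeated pairs
    return clu_values, ref_values, table


def row_agreement(table):
    """Fraction of each cluster's samples that fall in its most common reference region."""
    return table.max(axis=1) / table.sum(axis=1)


# Made-up labels for illustration only.
reference = [0, 0, 0, 0, 1, 1, 2, 2]
clusters = [0, 0, 1, 1, 2, 2, 3, 2]
clu_values, ref_values, table = contingency_table(reference, clusters)
agreement = row_agreement(table)
```

A row with agreement 1.0 corresponds to a cluster whose samples all carry a single reference label, which is the situation described for AC labels 0, 1, 2, 4, and 6.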