We present the results of applying machine learning methods to asteroid taxonomic classification. The training data are asteroid colors in a three-dimensional color space whose axes (g-i, i-z, and griz) are defined by us. We test and compare two machine learning methods, both of which model the distribution of asteroid colors as a mixture of Gaussian distributions (i.e., a Gaussian mixture model).

The first, a model-based clustering method (i.e., an unsupervised learning method), tries to identify dense structures in the multidimensional color space as homogeneous taxonomic groups without explicitly exploiting the known taxonomy of the labeled samples. The resulting clusters therefore require interpretation to match them to the known taxonomic groups. The second method is a semi-supervised learning algorithm, also based on the Gaussian mixture model, which uses the colors of samples with known taxonomy when finding the best classification of the target data. See Roh et al. (2020, accepted to A&A) for details of the analysis in the SDSS color space.

On this page, we provide 3D plots of our analysis, downloadable analysis results, and simple Python scripts with the relevant files for applying the inferred taxonomy assignment models to newly measured asteroid colors.
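To illustrate how a Gaussian mixture turns one measured color into a class assignment, here is a minimal sketch that computes the posterior probability of each mixture component for a single (g-i, i-z, griz) vector. The helper names `gaussian_logpdf` and `component_posteriors` are hypothetical, the weights, means, and covariances are made-up placeholders rather than fitted values, and this is not necessarily how the provided script computes its assignments.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal distribution at a single point x."""
    d = x.shape[0]
    diff = x - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Squared Mahalanobis distance, computed without forming the explicit inverse
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def component_posteriors(x, weights, means, covs):
    """Posterior probability that color vector x belongs to each mixture component."""
    x = np.asarray(x, dtype=float)
    log_terms = np.array([
        np.log(w) + gaussian_logpdf(x, m, c)
        for w, m, c in zip(weights, means, covs)
    ])
    # Subtract the maximum before exponentiating (log-sum-exp trick) for stability
    log_terms -= log_terms.max()
    post = np.exp(log_terms)
    return post / post.sum()


if __name__ == "__main__":
    # Hypothetical two-component mixture in (g-i, i-z, griz) space; NOT fitted values
    weights = np.array([0.6, 0.4])
    means = np.array([[0.9, 0.0, 1.4], [0.3, -0.05, 0.7]])
    covs = np.array([np.eye(3) * 0.01, np.eye(3) * 0.01])
    post = component_posteriors([0.28, -0.06, 0.69], weights, means, covs)
    print(post, "-> most probable component:", int(np.argmax(post)))
```

Mapping a component index to a taxonomic class is a separate step: for the unsupervised results it requires the interpretation described above, whereas the semi-supervised model is fitted with the known-taxonomy samples.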
Taxonomy can be assigned in two different ways: inference that combines MCMC samples with a dissimilarity matrix (hereafter, raw) or maximum a posteriori inference (hereafter, MAP).
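The difference between the two approaches can be sketched on a toy set of MCMC label draws. A dissimilarity matrix of this kind is commonly built as one minus the fraction of draws in which two objects share a cluster. For the raw approach, the sketch assumes a common stand-in, Dahl's least-squares clustering (pick the draw whose co-assignment matrix best matches the posterior similarity); see Roh et al. (2020) for the rule actually used. For MAP, it assumes each draw comes with a log-posterior value and keeps the highest-scoring draw. All function names and data below are hypothetical.

```python
import numpy as np


def coassignment(labels):
    """N x N matrix with 1 where two objects share a label in a single draw."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def posterior_dissimilarity(label_draws):
    """1 minus the fraction of MCMC draws in which objects i and j share a cluster."""
    similarity = np.mean([coassignment(d) for d in label_draws], axis=0)
    return 1.0 - similarity


def least_squares_draw(label_draws):
    """Draw whose co-assignment matrix is closest (squared error) to the
    posterior similarity matrix: Dahl's least-squares clustering, an assumed
    stand-in for the 'raw' combination rule."""
    similarity = 1.0 - posterior_dissimilarity(label_draws)
    losses = [np.sum((coassignment(d) - similarity) ** 2) for d in label_draws]
    return np.asarray(label_draws[int(np.argmin(losses))])


def map_draw(label_draws, log_posteriors):
    """Draw with the highest log posterior (a MAP-style point estimate)."""
    return np.asarray(label_draws[int(np.argmax(log_posteriors))])


if __name__ == "__main__":
    # Hypothetical cluster labels of 4 objects in 4 MCMC draws
    draws = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 1]]
    log_post = [-10.0, -12.0, -9.0, -15.0]
    print(posterior_dissimilarity(draws))
    print("least-squares draw:", least_squares_draw(draws))
    print("MAP draw:", map_draw(draws, log_post))
```

In this toy case the least-squares draw is [0, 0, 1, 1] and the MAP draw is [1, 1, 0, 0]: the same partition with swapped labels. Label switching of this kind is one reason to summarize MCMC output with a label-invariant quantity such as the co-assignment matrix.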
We identify seven taxonomy groups as recognizable outcomes in the semi-supervised learning results.
You can use the Python script provided by us, together with the above npy files (i.e., the mixture distribution parameters), to infer asteroid taxonomy classes for given colors.
Usage: ./classify_object_with_npy_parameter_SDSS.py (-un | -semi) (color1: g-i) (color2: i-z) (color3: griz)
Examples are:
./classify_object_with_npy_parameter_SDSS.py -semi 0.28 -0.06 0.69
or
./classify_object_with_npy_parameter_SDSS.py -un 0.28 -0.06 0.69
where the -un and -semi options select the unsupervised and semi-supervised learning results, respectively.
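To classify many objects at once, one option is to call the script once per object from Python. This sketch assumes the script is executable from the current working directory (as in the examples above) and that its npy files are where it expects them. Because the script's output format is not documented here, the hypothetical helper `classify_batch` simply returns each run's raw standard output; `build_command` is likewise a hypothetical name.

```python
import subprocess

SCRIPT = "./classify_object_with_npy_parameter_SDSS.py"


def build_command(mode, g_i, i_z, griz, script=SCRIPT):
    """Assemble the argument list for one object, following the usage line."""
    if mode not in ("un", "semi"):
        raise ValueError("mode must be 'un' (unsupervised) or 'semi' (semi-supervised)")
    # Colors are passed positionally in the order g-i, i-z, griz
    return [script, f"-{mode}", str(float(g_i)), str(float(i_z)), str(float(griz))]


def classify_batch(rows, mode="semi"):
    """Run the script once per (g-i, i-z, griz) row and collect its raw stdout."""
    outputs = []
    for g_i, i_z, griz in rows:
        result = subprocess.run(
            build_command(mode, g_i, i_z, griz),
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the script fails
        )
        outputs.append(result.stdout.strip())
    return outputs


if __name__ == "__main__":
    print(build_command("semi", 0.28, -0.06, 0.69))
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems, and negative colors such as -0.06 are handed to the script unchanged.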