Shanghai Astronomical Observatory Team Proposes New Method for Validating Open Cluster Candidates Based on Photometric Bayesian Evidence

Fig 1. Open Cluster NGC 2632 (Copyright: Stuart Heggie)
Open clusters are not only tracers of the structure and evolution of the Milky Way but also ideal laboratories for studying star formation and stellar evolution. The massive data releases from the Gaia mission have dramatically expanded the number of known open clusters; however, they have also introduced a tricky challenge: the thousands of cluster candidates identified by clustering algorithms are significantly contaminated by "false positive" signals caused by random fluctuations of field stars. Recently, a research team led by Dr. Lu Li from the Shanghai Astronomical Observatory (SHAO), Chinese Academy of Sciences, proposed a new framework for the physical validation of clusters based on "Photometric Bayesian Evidence." Using their in-house "Mixture Model for Open Clusters (MiMO)," this method provides a powerful quantitative tool for "separating the wheat from the chaff" among the massive number of candidates in the big data era. The results have been published in the international astronomical journal The Astrophysical Journal.
In traditional research, validating the existence of a cluster candidate usually relies on astronomers visually inspecting its color-magnitude diagram (CMD) to see if it presents a clear "isochrone" feature. However, this experience-based judgment is not only prone to subjective human bias but also lacks quantitative evaluation standards. Especially for candidates with ambiguous features, it is difficult to determine with the naked eye whether a dispersed distribution on the CMD originates from a real cluster broadened by observational errors and binaries, or simply from a random combination of field stars along the line of sight. Therefore, to ensure sample purity and the reliability of scientific conclusions, there is an urgent need for a statistically rigorous quantitative validation method that replaces subjective intuition with objective calculations, mathematically distinguishing whether a candidate is a real physical system or a random statistical fluctuation.

Fig 2. Comparison of Color-Magnitude Diagrams (CMDs) for different targets. Left: Random field stars. Middle: Ambiguous candidates. Right: Confirmed open clusters (Orange points represent sample stars; grey background shows the field model).
To address this problem, the research team used the Bayesian framework of the MiMO model to recast "cluster candidate validation" as a rigorous statistical model-comparison problem: do the observational data better support a "Single Stellar Population (SSP, i.e., cluster) + Field Star" mixture model or a "Pure Field Star" model? The ratio of the Bayesian evidences (marginal likelihoods) of these two models, known as the Bayes Factor (BF), directly quantifies the strength of statistical support for the cluster's existence.
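To make the model comparison concrete, here is a deliberately simplified one-dimensional sketch in Python. It is not MiMO and does not use its likelihood: each star is reduced to a single hypothetical quantity x (think of it as a colour offset from an assumed isochrone), the cluster population is assumed to be a Gaussian of width SIGMA, the field is assumed to be uniform on [LO, HI], and the only free parameter is the cluster fraction f, given a uniform prior. All names, distributions and numbers are illustrative assumptions. The evidence of each model is its likelihood averaged over the prior, and log10(BF) is the difference of the two log-evidences divided by ln 10.

```python
import math
import random

SIGMA = 0.05        # assumed spread of cluster members around the isochrone (hypothetical)
LO, HI = -1.0, 1.0  # assumed range of the uniform field distribution (hypothetical)

def log_gauss(x, mu, sigma):
    # Log of the normal density N(x | mu, sigma)
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def log_field(x):
    # Log of a uniform field density on [LO, HI]; -inf outside the range
    return -math.log(HI - LO) if LO <= x <= HI else -math.inf

def log_sum_exp(values):
    # Numerically stable log(sum(exp(v)))
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def ln_evidence_field(data):
    # "Pure Field" model: no free parameters here, so evidence = likelihood
    return sum(log_field(x) for x in data)

def ln_evidence_mixture(data, n_grid=400):
    # "Cluster + Field" model: each star is a member with probability f.
    # Marginalize f over a uniform prior on (0, 1) with the midpoint rule.
    ln_like_grid = []
    for k in range(n_grid):
        f = (k + 0.5) / n_grid
        ln_f, ln_1mf = math.log(f), math.log(1.0 - f)
        ln_like_grid.append(sum(
            log_sum_exp([ln_f + log_gauss(x, 0.0, SIGMA), ln_1mf + log_field(x)])
            for x in data
        ))
    # Integral over f ~ (1 / n_grid) * sum of likelihoods (prior density is 1)
    return log_sum_exp(ln_like_grid) - math.log(n_grid)

def log10_bf(data):
    return (ln_evidence_mixture(data) - ln_evidence_field(data)) / math.log(10.0)

rng = random.Random(42)
# Hypothetical candidate: 25 members among 75 field stars (75% contamination)
candidate = [rng.gauss(0.0, SIGMA) for _ in range(25)] + [rng.uniform(LO, HI) for _ in range(75)]
# Hypothetical false positive: field stars only
field_only = [rng.uniform(LO, HI) for _ in range(100)]

print(f"candidate  log10(BF) = {log10_bf(candidate):.1f}")
print(f"field-only log10(BF) = {log10_bf(field_only):.1f}")
```

In this toy setup the mixture model contains the Pure Field model as the special case f = 0, so the Bayes Factor favours the extra cluster component only when the data need it. Averaging the likelihood over the prior on f acts as a built-in penalty for the added complexity, which is why a pure field sample does not automatically score higher under the richer model.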
The team conducted extensive tests using 600 random field star samples and 1,232 confirmed open clusters. The results show that the Bayes Factor separates real clusters from false signals very cleanly. The study found that log10(BF) > 2 (i.e., a Bayes Factor greater than 100) serves as a robust physical criterion: it means that the observed data are more than 100 times more probable under the "Cluster + Field" model than under the "Pure Field" model. This threshold eliminates the vast majority of statistical fluctuations from random field stars while preserving real cluster signals.
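The threshold test itself is simple to express. The following Python sketch is illustrative only: the function names and the example log10(BF) values are hypothetical, not results from the paper. It converts natural-log evidences into log10(BF), applies the log10(BF) > 2 criterion to a set of known clusters and a set of field samples to obtain the recovered fraction and the false-positive fraction, and shows that the Bayes Factor becomes posterior odds only after multiplying by the prior odds.

```python
import math

THRESHOLD = 2.0  # log10(BF) criterion quoted in the article

def log10_bayes_factor(ln_z_mixture, ln_z_field):
    """Convert natural-log evidences of the two models into log10(BF)."""
    return (ln_z_mixture - ln_z_field) / math.log(10.0)

def separation_rates(log10_bf_clusters, log10_bf_field, threshold=THRESHOLD):
    """Return (fraction of known clusters kept, fraction of field samples wrongly kept)."""
    recovered = sum(b > threshold for b in log10_bf_clusters) / len(log10_bf_clusters)
    false_pos = sum(b > threshold for b in log10_bf_field) / len(log10_bf_field)
    return recovered, false_pos

def posterior_odds(log10_bf, prior_odds=1.0):
    """Posterior odds = Bayes Factor x prior odds; equals BF only for even prior odds."""
    return prior_odds * 10.0 ** log10_bf

# Made-up values, purely to exercise the functions
clusters = [5.3, 12.1, 2.4, 1.7, 30.0]
field = [-0.8, 0.3, -1.5, 2.2]
print(separation_rates(clusters, field))  # (0.8, 0.25)
print(log10_bayes_factor(-1200.0, -1205.0) > THRESHOLD)  # True
```

A Bayes Factor of 100 therefore means the data favour the cluster model by that factor; if, for example, one's prior odds for a given candidate were 1 in 100, the posterior odds would be about even.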

Fig 3. The distribution of Bayes Factors separates real signals from noise. The blue histogram represents confirmed open clusters, while the orange histogram represents mock random field samples. The vertical dashed line marks the threshold at log10(BF) = 2. This clear separation demonstrates that the metric robustly distinguishes genuine physical systems from random background fluctuations.
Unlike traditional signal-to-noise ratios or goodness-of-fit metrics, the Bayes Factor remains sensitive enough to capture hidden cluster signals even under extremely high field contamination (contamination rates above 70%), demonstrating strong robustness. The method is not limited to cleaning and purifying open cluster catalogs: its general framework of mixture-model comparison can also be applied to the validation of other resolved stellar systems, such as stellar streams, moving groups, and satellite galaxies of the Milky Way.
The first author and corresponding author of the paper is Dr. Lu Li from the Shanghai Astronomical Observatory, and co-authors include Professor Zhaozhou Li from Nanjing University and Professor Zhengyi Shao from the Shanghai Astronomical Observatory. To promote community reproduction and application, the MiMO code and related datasets have been released through the National Astronomical Data Center (NADC).
Paper: https://iopscience.iop.org/article/10.3847/1538-4357/ae17ce
Code: https://nadc.china-vo.org/res/r101693/
Contact: Lu Li, lilu@shao.ac.cn