Improving Data and Prediction Quality of High-Throughput Perovskite Synthesis with Model Fusion.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Combinatorial fusion analysis (CFA) is an approach for combining multiple scoring systems using the rank-score characteristic function and a cognitive diversity measure. One example is to combine diverse machine learning models to achieve better prediction quality. In this work, we apply CFA to the synthesis of metal halide perovskites containing organic ammonium cations via inverse temperature crystallization. Using a data set generated by high-throughput experimentation, four individual models (support vector machines, random forests, weighted logistic classifier, and gradient boosted trees) were developed. We characterize each of these scoring systems and explore 66 possible combinations of the models. When measured by the precision of predicting crystal formation, the majority of the combination models improve on the individual model results. The best combination models outperform the best individual models by 3.9 percentage points in precision. In addition to improving prediction quality, we demonstrate how the fusion models can be used to identify mislabeled input data and address issues of data quality. In particular, we identify example cases where every single model and every fusion model fails to give the correct prediction. Experimental replication of these syntheses reveals that these compositions are sensitive to modest temperature variations across the different locations of the heating element, which can hinder or enhance the crystallization process. In summary, we demonstrate that model fusion using CFA can not only identify a previously unconsidered influence on reaction outcome but also be used as a form of quality control for high-throughput experimentation.
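
The CFA building blocks named in the abstract (rank-score characteristic function, cognitive diversity, and score or rank combination) can be illustrated with a minimal sketch. It is not the authors' implementation. The scores in `toy` are invented for illustration, the function names are hypothetical, and the min-max normalization and root-mean-square form of the diversity measure are assumptions based on common CFA usage rather than details stated in the abstract.

```python
from itertools import combinations
from math import sqrt


def normalize(scores):
    """Min-max scale scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def ranks(scores):
    """Rank 1 is the highest score; ties are broken by item order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, item in enumerate(order, start=1):
        result[item] = position
    return result


def rsc(scores):
    """Rank-score characteristic: normalized score found at rank 1..n."""
    return sorted(normalize(scores), reverse=True)


def cognitive_diversity(scores_a, scores_b):
    """Root-mean-square distance between two RSC functions (assumed form)."""
    fa, fb = rsc(scores_a), rsc(scores_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa))


def score_combination(systems):
    """Average the normalized scores per item (higher is better)."""
    normed = [normalize(s) for s in systems]
    return [sum(col) / len(col) for col in zip(*normed)]


def rank_combination(systems):
    """Average the ranks per item (lower is better)."""
    ranked = [ranks(s) for s in systems]
    return [sum(col) / len(col) for col in zip(*ranked)]


# Invented scores for four experiments from four hypothetical models.
toy = {
    "svm": [0.9, 0.2, 0.6, 0.4],
    "rf": [0.8, 0.1, 0.3, 0.7],
    "logistic": [0.5, 0.4, 0.9, 0.1],
    "gbt": [0.7, 0.3, 0.2, 0.6],
}

# Every subset of two or more models is a candidate for fusion.
subsets = [c for k in range(2, len(toy) + 1) for c in combinations(toy, k)]

for names in subsets:
    systems = [toy[n] for n in names]
    print(names, score_combination(systems), rank_combination(systems))
```

The sketch only enumerates the multi-model subsets and applies plain averaging to each. The abstract does not say which combination variants make up the 66 reported combinations, so no attempt is made to reproduce that count.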

Authors

  • Yuanqing Tang
    Laboratory of Informatics and Data Mining (LIDM), Department of Computer and Information Science, Fordham University, 113 West 60th Street, New York, New York 10023, United States.
  • Zhi Li
    Department of Nursing, Zhongshan Hospital of Traditional Chinese Medicine Affiliated to Guangzhou University of Traditional Chinese Medicine, Zhongshan, China.
  • Mansoor Ani Najeeb Nellikkal
    Department of Chemistry, Haverford College, 370 Lancaster Avenue, Haverford, Pennsylvania 19041, United States.
  • Hamed Eramian
    Netrias LLC, 3100 Clarendon Boulevard, Suite 200, Arlington, Virginia 22201, United States.
  • Emory M Chan
    Molecular Foundry, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, United States.
  • Alexander J Norquist
    Department of Chemistry, Haverford College, Haverford, Pennsylvania, United States. anorquis@haverford.edu.
  • D Frank Hsu
    Laboratory of Informatics and Data Mining (LIDM), Department of Computer and Information Science, Fordham University, 113 West 60th Street, New York, New York 10023, United States.
  • Joshua Schrier
    Department of Chemistry, Fordham University, The Bronx, New York 10458, United States.