Computational Material Science Has a Data Problem.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

We raise an overdue question about computational material science data: is it suitable for training machine learning models? By examining the energy above the convex hull (E_hull), the electronic bandgap, and the formation energy data in the Materials Project dataset, we find that E_hull is an unsteady quantity, as are DFT-computed voltages, because the materials currently in the database do not sufficiently represent the chemical spaces needed to account for crystal decomposition. We also show that the electronic bandgap values reported in the Materials Project database contain discrepancies, and that the formation energy data can shift when changes to the optimization parameters lower the energy of a structure below the value deposited in the database.
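Why E_hull depends on database coverage can be illustrated with a toy calculation. The sketch below assumes a hypothetical binary A-B system, with composition x as the fraction of B and formation energies in eV/atom, and the elemental references fixed at zero formation energy. It builds the lower convex hull with Andrew's monotone chain and measures a phase's distance above it. All phases and energies are invented for illustration and are not Materials Project data. Production workflows would instead use a dedicated tool such as pymatgen's PhaseDiagram class, which handles multicomponent systems.

```python
from typing import List, Tuple

# Hypothetical representation: (x_B, formation energy in eV/atom)
Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull of (x, E_f) points via Andrew's monotone chain."""
    pts = sorted(set(points))
    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it would make the hull non-convex
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def hull_energy(hull: List[Point], x: float) -> float:
    """Linearly interpolate the hull energy at composition x."""
    for (x0, e0), (x1, e1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return e0 + t * (e1 - e0)
    raise ValueError("x lies outside the hull's composition range")


def e_above_hull(database: List[Point], phase: Point) -> float:
    """E_hull of `phase` relative to the hull of the given (toy) database."""
    refs = [(0.0, 0.0), (1.0, 0.0)]  # elements A and B: E_f = 0 by definition
    hull = lower_hull(refs + database + [phase])
    x, e = phase
    return e - hull_energy(hull, x)


if __name__ == "__main__":
    a3b = (0.25, -0.15)                      # hypothetical candidate phase
    db_v1 = [(0.50, -0.40)]                  # database knows only AB
    db_v2 = [(0.50, -0.40), (0.20, -0.20)]   # a new A4B phase is added later
    print(f"{e_above_hull(db_v1, a3b):.3f}")  # 0.050
    print(f"{e_above_hull(db_v2, a3b):.3f}")  # 0.083
```

Nothing about the A3B phase itself changes between the two runs; its E_hull rises from 0.050 to 0.083 eV/atom only because a newly added phase lowers the hull near its composition. This is the sense in which E_hull is unsteady when the relevant chemical space is incompletely sampled.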

Authors

  • Sherif Abdulkader Tawfik
    Applied Artificial Intelligence Institute, Deakin University, Geelong, Victoria 3216, Australia.