Cycle-configuration descriptors: a novel graph-theoretic approach to enhancing molecular inference.

Journal: Journal of Cheminformatics
Published Date:

Abstract

Inferring molecules with desired activities or properties is a key challenge in cheminformatics and bioinformatics. For this purpose, our research group has recently developed mol-infer, a state-of-the-art framework for molecular inference. The framework first constructs a prediction function for a fixed property using machine learning models; this function is then simulated by mixed-integer linear programming to infer desired molecules. The accuracy of the framework relies heavily on the representation power of the descriptors. In this study, we highlight a typical class of non-isomorphic chemical graphs with reasonably different property values that cannot be distinguished by the standard "two-layered (2L) model" of mol-infer. To address this distinguishability problem of the 2L model, we propose a novel family of descriptors, named cycle-configuration (CC), which captures the ortho/meta/para patterns that appear in aromatic rings, something that was not possible in the framework so far. Extensive computational experiments show that, with the new descriptors, we can construct prediction functions with similar or better performance than in our previous studies for all 44 tested chemical properties, comprising 27 regression datasets and 17 classification datasets, confirming the effectiveness of the CC descriptors. For inference, we also provide a system of linear constraints that formulates the CC descriptors. We demonstrate that a chemical graph with up to 50 non-hydrogen vertices can be inferred within a practical time frame.
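
The distinguishability problem can be illustrated with a toy sketch. It does not reproduce the actual 2L or CC descriptor definitions, which the abstract does not give. It assumes a six-membered ring with two identical single-atom substituents, a hypothetical "local" descriptor that counts edges by the degrees of their endpoints, and a hypothetical "cycle signature" that records the gaps between substituted ring positions. Under these assumptions, the meta and para arrangements share the same local descriptor but differ in cycle signature. All function names are illustrative.

```python
from collections import Counter

RING_SIZE = 6  # assumption: a benzene-like six-membered ring


def build_edges(substituted):
    """Ring edges plus one leaf atom attached at each substituted ring position."""
    edges = [(i, (i + 1) % RING_SIZE) for i in range(RING_SIZE)]
    for k, pos in enumerate(sorted(substituted)):
        edges.append((pos, RING_SIZE + k))  # leaf atoms get ids 6, 7, ...
    return edges


def local_descriptor(edges):
    """Toy local descriptor: count edges by the sorted degrees of their endpoints."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(tuple(sorted((degree[u], degree[v]))) for u, v in edges)


def cycle_signature(substituted):
    """Toy cycle descriptor: gaps between substituted positions around the ring,
    made canonical under rotation and reflection of the ring."""
    pos = sorted(substituted)
    n = len(pos)
    gaps = [(pos[(i + 1) % n] - pos[i]) % RING_SIZE for i in range(n)]
    candidates = [
        tuple(g[i:] + g[:i]) for g in (gaps, gaps[::-1]) for i in range(n)
    ]
    return min(candidates)


meta, para = {0, 2}, {0, 3}

# Same local view, different arrangement on the cycle.
same_local = local_descriptor(build_edges(meta)) == local_descriptor(build_edges(para))
print(same_local)
print(cycle_signature(meta), cycle_signature(para))
```

In this sketch the gap tuple plays the role of an ortho/meta/para label: positions {0, 1}, {0, 2} and {0, 3} give distinct signatures, and relabelling the ring atoms leaves each signature unchanged. Note that the ortho arrangement already differs under this particular toy local descriptor, so only the meta/para pair demonstrates the collision here.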

Authors

  • Bowen Song
    Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK.
  • Jianshen Zhu
    Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto, Japan.
  • Naveed Ahmed Azam
    Department of Mathematics, Quaid-i-Azam University, Islamabad, Pakistan.
  • Kazuya Haraguchi
    Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto, Japan.
  • Liang Zhao
    Graduate School of Advanced Integrated Studies in Human Survivability (Shishu-Kan), Kyoto University, Kyoto, Japan.
  • Tatsuya Akutsu
    Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan.
