Molecular Design Based on Integer Programming and Splitting Data Sets by Hyperplanes.

Journal: IEEE/ACM transactions on computational biology and bioinformatics
Published Date:

Abstract

A novel framework for designing the molecular structure of chemical compounds with a desired chemical property has recently been proposed. The framework infers a desired chemical graph by solving a mixed integer linear program (MILP) that simulates the computation process of two functions: a feature function defined by a two-layered model on chemical graphs and a prediction function constructed by a machine learning method. To improve the learning performance of prediction functions in the framework, we design a method that splits a given data set C into two subsets C_i, i = 1, 2, by a hyperplane in a chemical space so that most compounds in the first (resp., second) subset have observed values lower (resp., higher) than a threshold θ. We construct a prediction function ψ for the data set C by combining prediction functions ψ_i, i = 1, 2, each of which is constructed on C_i independently. The results of our computational experiments suggest that the proposed method improved the learning performance for several chemical properties for which a good prediction function has been difficult to construct.
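The split-and-combine idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the hyperplane (w, b) is supplied by hand rather than computed, the per-subset learner is a placeholder that predicts the subset mean, and the toy feature vectors and observed values are invented. The abstract does not specify how the hyperplane is found or how the two prediction functions are combined, so the routing rule below (use the predictor of whichever side a point falls on) is only one plausible reading.

```python
from statistics import mean

def side(x, w, b):
    """Return 1 if x lies on or below the hyperplane w.x = b, else 2."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) <= b else 2

def split_by_hyperplane(data, w, b):
    """Split (feature_vector, observed_value) pairs into subsets C_1 and C_2."""
    subsets = {1: [], 2: []}
    for x, value in data:
        subsets[side(x, w, b)].append((x, value))
    return subsets[1], subsets[2]

def fit_mean_predictor(subset):
    """Placeholder learner (assumption): always predicts the subset's mean value."""
    m = mean(value for _, value in subset)
    return lambda x: m

def combine(psi1, psi2, w, b):
    """Combined predictor: route x to psi1 or psi2 by its side of the hyperplane."""
    return lambda x: psi1(x) if side(x, w, b) == 1 else psi2(x)

# Toy data set C (invented): with threshold theta = 5, low values fall in C_1
# and high values in C_2 for the hand-picked hyperplane x1 + x2 = 3.
data = [((0.0, 1.0), 1.0), ((1.0, 0.0), 2.0),
        ((3.0, 2.0), 8.0), ((2.0, 4.0), 10.0)]
w, b = (1.0, 1.0), 3.0

C1, C2 = split_by_hyperplane(data, w, b)
psi = combine(fit_mean_predictor(C1), fit_mean_predictor(C2), w, b)

print(psi((0.5, 0.5)))  # point on the C_1 side
print(psi((4.0, 4.0)))  # point on the C_2 side
```

Each subset's predictor is fitted without reference to the other subset, which mirrors the "constructed on C_i independently" wording. In practice the mean predictor would be replaced by a real learning method.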

Authors

  • Jianshen Zhu
    Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto, Japan.
  • Naveed Ahmed Azam
    Department of Mathematics, Quaid-i-Azam University, Islamabad, Pakistan.
  • Kazuya Haraguchi
    Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto, Japan.
  • Liang Zhao
    Graduate School of Advanced Integrated Studies in Human Survivability (Shishu-Kan), Kyoto University, Kyoto, Japan.
  • Hiroshi Nagamochi
    Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto, Japan.
  • Tatsuya Akutsu
    Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan.