Active Learning of Bayesian Linear Models with High-Dimensional Binary Features by Parameter Confidence-Region Estimation.

Journal: Neural Computation
PMID:

Abstract

In this letter, we study an active learning problem for maximizing an unknown linear function with high-dimensional binary features. This problem is notoriously complex but arises in many important contexts. When the sampling budget, that is, the number of possible function evaluations, is smaller than the number of dimensions, it tends to be impossible to identify all of the optimal binary features. Therefore, in practice, only a small number of such features are considered, with the majority kept fixed at certain default values, which we call the working set heuristic. The main contribution of this letter is to formally study the working set heuristic and present a suite of theoretically robust algorithms for more efficient use of the sampling budget. Technically, we introduce a novel method for estimating the confidence regions of model parameters that is tailored to active learning with high-dimensional binary features. We provide a rigorous theoretical analysis of these algorithms and prove that a commonly used working set heuristic can identify optimal binary features with favorable sample complexity. We explore the performance of the proposed approach through numerical simulations and an application to a functional protein design problem.
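
The setting described above can be illustrated with a minimal sketch. It is not the authors' confidence-region method: it uses the standard Gaussian posterior of a Bayesian linear model and simple per-coordinate credible intervals, and it assumes a zero-mean isotropic prior and known noise variance. The names `alpha`, `sigma2`, `beta`, `working`, `posterior`, and `classify_features` are hypothetical choices made for illustration. Features outside the working set are kept at a default value of 0, so the data carry no information about their coefficients.

```python
import numpy as np

def posterior(X, y, alpha=1.0, sigma2=0.01):
    """Gaussian posterior of w for y = X w + noise, with prior w ~ N(0, I / alpha).

    alpha (prior precision) and sigma2 (noise variance) are assumed known here.
    """
    d = X.shape[1]
    precision = alpha * np.eye(d) + X.T @ X / sigma2
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / sigma2
    return mean, cov

def classify_features(mean, cov, beta=2.0):
    """Per-coordinate intervals mean_j +/- beta * sd_j (an assumed, simple rule).

    Returns +1 where the interval lies above 0 (setting x_j = 1 raises the
    objective), -1 where it lies below 0, and 0 where the sign is undecided.
    """
    half_width = beta * np.sqrt(np.diag(cov))
    lower, upper = mean - half_width, mean + half_width
    decision = np.zeros(len(mean), dtype=int)
    decision[lower > 0] = 1
    decision[upper < 0] = -1
    return decision, lower, upper

# Synthetic example: d = 20 binary features, budget of 30 evaluations.
rng = np.random.default_rng(0)
d, n = 20, 30
working = [0, 1, 2, 3, 4]            # hypothetical working set
w_true = rng.normal(size=d)

X = np.zeros((n, d))                 # non-working features stay at default 0
X[:, working] = rng.integers(0, 2, size=(n, len(working)))
y = X @ w_true + 0.1 * rng.normal(size=n)

mean, cov = posterior(X, y)
decision, lower, upper = classify_features(mean, cov)
print("decisions on working set:", decision[working])
print("decisions elsewhere:     ", decision[len(working):])
```

In this sketch the coefficients of features outside the working set keep their prior distribution, so their intervals always contain 0 and they remain undecided, which mirrors why a limited budget cannot resolve every feature.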

Authors

  • Yu Inatsu
    RIKEN Center for Advanced Intelligent Project, Chuo-ku, Tokyo, 103-0027, Japan yu.inatsu@riken.jp.
  • Masayuki Karasuyama
    Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya, Aichi, 466-8555, Japan; JST, PRESTO, Kawaguchi, Saitama, 332-0012, Japan; and Center for Materials Research by Information Integration, National Institute for Material Science, Sengen, Tsukuba, Ibaraki, 305-0047, Japan karasuyama@nitech.ac.jp.
  • Keiichi Inoue
    Institute for Solid State Physics, University of Tokyo, Kashiwa, Chiba, 277-8561, Japan inoue@issp.u-tokyo.ac.jp.
  • Hideki Kandori
    Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya, Aichi, 466-8555, Japan kandori@nitech.ac.jp.
  • Ichiro Takeuchi
    RIKEN Center for Advanced Intelligent Project, Chuo-ku, Tokyo, 103-0027, Japan; Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya, Aichi, 466-8555, Japan; and Center for Materials Research by Information Integration, National Institute for Material Science, Sengen, Tsukuba, Ibaraki, 305-0047, Japan takeuchi.ichiro@nitech.ac.jp.