Identifying genotype-phenotype relationships in biomedical text.

Journal: Journal of Biomedical Semantics
Published Date:

Abstract

BACKGROUND: One important type of information contained in the biomedical research literature is newly discovered relationships between phenotypes and genotypes. Because of the large volume of literature, a reliable automatic system that identifies this information for later curation is essential. Such a system provides important, up-to-date data for database construction and updating, and even for text summarization. In this paper we present a machine learning method to identify these genotype-phenotype relationships. Because no large human-annotated corpus of genotype-phenotype relationships currently exists, a semi-automatic approach is used to annotate a small labelled training set, and a self-training method is proposed to annotate more sentences and enlarge the training set.
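
The self-training idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier, the features, or the confidence criterion, so the word-count classifier, the `threshold` and `max_rounds` parameters, and all function names below are hypothetical stand-ins chosen only to show the loop. A model is trained on the labelled set, confidently classified unlabelled sentences are moved into the labelled set, and the process repeats.

```python
from collections import Counter


def train(labelled):
    """Toy classifier (an assumption): count word occurrences per label."""
    counts = {}
    for sentence, label in labelled:
        counts.setdefault(label, Counter()).update(sentence.lower().split())
    return counts


def predict(model, sentence):
    """Return (label, confidence); confidence is the winning label's share of word votes."""
    words = sentence.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0  # no known words, so no prediction
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def self_train(labelled, unlabelled, threshold=0.8, max_rounds=5):
    """Grow the labelled set with confident predictions on unlabelled sentences."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        model = train(labelled)
        confident, remaining = [], []
        for sentence in pool:
            label, conf = predict(model, sentence)
            if label is not None and conf >= threshold:
                confident.append((sentence, label))
            else:
                remaining.append(sentence)
        if not confident:
            break  # nothing new was added, so further rounds cannot help
        labelled.extend(confident)
        pool = remaining
    return labelled, pool


if __name__ == "__main__":
    # Hypothetical sentences with entity mentions replaced by GENE / PHENO.
    seed = [
        ("GENE mutation causes PHENO", "relation"),
        ("GENE studied separately from PHENO", "no_relation"),
    ]
    unlabelled = ["GENE deletion causes PHENO", "GENE deletion linked to PHENO"]
    enlarged, leftover = self_train(seed, unlabelled, threshold=0.55)
    for sentence, label in enlarged:
        print(label, "|", sentence)
```

In this toy run the second unlabelled sentence only becomes confident after the first has been added to the training set, which is the effect self-training relies on. A real system would replace `train` and `predict` with a proper classifier and a calibrated confidence score.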

Authors

  • Maryam Khordad
    Department of Computer Science, University of Western Ontario, 1151 Richmond Street, London, N6A 5B7, Canada. mkhordad@alumni.uwo.ca.
  • Robert E Mercer
    Department of Computer Science, University of Western Ontario, 1151 Richmond Street, London, N6A 5B7, Canada.