Text mining electronic hospital records to automatically classify admissions against disease: Measuring the impact of linking data sources.

Journal: Journal of biomedical informatics
Published Date:

Abstract

OBJECTIVE: Text and data mining play an important role in obtaining insights from Health and Hospital Information Systems. This paper presents a text mining system for detecting admissions marked as positive for any of seven diseases: Lung Cancer, Breast Cancer, Colon Cancer, Secondary Malignant Neoplasm of Respiratory and Digestive Organs, Multiple Myeloma and Malignant Plasma Cell Neoplasms, Pneumonia, and Pulmonary Embolism. We specifically examine how linking multiple data sources affects text classification performance.

Authors

  • Simon Kocbek
    Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, Australia; School of Science, RMIT University, Melbourne, Australia; Department of Computing and Information Systems, University of Melbourne, Melbourne, Australia. Electronic address: skocbek@gmail.com.
  • Lawrence Cavedon
    School of Science, RMIT University, Melbourne, Australia.
  • David Martinez
    The University of Melbourne, Australia.
  • Christopher Bain
    Mercy Health, Heidelberg, Australia; Faculty of Information Technology, Monash University, Clayton, Australia.
  • Chris Mac Manus
    Health Informatics Department, Alfred Hospital, Melbourne, Australia; Now with OzeScribe, Melbourne, Australia.
  • Gholamreza Haffari
    Faculty of Information Technology, Monash University, Clayton, Australia.
  • Ingrid Zukerman
    Faculty of Information Technology, Monash University, Clayton, Australia.
  • Karin Verspoor
    Department of Computing and Information Systems, School of Engineering, University of Melbourne, Melbourne, Australia.