Machine learning to parse breast pathology reports in Chinese.

Journal: Breast cancer research and treatment
Published Date:

Abstract

INTRODUCTION: Large structured databases of pathology findings are valuable for deriving new clinical insights. However, they are labor-intensive to create and generally require manual annotation. The bioinformatics community has done some work on automating this annotation with machine learning for English-language reports. Our contribution is an automated approach to constructing such structured databases from Chinese-language reports, which also sets the stage for extraction from other languages.

Authors

  • Rong Tang
    Division of Surgical Oncology, MGH, Boston, USA.
  • Lizhi Ouyang
    Department of Breast Surgery, Hunan Cancer Hospital, Changsha, Hunan, China.
  • Clara Li
    Department of Electrical Engineering and Computer Science, CSAIL, MIT, Cambridge, USA.
  • Yue He
    Department of Breast Surgery, Hunan Cancer Hospital, Changsha, Hunan, China.
  • Molly Griffin
    Division of Surgical Oncology, MGH, Boston, USA. megriff@post.harvard.edu.
  • Alphonse Taghian
    Department of Radiation Oncology, MGH, Boston, USA.
  • Barbara Smith
    Division of Surgical Oncology, MGH, Boston, USA.
  • Adam Yala
    Department of Electrical Engineering and Computer Science, CSAIL, MIT, Cambridge, USA.
  • Regina Barzilay
    Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA. Email: regina@csail.mit.edu.
  • Kevin Hughes
    Division of Surgical Oncology, MGH, Boston, USA.