PDF text classification to leverage information extraction from publication reports.

Journal: Journal of Biomedical Informatics

Published Date: Apr 1, 2016

Abstract

OBJECTIVES: Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task; however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges for the underlying natural language processing algorithms. Our goal is to categorize PDF texts for strategic use by IE systems.

Authors

Duy Duc An Bui
Guilherme Del Fiol

Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, United States.

Siddhartha Jonnalagadda

Department of Preventive Medicine-Health and Biomedical Informatics, Northwestern University, Chicago, IL, USA.

Keywords

Algorithms; Humans; Information Storage and Retrieval; Machine Learning; Narration; Natural Language Processing; Publications; Review Literature as Topic

External Resources

PubMed ID: 27044929
