Information extraction (IE), the distillation of specific information from unstructured data, is a core task in natural language processing. For rare entities (<1% prevalence), collecting the positive examples needed to train a model may require an ...
AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science
34457147
Rare diseases affect between 25 and 30 million people in the United States, and understanding their epidemiology is critical to focusing research efforts. However, little is known about the prevalence of many rare diseases. Given a lack of automated ...
PURPOSE: Developing large-scale datasets with research-quality annotations is challenging due to the high cost of refining clinically generated markup into high precision annotations. We evaluated the direct use of a large dataset with only clinicall...
The explosion of disinformation accompanying the COVID-19 pandemic has overloaded fact-checkers and media worldwide and posed a major new challenge to government responses. Not only is disinformation creating confusion about medical scie...
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over ...
The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the community model organism knowledgebase for the laboratory mouse, a widely used animal model for comparative studies of the genetic and genomic basis for human health and disease. ...
OBJECTIVES: Diagnostic accuracy of artificial intelligence (AI) pneumothorax (PTX) detection in chest radiographs (CXR) is limited by the noisy annotation quality of public training data and confounding thoracic tubes (TT). We hypothesize that in-ima...
In the electronic health record, using clinical notes to identify entities such as disorders and their temporality (e.g. the order of an event relative to a time index) can inform many important analyses. However, creating training data for clinical ...
Journal of the American Medical Informatics Association : JAMIA
33355353
OBJECTIVE: Sentiment analysis is a popular tool for analyzing health-related social media content. However, existing studies exhibit numerous methodological issues and inconsistencies with respect to research design and results reporting, which could...
Anomaly detection is the process of identifying unexpected data samples in datasets. Automated anomaly detection is either performed using supervised machine learning models, which require a labelled dataset for their calibration, or unsupervised mod...