Reduction of Supervision for Biomedical Knowledge Discovery
Journal: arXiv
Published Date: Apr 13, 2025
Abstract
Knowledge discovery is hindered by the increasing volume of publications and
the scarcity of extensive annotated data. To tackle the challenge of
information overload, it is essential to employ automated methods for knowledge
extraction and processing. Finding the right balance between the level of
supervision and the effectiveness of models poses a significant challenge.
While supervised techniques generally perform better, they have
the major drawback of requiring labeled data. Annotation is
labor-intensive and time-consuming, which hinders scalability when exploring new
domains. In this context, our study addresses the challenge of identifying
semantic relationships between biomedical entities (e.g., diseases, proteins)
in unstructured text while minimizing dependency on supervision. We introduce a
suite of unsupervised algorithms based on dependency trees and attention
mechanisms and employ a range of pointwise binary classification methods.
Transitioning from weakly supervised to fully unsupervised settings, we assess
the methods' ability to learn from data with noisy labels. The evaluation on
biomedical benchmark datasets explores the effectiveness of the methods. Our
approach tackles a central issue in knowledge discovery: balancing performance
with minimal supervision. By gradually decreasing supervision, we assess the
robustness of pointwise binary classification techniques in handling noisy
labels, revealing their capability to shift from weakly supervised to entirely
unsupervised scenarios. Comprehensive benchmarking offers insights into the
effectiveness of these techniques and suggests an encouraging direction toward
adaptable knowledge discovery systems. This represents progress toward
data-efficient methodologies for extracting useful insights when annotated data
is limited.