Efficient Semantic Similarity Computing with Optimized BERT Models.
Journal:
Studies in Health Technology and Informatics
Published Date:
May 15, 2025
Abstract
By bridging diverse terminologies and ensuring precise information retrieval, semantic similarity in medical language is key to improving healthcare outcomes. Semantic similarity measures how closely pieces of text share the same meaning, a crucial element of Natural Language Processing (NLP) for understanding and interpreting data. While Large Language Models (LLMs) are known for their versatility and text-generation abilities, BERT (Bidirectional Encoder Representations from Transformers) excels at text analysis and at identifying semantic similarities. However, as AI models become more advanced, researchers face challenges related to model size, computational demands, and deployment constraints (e.g., energy, memory, and latency). To address these issues, model optimization techniques can drastically reduce memory usage and speed up inference. In this work, we leverage the open-source Microsoft Olive tool to find the best optimizations and then apply a dynamic quantization process. We evaluate our approach on the DEFT 2020 Text Mining Challenge, slightly improving performance metrics while achieving a 20x average speed-up and reducing memory usage by around 70%.
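
The abstract does not detail the optimization pipeline, but the dynamic quantization step it mentions is commonly applied to an ONNX export of the fine-tuned BERT model (the format Microsoft Olive typically produces). The sketch below illustrates that step with ONNX Runtime's quantization API; the file names and weight type are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a dynamic quantization step, assuming the optimized
# FP32 BERT model has already been exported to ONNX (e.g. via Olive).
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="bert_optimized.onnx",   # assumed path to the FP32 ONNX model
    model_output="bert_int8.onnx",       # quantized model written here
    weight_type=QuantType.QInt8,         # store weights as 8-bit integers
)
```

Dynamic quantization converts the weights to 8-bit integers ahead of time and quantizes activations on the fly at inference, so it requires no calibration data while shrinking the model's memory footprint and accelerating CPU inference, consistent with the memory and latency gains reported above.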