AIMC Topic: Benchmarking

Benchmarking the performance of large language models in uveitis: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Anthropic Claude3.

Eye (London, England)
BACKGROUND/OBJECTIVE: This study aimed to evaluate the accuracy, comprehensiveness, and readability of responses generated by various Large Language Models (LLMs) (ChatGPT-3.5, Gemini, Claude 3, and GPT-4.0) in the clinical context of uveitis, utiliz...

Unmasking the chameleons: A benchmark for out-of-distribution detection in medical tabular data.

International journal of medical informatics
BACKGROUND: Machine Learning (ML) models often struggle to generalize effectively to data that deviates from the training distribution. This raises significant concerns about the reliability of real-world healthcare systems encountering such inputs k...

MedSegBench: A comprehensive benchmark for medical image segmentation in diverse data modalities.

Scientific data
MedSegBench is a comprehensive benchmark designed to evaluate deep learning models for medical image segmentation across a wide range of modalities. It includes 35 datasets with over 60,000 images from ultrasound, ...

MultiADE: A Multi-domain benchmark for Adverse Drug Event extraction.

Journal of biomedical informatics
OBJECTIVE: Active adverse event surveillance monitors Adverse Drug Events (ADE) from different data sources, such as electronic health records, medical literature, social media and search engine logs. Over the years, many datasets have been created, ...

A multi-species benchmark for training and validating mass spectrometry proteomics machine learning models.

Scientific data
Training machine learning models for tasks such as de novo sequencing or spectral clustering requires large collections of confidently identified spectra. Here we describe a dataset of 2.8 million high-confidence peptide-spectrum matches derived from...

Data-centric challenges with the application and adoption of artificial intelligence for drug discovery.

Expert opinion on drug discovery
INTRODUCTION: Artificial intelligence (AI) is exhibiting tremendous potential to reduce the massive costs and long timescales of drug discovery. There are, however, important challenges currently limiting the impact and scope of AI models.

Benchmarking deep learning-based low-dose CT image denoising algorithms.

Medical physics
BACKGROUND: Long-lasting efforts have been made to reduce radiation dose and thus the potential radiation risk to the patient for computed tomography (CT) acquisitions without severe deterioration of image quality. To this end, various techniques hav...

Longitudinal deep neural networks for assessing metastatic brain cancer on a large open benchmark.

Nature communications
The detection and tracking of metastatic cancer over the lifetime of a patient remains a major challenge in clinical trials and real-world care. Advances in deep learning combined with massive datasets may enable the development of tools that can add...

Benchmarking Human-AI collaboration for common evidence appraisal tools.

Journal of clinical epidemiology
BACKGROUND AND OBJECTIVE: It is unknown whether large language models (LLMs) may facilitate time- and resource-intensive text-related processes in evidence appraisal. The objective was to quantify the agreement of LLMs with human consensus in apprais...

radMLBench: A dataset collection for benchmarking in radiomics.

Computers in biology and medicine
BACKGROUND: New machine learning methods and techniques are frequently introduced in radiomics, but they are often tested on a single dataset, which makes it challenging to assess their true benefit. Currently, there is a lack of a larger, publicly a...