Benchmarking - AI Medical Compendium

Benchmarking the performance of large language models in uveitis: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Anthropic Claude3.

Eye (London, England) Dec 17, 2024

BACKGROUND/OBJECTIVE: This study aimed to evaluate the accuracy, comprehensiveness, and readability of responses generated by various Large Language Models (LLMs) (ChatGPT-3.5, Gemini, Claude 3, and GPT-4.0) in the clinical context of uveitis, utiliz...

Benchmarking; Comprehension; Large Language Models; Generative Artificial Intelligence; Humans; Uveitis; Language

Unmasking the chameleons: A benchmark for out-of-distribution detection in medical tabular data.

International journal of medical informatics Dec 17, 2024

BACKGROUND: Machine Learning (ML) models often struggle to generalize effectively to data that deviates from the training distribution. This raises significant concerns about the reliability of real-world healthcare systems encountering such inputs k...

Algorithms; Humans; Machine Learning; Benchmarking

MedSegBench: A comprehensive benchmark for medical image segmentation in diverse data modalities.

Scientific data Nov 25, 2024

MedSegBench is a comprehensive benchmark designed to evaluate deep learning models for medical image segmentation across a wide range of modalities. It covers a wide range of modalities, including 35 datasets with over 60,000 images from ultrasound, ...

Ultrasonography; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Algorithms; Deep Learning; Diagnostic Imaging; Benchmarking; Humans

MultiADE: A Multi-domain benchmark for Adverse Drug Event extraction.

Journal of biomedical informatics Nov 12, 2024

OBJECTIVE: Active adverse event surveillance monitors Adverse Drug Events (ADE) from different data sources, such as electronic health records, medical literature, social media and search engine logs. Over the years, many datasets have been created, ...

Humans; Machine Learning; Algorithms; Benchmarking; Drug-Related Side Effects and Adverse Reactions; Social Media; Databases, Factual; Adverse Drug Reaction Reporting Systems; Data Mining; Natural Language Processing; Electronic Health Records

A multi-species benchmark for training and validating mass spectrometry proteomics machine learning models.

Scientific data Nov 8, 2024

Training machine learning models for tasks such as de novo sequencing or spectral clustering requires large collections of confidently identified spectra. Here we describe a dataset of 2.8 million high-confidence peptide-spectrum matches derived from...

Peptides; Proteomics; Humans; Mass Spectrometry; Machine Learning; Animals; Benchmarking

Data-centric challenges with the application and adoption of artificial intelligence for drug discovery.

Expert opinion on drug discovery Sep 24, 2024

INTRODUCTION: Artificial intelligence (AI) is exhibiting tremendous potential to reduce the massive costs and long timescales of drug discovery. There are however important challenges currently limiting the impact and scope of AI models.

Drug Development; Humans; Uncertainty; Bias; Artificial Intelligence; Benchmarking; Drug Discovery; Models, Theoretical

Benchmarking deep learning-based low-dose CT image denoising algorithms.

Medical physics Sep 17, 2024

BACKGROUND: Long-lasting efforts have been made to reduce radiation dose and thus the potential radiation risk to the patient for computed tomography (CT) acquisitions without severe deterioration of image quality. To this end, various techniques hav...

Algorithms; Tomography, X-Ray Computed; Humans; Benchmarking; Radiation Dosage; Deep Learning; Signal-To-Noise Ratio; Image Processing, Computer-Assisted

Longitudinal deep neural networks for assessing metastatic brain cancer on a large open benchmark.

Nature communications Sep 17, 2024

The detection and tracking of metastatic cancer over the lifetime of a patient remains a major challenge in clinical trials and real-world care. Advances in deep learning combined with massive datasets may enable the development of tools that can add...

Longitudinal Studies; Benchmarking; Middle Aged; Deep Learning; Brain Neoplasms; Male; Neural Networks, Computer; Female; Aged; Humans

Benchmarking Human-AI collaboration for common evidence appraisal tools.

Journal of clinical epidemiology Sep 12, 2024

BACKGROUND AND OBJECTIVE: It is unknown whether large language models (LLMs) may facilitate time- and resource-intensive text-related processes in evidence appraisal. The objective was to quantify the agreement of LLMs with human consensus in apprais...

Benchmarking; Evidence-Based Medicine; Humans; Artificial Intelligence; Consensus; Research Design; Systematic Reviews as Topic

radMLBench: A dataset collection for benchmarking in radiomics.

Computers in biology and medicine Sep 12, 2024

BACKGROUND: New machine learning methods and techniques are frequently introduced in radiomics, but they are often tested on a single dataset, which makes it challenging to assess their true benefit. Currently, there is a lack of a larger, publicly a...

Databases, Factual; Radiomics; Benchmarking; Image Processing, Computer-Assisted; Humans; Machine Learning
