Benchmarking - AI Medical Compendium

Dissecting HealthBench: Disease Spectrum, Clinical Diversity, and Data Insights from Multi-Turn Clinical AI Evaluation Benchmark.

Journal of medical systems Jul 28, 2025

HealthBench is an open-source, large-scale benchmark consisting of 5,000 multi-turn clinical conversations evaluated against 48,562 criteria developed by clinicians. Recognized as a significant advancement in assessing realistic artificial intelligen...

Benchmarking; Humans; Artificial Intelligence


Benchmarking 3D Structure-Based Molecule Generators.

Journal of chemical information and modeling Jul 25, 2025

To understand the benefits and drawbacks of 3D combinatorial and deep learning generators, a novel benchmark was created focusing on the recreation of important protein-ligand interactions and 3D ligand conformations. Using the BindingMOAD data set w...

Deep Learning; Benchmarking; Ligands; Molecular Conformation; Proteins; Models, Molecular


Artificial intelligence in coronary angiography: benchmarking the diagnostic accuracy of ChatGPT-4o against interventional cardiologists.

Open heart Jul 20, 2025

BACKGROUND: The integration of artificial intelligence (AI) into medical diagnostics has significantly impacted cardiology by enhancing diagnostic precision and therapeutic strategies. Coronary artery disease continues to be a leading cause of global...

Humans; Artificial Intelligence; Reproducibility of Results; Aged; Male; Retrospective Studies; Female; Radiographic Image Interpretation, Computer-Assisted; Coronary Artery Disease; Coronary Vessels; Cardiologists; Benchmarking; Predictive Value of Tests; Coronary Angiography; Middle Aged; Generative Artificial Intelligence


Comprehensive protein datasets and benchmarking for liquid-liquid phase separation studies.

Genome biology Jul 8, 2025

BACKGROUND: Proteins self-organize in dynamic cellular environments by assembling into reversible biomolecular condensates through liquid-liquid phase separation (LLPS). These condensates can comprise single or multiple proteins, with different roles...

Phase Separation; Benchmarking; Biomolecular Condensates; Proteins; Databases, Protein; Algorithms


Towards fair decentralized benchmarking of healthcare AI algorithms with the Federated Tumor Segmentation (FeTS) challenge.

Nature communications Jul 8, 2025

Computational competitions are the standard for benchmarking medical image analysis algorithms, but they typically use small curated test datasets acquired at a few centers, leaving a gap to the reality of diverse multicentric patient data. To this e...

Magnetic Resonance Imaging; Benchmarking; Artificial Intelligence; Algorithms; Brain Neoplasms; Image Processing, Computer-Assisted; Humans


A publicly available benchmark for assessing large language models' ability to predict how humans balance self-interest and the interest of others.

Scientific reports Jul 1, 2025

Large language models (LLMs) hold enormous potential to assist humans in decision-making processes, from everyday to high-stake scenarios. However, as many human decisions carry social implications, for LLMs to be reliable assistants a necessary prer...

Humans; Language; Benchmarking; Large Language Models; Male; Decision Making; Female; Adult


Quantitative benchmarking of nuclear segmentation algorithms in multiplexed immunofluorescence imaging for translational studies.

Communications biology May 30, 2025

Multiplexed imaging techniques require identifying different cell types in the tissue. To utilize their potential for cellular and molecular analysis, high throughput and accurate analytical approaches are needed in parsing vast amounts of data, part...

Cell Nucleus; Fluorescent Antibody Technique; Benchmarking; Translational Research, Biomedical; Algorithms; Image Processing, Computer-Assisted; Humans


Development of a data-driven urban immunity assessment model: providing a new benchmark for urban governance under public health emergencies.

Frontiers in public health May 29, 2025

Public health emergencies (PHEs) pose significant challenges to global urban governance systems, necessitating the establishment of more efficient and dynamically adaptive response mechanisms. Numerous cases indicate that current urban governance sti...

Public Health; Benchmarking; Humans; Emergencies


A Benchmark for Virus Infection Reporter Virtual Staining in Fluorescence and Brightfield Microscopy.

Scientific data May 28, 2025

Detecting virus-infected cells in light microscopy requires a reporter signal commonly achieved by immunohistochemistry or genetic engineering. While classification-based machine learning approaches to the detection of virus-infected cells have been ...

Microscopy; Viruses; Microscopy, Fluorescence; Benchmarking; Humans; Machine Learning; Staining and Labeling


Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study.

JMIR medical informatics May 16, 2025

BACKGROUND: The capabilities of large language models (LLMs) to self-assess their own confidence in answering questions within the biomedical realm remain underexplored.

Large Language Models; Surveys and Questionnaires; Benchmarking; Humans; Cross-Sectional Studies

