AIMC Topic: Benchmarking

Showing 1 to 10 of 462 articles

Dissecting HealthBench: Disease Spectrum, Clinical Diversity, and Data Insights from Multi-Turn Clinical AI Evaluation Benchmark.

Journal of medical systems
HealthBench is an open-source, large-scale benchmark consisting of 5,000 multi-turn clinical conversations evaluated against 48,562 criteria developed by clinicians. Recognized as a significant advancement in assessing realistic artificial intelligen...

Benchmarking 3D Structure-Based Molecule Generators.

Journal of chemical information and modeling
To understand the benefits and drawbacks of 3D combinatorial and deep learning generators, a novel benchmark was created focusing on the recreation of important protein-ligand interactions and 3D ligand conformations. Using the BindingMOAD data set w...

Comprehensive protein datasets and benchmarking for liquid-liquid phase separation studies.

Genome biology
BACKGROUND: Proteins self-organize in dynamic cellular environments by assembling into reversible biomolecular condensates through liquid-liquid phase separation (LLPS). These condensates can comprise single or multiple proteins, with different roles...

Towards fair decentralized benchmarking of healthcare AI algorithms with the Federated Tumor Segmentation (FeTS) challenge.

Nature communications
Computational competitions are the standard for benchmarking medical image analysis algorithms, but they typically use small curated test datasets acquired at a few centers, leaving a gap to the reality of diverse multicentric patient data. To this e...

A publicly available benchmark for assessing large language models' ability to predict how humans balance self-interest and the interest of others.

Scientific reports
Large language models (LLMs) hold enormous potential to assist humans in decision-making processes, from everyday to high-stake scenarios. However, as many human decisions carry social implications, for LLMs to be reliable assistants a necessary prer...

Quantitative benchmarking of nuclear segmentation algorithms in multiplexed immunofluorescence imaging for translational studies.

Communications biology
Multiplexed imaging techniques require identifying different cell types in the tissue. To utilize their potential for cellular and molecular analysis, high throughput and accurate analytical approaches are needed in parsing vast amounts of data, part...

Development of a data-driven urban immunity assessment model: providing a new benchmark for urban governance under public health emergencies.

Frontiers in public health
Public health emergencies (PHEs) pose significant challenges to global urban governance systems, necessitating the establishment of more efficient and dynamically adaptive response mechanisms. Numerous cases indicate that current urban governance sti...

A Benchmark for Virus Infection Reporter Virtual Staining in Fluorescence and Brightfield Microscopy.

Scientific data
Detecting virus-infected cells in light microscopy requires a reporter signal commonly achieved by immunohistochemistry or genetic engineering. While classification-based machine learning approaches to the detection of virus-infected cells have been ...

Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study.

JMIR medical informatics
BACKGROUND: The capabilities of large language models (LLMs) to self-assess their own confidence in answering questions within the biomedical realm remain underexplored.