AIMC Topic: Benchmarking

Showing 1 to 10 of 452 articles

A publicly available benchmark for assessing large language models' ability to predict how humans balance self-interest and the interest of others.

Scientific reports
Large language models (LLMs) hold enormous potential to assist humans in decision-making processes, from everyday to high-stake scenarios. However, as many human decisions carry social implications, for LLMs to be reliable assistants a necessary prer...

Quantitative benchmarking of nuclear segmentation algorithms in multiplexed immunofluorescence imaging for translational studies.

Communications biology
Multiplexed imaging techniques require identifying different cell types in the tissue. To utilize their potential for cellular and molecular analysis, high throughput and accurate analytical approaches are needed in parsing vast amounts of data, part...

Development of a data-driven urban immunity assessment model: providing a new benchmark for urban governance under public health emergencies.

Frontiers in public health
Public health emergencies (PHEs) pose significant challenges to global urban governance systems, necessitating the establishment of more efficient and dynamically adaptive response mechanisms. Numerous cases indicate that current urban governance sti...

A Benchmark for Virus Infection Reporter Virtual Staining in Fluorescence and Brightfield Microscopy.

Scientific data
Detecting virus-infected cells in light microscopy requires a reporter signal commonly achieved by immunohistochemistry or genetic engineering. While classification-based machine learning approaches to the detection of virus-infected cells have been ...

Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study.

JMIR medical informatics
BACKGROUND: The capabilities of large language models (LLMs) to self-assess their own confidence in answering questions within the biomedical realm remain underexplored.

Enhancing clinical decision support with physiological waveforms - A multimodal benchmark in emergency care.

Computers in biology and medicine
BACKGROUND: AI-driven prediction algorithms have the potential to enhance emergency medicine by enabling rapid and accurate decision-making regarding patient status and potential deterioration. However, the integration of multimodal data, including r...

Benchmarking reinforcement learning algorithms for autonomous mechanical thrombectomy.

International journal of computer assisted radiology and surgery
PURPOSE: Mechanical thrombectomy (MT) is the gold standard for treating acute ischemic stroke. However, challenges such as operator radiation exposure, reliance on operator experience, and limited treatment access remain. Although autonomous robotics...

A benchmarking framework and dataset for learning to defer in human-AI decision-making.

Scientific data
Learning to Defer (L2D) algorithms improve human-AI collaboration by deferring decisions to human experts when they are likely to be more accurate than the AI model. These can be crucial in high-stakes tasks like fraud detection, where false negative...

Arch-Eval benchmark for assessing Chinese architectural domain knowledge in large language models.

Scientific reports
The burgeoning application of Large Language Models (LLMs) in Natural Language Processing (NLP) has prompted scrutiny of their domain-specific knowledge processing, especially in the construction industry. Despite high demand, there is a scarcity of ...

A clinical benchmark of public self-supervised pathology foundation models.

Nature communications
The use of self-supervised learning to train pathology foundation models has increased substantially in the past few years. Notably, several models trained on large quantities of clinical data have been made publicly available in recent months. This ...