AI Medical Compendium

Sem-DPO: Mitigating Semantic Inconsistency in Preference Optimization for Prompt Engineering

arXiv Jul 27, 2025

Generative AI can now synthesize strikingly realistic images from text, yet output quality remains highly sensitive to how prompts are phrased. Direct Preference Optimization (DPO) offers a lightweight, off-policy alternative to RL for automatic pr...

Computation and Language Artificial Intelligence Machine Learning

CineVision: An Interactive Pre-visualization Storyboard System for Director-Cinematographer Collaboration

arXiv Jul 27, 2025

Effective communication between directors and cinematographers is fundamental in film production, yet traditional approaches relying on visual references and hand-drawn storyboards often lack the efficiency and precision necessary during pre-produc...

Human-Computer Interaction

A Topology-Based Machine Learning Model Decisively Outperforms Flux Balance Analysis in Predicting Metabolic Gene Essentiality

arXiv Jul 27, 2025

Background: The rational identification of essential genes is a cornerstone of drug discovery, yet standard computational methods like Flux Balance Analysis (FBA) often struggle to produce accurate predictions in complex, redundant metabolic networ...

Molecular Networks

Wavelet-guided Misalignment-aware Network for Visible-Infrared Object Detection

arXiv Jul 27, 2025

Visible-infrared object detection aims to enhance the detection robustness by exploiting the complementary information of visible and infrared image pairs. However, its performance is often limited by frequent misalignments caused by resolution dis...

Computer Vision and Pattern Recognition

CONCAP: Seeing Beyond English with Concepts Retrieval-Augmented Captioning

arXiv Jul 27, 2025

Multilingual vision-language models have made significant strides in image captioning, yet they still lag behind their English counterparts due to limited multilingual training data and costly large-scale model parameterization. Retrieval-augmented...

Computation and Language

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios

arXiv Jul 27, 2025

Multimodal large language models (MLLMs) have made remarkable strides, largely driven by their ability to process increasingly long and complex contexts, such as high-resolution images, extended video sequences, and lengthy audio input. While this ...

Computer Vision and Pattern Recognition

SWIFT: A General Sensitive Weight Identification Framework for Fast Sensor-Transfer Pansharpening

arXiv Jul 27, 2025

Pansharpening aims to fuse high-resolution panchromatic (PAN) images with low-resolution multispectral (LRMS) images to generate high-resolution multispectral (HRMS) images. Although deep learning-based methods have achieved promising performance, ...

Computer Vision and Pattern Recognition

VLMPlanner: Integrating Visual Language Models with Motion Planning

arXiv Jul 27, 2025

Integrating large language models (LLMs) into autonomous driving motion planning has recently emerged as a promising direction, offering enhanced interpretability, better controllability, and improved generalization in rare and long-tail scenarios....

Artificial Intelligence Robotics

From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos

arXiv Jul 27, 2025

Inserting 3D objects into videos is a longstanding challenge in computer graphics with applications in augmented reality, virtual try-on, and video composition. Achieving both temporal consistency and realistic lighting remains difficult, particula...

Computer Vision and Pattern Recognition

Decomposing Densification in Gaussian Splatting for Faster 3D Scene Reconstruction

arXiv Jul 27, 2025

3D Gaussian Splatting (GS) has emerged as a powerful representation for high-quality scene reconstruction, offering compelling rendering quality. However, the training process of GS often suffers from slow convergence due to inefficient densificati...

Computer Vision and Pattern Recognition
