Generative AI can now synthesize strikingly realistic images from text, yet
output quality remains highly sensitive to how prompts are phrased. Direct
Preference Optimization (DPO) offers a lightweight, off-policy alternative to
RL for automatic pr...
Effective communication between directors and cinematographers is fundamental
in film production, yet traditional approaches relying on visual references and
hand-drawn storyboards often lack the efficiency and precision necessary during
pre-produc...
Background: The rational identification of essential genes is a cornerstone
of drug discovery, yet standard computational methods like Flux Balance
Analysis (FBA) often struggle to produce accurate predictions in complex,
redundant metabolic networ...
Visible-infrared object detection aims to improve detection robustness by
exploiting the complementary information of visible and infrared image pairs.
However, its performance is often limited by frequent misalignments caused by
resolution dis...
Multilingual vision-language models have made significant strides in image
captioning, yet they still lag behind their English counterparts due to limited
multilingual training data and costly large-scale model parameterization.
Retrieval-augmented...
Multimodal large language models (MLLMs) have made remarkable strides,
largely driven by their ability to process increasingly long and complex
contexts, such as high-resolution images, extended video sequences, and lengthy
audio input. While this ...
Pansharpening aims to fuse high-resolution panchromatic (PAN) images with
low-resolution multispectral (LRMS) images to generate high-resolution
multispectral (HRMS) images. Although deep learning-based methods have achieved
promising performance, ...
Integrating large language models (LLMs) into autonomous driving motion
planning has recently emerged as a promising direction, offering enhanced
interpretability, better controllability, and improved generalization in rare
and long-tail scenarios....
Inserting 3D objects into videos is a longstanding challenge in computer
graphics with applications in augmented reality, virtual try-on, and video
composition. Achieving both temporal consistency and realistic lighting remains
difficult, particula...
3D Gaussian Splatting (GS) has emerged as a powerful representation for
high-quality scene reconstruction, offering compelling rendering quality.
However, the training process of GS often suffers from slow convergence due to
inefficient densificati...