M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance

Journal: arXiv

Published Date: Feb 26, 2025

Abstract

We present M2-omni, a cutting-edge, open-source omni-MLLM that achieves performance competitive with GPT-4o. M2-omni employs a unified multimodal sequence modeling framework, which empowers Large Language Models (LLMs) to acquire comprehensive cross-modal understanding and generation capabilities. Specifically, M2-omni can process arbitrary combinations of audio, video, image, and text modalities as input and generate multimodal sequences that interleave audio, image, or text outputs, thereby enabling an advanced and interactive real-time experience. Training such an omni-MLLM is challenged by significant disparities in data quantity and convergence rates across modalities. To address these challenges, we propose a step balance strategy during pre-training to handle the quantity disparities in modality-specific data. Additionally, a dynamically adaptive balance strategy is introduced during the instruction tuning stage to synchronize the modality-wise training progress, ensuring optimal convergence. Notably, we prioritize preserving strong performance on pure text tasks to maintain the robustness of M2-omni's language understanding capability throughout the training process. To the best of our knowledge, M2-omni is currently a highly competitive open-source alternative to GPT-4o, characterized by its comprehensive modality and task support, as well as its exceptional performance. We expect M2-omni to advance the development of omni-MLLMs, thus facilitating future research in this domain.

Authors

Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
Sirui Gao
Xuzheng Yu
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou

External Resources

View on arXiv (http://arxiv.org/abs/2502.18778v3)
