CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction
Published in ICML 2026, 2026
Recommended citation: Yinghao Ma, Haiwen Xia, Hewei Gao, Weixiong Chen, Yuxin Ye, Yuchen Yang, Sungkyun Chang, Mingshuo Ding, Yizhi Li, Ruibin Yuan, Simon Dixon, Emmanouil Benetos. (2026). "CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction." ICML 2026. https://arxiv.org/abs/2603.00610
CMI-RewardBench is a benchmark for evaluating music reward models on compositional multimodal instructions, targeting preference learning over generated audio.
- Introduced a new task for evaluating generated audio from compositional music instructions.
- Created the CMI-Pref dataset from the outputs of 23 music generation models: 110k LLM-labeled preferences, 797 hours of generated audio, and a human-labeled subset of 4,027 preference labels.
- Built CMI-RewardBench to evaluate preference learning methods on compositional multimodal music instructions.
- Trained a parameter-efficient reward model that achieves near state-of-the-art correlation with human labels across multiple evaluation protocols and supports inference-time scaling for music generation models.
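
The last contribution can be illustrated with a minimal sketch. The summary above does not say which inference-time scaling scheme or agreement metric the paper uses, so the code assumes two common choices: best-of-N selection, where several candidates are generated for one instruction and the highest-scoring one is kept, and pairwise agreement with labeled preferences. The names `best_of_n`, `pairwise_agreement`, `reward_fn`, and `toy_reward` are hypothetical and are not taken from the paper or its code release.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Clip = TypeVar("Clip")  # stands in for one generated audio clip


def best_of_n(
    instruction: str,
    candidates: Sequence[Clip],
    reward_fn: Callable[[str, Clip], float],
) -> Tuple[Clip, float]:
    """Return the highest-scoring candidate and its score (assumed best-of-N scheme)."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    scores = [reward_fn(instruction, clip) for clip in candidates]
    # Index of the maximum score; ties resolve to the earliest candidate.
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best], scores[best]


def pairwise_agreement(
    pairs: Sequence[Tuple[str, Clip, Clip]],
    reward_fn: Callable[[str, Clip], float],
) -> float:
    """Fraction of (instruction, preferred, rejected) triples where the
    reward model scores the preferred clip strictly higher."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    hits = sum(
        reward_fn(instr, preferred) > reward_fn(instr, rejected)
        for instr, preferred, rejected in pairs
    )
    return hits / len(pairs)


# Hypothetical stand-in for a trained reward model: longer strings score higher.
def toy_reward(instruction: str, clip: str) -> float:
    return float(len(clip))


chosen, score = best_of_n("calm piano", ["aa", "bbbb", "c"], toy_reward)
agreement = pairwise_agreement(
    [("calm piano", "long", "s"), ("calm piano", "a", "bbb")], toy_reward
)
```

In practice `reward_fn` would wrap a trained reward model that scores an audio clip against its text, audio, or other conditioning inputs; the string-length scorer here only exercises the control flow.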
