SymphonyGen

3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton

Xuzheng He¹, Nan Nan²,†, Zhilin Wang³, Ziyue Kang², Zhuoru Mo⁴, Ao Li², Yu Pan¹, Xiaobing Li¹, Feng Yu¹, Xiaohong Guan¹,²,†

¹Central Conservatory of Music | ²Xi'an Jiaotong University | ³University of Science and Technology of China | ⁴Shenzhen University

† Corresponding authors

Abstract

Generating symphonic music requires simultaneously managing high-level structural form and dense, multi-track orchestration. Existing symbolic models often struggle with a "complexity-control imbalance", in which scaling bottlenecks limit long-term granular steerability. We present SymphonyGen, a 3D hierarchical framework for contemporary cinematic orchestration. SymphonyGen employs a cascading decoder architecture that decomposes the Bar, Track, and Event axes, improving computational efficiency and scalability over conventional 1D or 2D models. We introduce "short-score" conditioning via a beat-quantized multi-voice harmony skeleton, enabling outline control while preserving textural diversity. The model is further refined using Group Relative Policy Optimization (GRPO) with a cross-modal audio-perceptual reward, aligning symbolic output with modern acoustic expectations. Additionally, we implement a dissonance-averse sampling algorithm to suppress unintended tonal clashes during inference. Objective evaluations show that both reinforcement learning and dissonance-averse sampling effectively enhance harmonic cleanliness while maintaining melodic expression. Subjective evaluations demonstrate that SymphonyGen outperforms baselines in musicality and preference for orchestral music generation.
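To illustrate the reinforcement learning step mentioned in the abstract, the following is a minimal sketch of the group-relative advantage that gives GRPO its name: several candidates are sampled for the same prompt, each is scored by a reward, and the rewards are normalized within that group. The function name, the example reward values, and the use of the population standard deviation are assumptions made for illustration; this is not SymphonyGen's training code, and the actual reward here would come from the audio-perceptual scorer described above.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    that score above the group average get a positive advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four orchestrations of one harmony skeleton.
rewards = [0.62, 0.48, 0.71, 0.55]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Because the baseline is the group mean rather than a learned value function, no separate critic network is needed; the policy gradient simply upweights candidates that beat their siblings.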

Best Examples

📢 Disclaimer: You are listening to cherry-picked examples of our model.


Average Examples

Orchestral Composition Task

SymphonyGen first generates a harmony skeleton and then produces the full orchestration based on that skeleton. The model is reinforced with the CLaMP 3 score only.

Starting with a Major Chord


Starting with a Minor Chord

Orchestral Arrangement Task

SymphonyGen takes harmony skeletons analyzed from excerpts in the SymphonyNet Dataset (validation split) and re-orchestrates each skeleton. The model is reinforced with the CLaMP 3 score and track density.