Reward-free Alignment for Conflicting Objectives

By: Peter Chen, Xiaopeng Li, Xi Chen, Tianyi Lin

Published: 2026-02-03

#cs.AI

Abstract

Direct alignment methods are increasingly used to align large language models (LLMs) with human preferences. However, many real-world alignment problems involve multiple conflicting objectives, where naive aggregation of preferences can lead to unstable training and poor trade-offs. We propose a Reward-free Alignment framework for Conflicting Objectives (RACO) that directly leverages pairwise preference data and resolves gradient conflicts via a novel clipped variant of conflict-averse gradient descent. We provide convergence guarantees to Pareto-critical points that respect user-specified objective weights, and further show that clipping can strictly improve the convergence rate in the two-objective setting.
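
The abstract does not spell out RACO's update rule, but the general shape of a conflict-averse gradient step can be illustrated. The sketch below follows the CAGrad dual of Liu et al. (conflict-averse gradient descent, the family of methods the abstract names) and adds a simple norm-clipping step. The use of user-specified objective weights in place of the uniform average, the parameters `c` and `clip_norm`, and the choice of clipping the final update norm are all assumptions made for illustration, not the paper's RACO algorithm.

```python
# Illustrative sketch (not the paper's code): a CAGrad-style conflict-averse
# update with an added norm-clipping step. The exact clipping rule used by
# RACO is not given in the abstract; this version simply caps the update norm.

import numpy as np
from scipy.optimize import minimize


def cagrad_clipped_direction(grads, weights=None, c=0.5, clip_norm=1.0):
    """Combine per-objective gradients into a single update direction.

    grads:     (K, d) array, one gradient per objective.
    weights:   optional objective weights on the simplex (default: uniform).
    c:         trade-off parameter controlling deviation from the weighted
               average gradient (as in CAGrad).
    clip_norm: maximum norm of the returned direction (assumed clipping step).
    """
    grads = np.asarray(grads, dtype=float)
    K = grads.shape[0]
    if weights is None:
        weights = np.full(K, 1.0 / K)

    g0 = weights @ grads                       # weighted "average" gradient
    phi = (c ** 2) * float(g0 @ g0)            # radius of trust region around g0

    def dual(w):
        # CAGrad dual objective, minimized over the probability simplex.
        gw = w @ grads
        return float(gw @ g0) + np.sqrt(phi * float(gw @ gw) + 1e-12)

    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * K
    res = minimize(dual, weights, bounds=bounds, constraints=cons, method="SLSQP")

    gw = res.x @ grads
    gw_norm = np.linalg.norm(gw) + 1e-12
    d = g0 + (np.sqrt(phi) / gw_norm) * gw     # conflict-averse direction

    # Assumed clipping: cap the update norm to stabilize training.
    d_norm = np.linalg.norm(d)
    if d_norm > clip_norm:
        d = d * (clip_norm / d_norm)
    return d


# Toy usage: two conflicting objectives in 2D with user-specified weights.
g = np.array([[1.0, 0.2], [-0.8, 0.4]])
print(cagrad_clipped_direction(g, weights=np.array([0.7, 0.3])))
```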
