LLM-ForcedAligner: A Non-Autoregressive and Accurate LLM-Based Forced Aligner for Multilingual and Long-Form Speech

By: Bingshen Mu, Xian Shi, Xiong Wang, Hexin Liu, Jin Xu, Lei Xie

Published: 2026-01-26

#cs.AI

Abstract

Traditional forced alignment (FA) methods often suffer from language specificity and cumulative temporal shifts. This paper introduces LLM-ForcedAligner, a novel approach that reformulates FA as a slot-filling task using large language models (LLMs) for multilingual, cross-lingual, and long-form speech. By treating timestamps as discrete indices and inserting special timestamp tokens as slots into the text sequence, the model directly predicts a time index at each slot. This design enables non-autoregressive inference, which avoids hallucinations and substantially improves speed, yielding a large reduction in accumulated averaging shift compared to previous FA methods.
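The slot-filling formulation sketched in the abstract can be illustrated with a toy example: transcript tokens are interleaved with special timestamp tokens, and all slots are filled with discrete time indices in a single non-autoregressive pass. The names below (`TS_SLOT`, `predict_time_indices`) and the evenly-spaced stand-in predictor are hypothetical illustrations, not the paper's actual implementation.

```python
# Minimal sketch of the slot-filling forced-alignment idea. The names and
# the dummy predictor here are assumptions for illustration only; the paper
# uses an LLM to fill the slots.

TS_SLOT = "<TS>"  # special timestamp token acting as a slot


def build_slot_sequence(words):
    """Interleave transcript tokens with timestamp slots:
    <TS> w1 <TS> w2 <TS> ... so each slot can receive a time index."""
    seq = [TS_SLOT]
    for w in words:
        seq.extend([w, TS_SLOT])
    return seq


def predict_time_indices(seq, num_frames):
    """Stand-in for the LLM: fills every slot with a discrete time index
    in one non-autoregressive pass (here, evenly spaced indices)."""
    slots = [i for i, tok in enumerate(seq) if tok == TS_SLOT]
    step = num_frames / max(len(slots) - 1, 1)
    return {pos: round(k * step) for k, pos in enumerate(slots)}


words = ["hello", "world"]
seq = build_slot_sequence(words)
indices = predict_time_indices(seq, num_frames=100)
# Pair each word with (start, end) frame indices from its adjacent slots.
alignment = [
    (w, indices[2 * i], indices[2 * i + 2]) for i, w in enumerate(words)
]
print(alignment)  # → [('hello', 0, 50), ('world', 50, 100)]
```

Because every slot is predicted in parallel rather than left to right, an error at one slot cannot accumulate into later predictions, which is the mechanism behind the reduced temporal shift claimed above.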

