PhaseCoder: Microphone Geometry-Agnostic Spatial Audio Understanding for Multimodal LLMs
By: Artem Dementyev, Wazeer Zulfikar, Sinan Hersek, Pascal Getreuer, Anurag Kumar, Vivek Kumar
Published: 2026-01-26
Category: cs.AI
Abstract
This paper presents PhaseCoder, a transformer-only spatial audio encoder that operates independently of microphone geometry. It processes raw multichannel audio and microphone coordinates to perform localization and generate robust spatial embeddings. This enables multimodal Large Language Models (LLMs) to perform complex spatial reasoning and targeted transcription from various microphone arrays.
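To illustrate why microphone coordinates have to accompany the raw multichannel audio, the sketch below computes the inter-channel phase differences that a single far-field source produces on two arrays with different spacing. It is not PhaseCoder's method or architecture. It is a minimal textbook plane-wave model, and the function names, the 343 m/s speed of sound, and the example geometries are all assumptions made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def far_field_delays(mic_xyz, azimuth_rad, elevation_rad=0.0):
    """Arrival-time offsets (seconds) of a plane wave, relative to microphone 0.

    mic_xyz: (M, 3) microphone coordinates in metres (hypothetical layout).
    """
    mic_xyz = np.asarray(mic_xyz, dtype=float)
    # Unit vector pointing from the array towards the source.
    u = np.array([
        np.cos(elevation_rad) * np.cos(azimuth_rad),
        np.cos(elevation_rad) * np.sin(azimuth_rad),
        np.sin(elevation_rad),
    ])
    # A microphone further along u is closer to the source and hears the wave earlier.
    delays = -(mic_xyz @ u) / SPEED_OF_SOUND
    return delays - delays[0]


def phase_differences(mic_xyz, azimuth_rad, freq_hz):
    """Inter-channel phase at one frequency, wrapped to (-pi, pi]."""
    tau = far_field_delays(mic_xyz, azimuth_rad)
    phase = -2.0 * np.pi * freq_hz * tau
    return np.angle(np.exp(1j * phase))


# Same source direction, two different (hypothetical) two-microphone arrays.
wide_array = [[0.0, 0.0, 0.0], [0.10, 0.0, 0.0]]
narrow_array = [[0.0, 0.0, 0.0], [0.05, 0.0, 0.0]]

phase_wide = phase_differences(wide_array, azimuth_rad=0.0, freq_hz=1000.0)
phase_narrow = phase_differences(narrow_array, azimuth_rad=0.0, freq_hz=1000.0)
```

Under this model the same source direction yields different phase patterns on the two arrays, so phase alone is ambiguous unless the geometry is known. That is the reason a geometry-agnostic encoder would need the coordinates as an input alongside the audio.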