PhaseCoder: Microphone Geometry-Agnostic Spatial Audio Understanding for Multimodal LLMs

By: Artem Dementyev, Wazeer Zulfikar, Sinan Hersek, Pascal Getreuer, Anurag Kumar, Vivek Kumar

Published: 2026-01-26

#cs.AI

Abstract

This paper presents PhaseCoder, a transformer-only spatial audio encoder that operates independently of microphone geometry. It processes raw multichannel audio and microphone coordinates to perform localization and generate robust spatial embeddings. This enables multimodal Large Language Models (LLMs) to perform complex spatial reasoning and targeted transcription from various microphone arrays.
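To illustrate why microphone coordinates are a useful input alongside the raw audio, the sketch below computes the inter-channel delays and phase differences that a far-field source would produce for a given array layout. This is not PhaseCoder's method, which the abstract does not detail; it is the standard plane-wave geometry relation, shown under stated assumptions: a far-field source, a speed of sound of 343 m/s, and hypothetical helper names (`far_field_delays`, `phase_differences`) chosen only for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed approximate value for air at room temperature


def far_field_delays(mic_xyz, azimuth_deg, elevation_deg=0.0):
    """Arrival delay (seconds) at each microphone, relative to the array origin,
    for a plane wave coming from the given direction (far-field assumption)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Unit vector pointing from the array origin toward the source.
    u = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
    # A microphone displaced toward the source hears the wavefront earlier,
    # hence the negative sign.
    return [-(x * u[0] + y * u[1] + z * u[2]) / SPEED_OF_SOUND for x, y, z in mic_xyz]


def phase_differences(delays, freq_hz, ref=0):
    """Phase of each channel relative to channel `ref` at one frequency,
    wrapped to [-pi, pi]."""
    phases = []
    for tau in delays:
        # A delay of tau seconds multiplies the spectrum by exp(-j*2*pi*f*tau).
        raw = -2.0 * math.pi * freq_hz * (tau - delays[ref])
        phases.append(math.atan2(math.sin(raw), math.cos(raw)))
    return phases


# Two hypothetical arrays with different layouts (coordinates in meters).
linear_array = [(-0.05, 0.0, 0.0), (0.05, 0.0, 0.0)]
triangle_array = [(0.0, 0.04, 0.0), (-0.035, -0.02, 0.0), (0.035, -0.02, 0.0)]

# The same source direction yields a different phase pattern on each array.
linear_phases = phase_differences(far_field_delays(linear_array, 0.0), 1000.0)
triangle_phases = phase_differences(far_field_delays(triangle_array, 0.0), 1000.0)
```

Because the same source direction maps to different phase patterns on different arrays, an encoder that is given only the audio cannot, in general, separate direction from geometry. Supplying the coordinates explicitly is one way to resolve that ambiguity.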
