Prompt-guided Zero-shot Image Segmentation
By: Tao Yu, Qingfeng Chen, Hao Zhao
Published: 2023-11-15
Abstract
Zero-shot image segmentation, the task of segmenting unseen object categories without requiring any labeled examples, is a challenging but highly desirable capability for many real-world computer vision applications. Recent advancements in large-scale vision-language models have opened new avenues for tackling this problem. This paper proposes a novel framework for prompt-guided zero-shot image segmentation. Our approach leverages the rich semantic knowledge embedded in pre-trained vision-language models by conditioning the segmentation process on textual prompts. We explore various strategies for generating effective prompts, including descriptive natural language phrases and task-specific keywords, to guide the model towards segmenting target objects. Through extensive experiments on diverse datasets, we demonstrate that our prompt-guided approach significantly outperforms existing zero-shot segmentation methods, achieving state-of-the-art performance across multiple benchmarks. Furthermore, we provide insights into the role of prompt design and the capabilities of large vision-language models in enabling robust and flexible zero-shot segmentation.
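The abstract does not spell out the mechanics, but the core idea it describes — scoring image regions against a textual prompt's embedding from a pre-trained vision-language model — can be sketched in a few lines. The following is a minimal illustrative example, not the paper's actual method: `prompt_guided_mask`, the feature shapes, and the toy embeddings are all hypothetical stand-ins for the outputs of a real image and text encoder.

```python
import numpy as np

def prompt_guided_mask(pixel_feats, text_emb, threshold=0.5):
    """Score each pixel embedding against the prompt embedding, then threshold.

    pixel_feats: (H, W, D) per-pixel embeddings, standing in for the output
                 of a vision-language model's image encoder (hypothetical).
    text_emb:    (D,) embedding of the textual prompt.
    Returns a boolean (H, W) segmentation mask.
    """
    # L2-normalize both sides so the dot product is cosine similarity.
    pf = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    te = text_emb / np.linalg.norm(text_emb)
    sim = pf @ te            # (H, W) similarity map
    return sim >= threshold  # pixels similar to the prompt are "object"

# Toy demo: a 4x4 "image" whose top-left 2x2 block matches the prompt.
D = 8
prompt = np.zeros(D); prompt[0] = 1.0          # stand-in for e.g. "a photo of a cat"
background = np.zeros(D); background[1] = 1.0  # unrelated region embedding
feats = np.tile(background, (4, 4, 1))
feats[:2, :2] = prompt
mask = prompt_guided_mask(feats, prompt)       # True only in the top-left block
```

In practice the per-pixel features and the prompt embedding would come from a shared vision-language embedding space (e.g. a CLIP-style model), which is what makes the zero-shot transfer to unseen categories possible; the thresholded similarity map here is only the simplest possible decision rule.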