Open World Knowledge Aided Single-Cell Foundation Model with Robust Cross-Modal Cell-Language Pre-training

By: Haoran Wang, Xuanyi Zhang, Shuangsang Fang, Longke Ran, Ziqing Deng, Yong Zhang, Yuxiang Li, Shaoshuai Li

Published: 2026-01-09

#cs.AI

Abstract

Recent advances in single-cell multi-omics provide deep insights into cellular heterogeneity. This paper proposes OKR-CELL, an Open-world Language Knowledge-Aided Robust Single-Cell Foundation Model. It leverages Large Language Models (LLMs) with retrieval-augmented generation to enrich cell textual descriptions with open-world knowledge, and it devises a Cross-modal Robust Alignment objective to strengthen the model's robustness to noisy data. After pretraining on 32M cell-text pairs, OKR-CELL achieves state-of-the-art results across 6 evaluation tasks: cell clustering, cell-type annotation, batch-effect correction, few-shot annotation, zero-shot cell-type annotation, and bidirectional cell-text retrieval.
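The abstract does not specify the form of the Cross-modal Robust Alignment objective. Below is a minimal sketch, assuming a CLIP-style symmetric InfoNCE loss between cell-expression embeddings and text embeddings, with a simple quantile-based down-weighting of high-loss pairs as a stand-in for noise robustness. The function name, the trimming heuristic, and all hyperparameters are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of a cross-modal cell-text alignment loss;
# not the paper's actual objective or code.
import torch
import torch.nn.functional as F


def robust_alignment_loss(cell_emb, text_emb, temperature=0.07, trim_quantile=0.9):
    """Symmetric InfoNCE over a batch of paired cell/text embeddings,
    with the highest-loss pairs down-weighted as a crude guard against
    noisy (mismatched) cell-text pairs. All design choices are assumptions."""
    # L2-normalize both modalities so dot products become cosine similarities.
    cell_emb = F.normalize(cell_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix: row i compares cell i against every text in the batch.
    logits = cell_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Per-pair contrastive loss in both retrieval directions (cell->text, text->cell).
    loss_c2t = F.cross_entropy(logits, targets, reduction="none")
    loss_t2c = F.cross_entropy(logits.t(), targets, reduction="none")
    per_pair = 0.5 * (loss_c2t + loss_t2c)

    # Treat the highest-loss pairs as likely noisy matches and give them zero weight.
    with torch.no_grad():
        cutoff = torch.quantile(per_pair, trim_quantile)
        weights = (per_pair <= cutoff).float()
    return (weights * per_pair).sum() / weights.sum().clamp(min=1.0)


if __name__ == "__main__":
    # Toy batch: 8 cell embeddings paired with 8 text embeddings, dimension 128.
    cells = torch.randn(8, 128)
    texts = torch.randn(8, 128)
    print(robust_alignment_loss(cells, texts).item())
```

In this sketch the robustness comes only from trimming high-loss pairs; the paper's actual mechanism for handling noisy cell-text pairs may differ.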
