Self-Distillation Enables Continual Learning
By: Idan Shenfeld, Tianxiao Shen, Jonathan Gordon
Published: 2026-01-27
Subject: cs.AI
Abstract
This paper introduces Self-Distillation Fine-Tuning (SDFT), a method enabling large language models to continually acquire new skills and knowledge from demonstrations without catastrophic forgetting. SDFT leverages in-context learning by using the model itself as a teacher, outperforming traditional fine-tuning and allowing models to accumulate multiple skills over time.
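Below is a minimal sketch of one way self-distillation from in-context demonstrations could be set up, assuming a Hugging Face-style causal LM. The checkpoint name, prompt templates, KL loss, and training loop are illustrative assumptions for intuition, not the authors' SDFT implementation.

```python
# Illustrative sketch only: a frozen copy of the model, given the demonstration
# in context, acts as the teacher; the student (same model, no demo in context)
# is pushed toward the teacher's next-token distribution so the skill moves
# into the weights. Details are assumptions, not the paper's exact recipe.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
student = AutoModelForCausalLM.from_pretrained(model_name)          # weights being updated
teacher = AutoModelForCausalLM.from_pretrained(model_name).eval()   # frozen copy of the same model

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

def self_distill_step(prompt: str, demonstration: str) -> float:
    """One hypothetical update using the model itself as teacher via in-context learning."""
    # Teacher sees the demonstration via in-context learning.
    teacher_text = f"Example:\n{demonstration}\n\nTask:\n{prompt}"
    # Student sees only the task, so it must internalize the demonstrated skill.
    student_text = f"Task:\n{prompt}"

    # Let the in-context teacher produce a target completion for this prompt.
    with torch.no_grad():
        teacher_ids = tokenizer(teacher_text, return_tensors="pt").input_ids
        generated = teacher.generate(teacher_ids, max_new_tokens=128, do_sample=False)
        completion_ids = generated[:, teacher_ids.shape[1]:]

    # Score the same completion under both models, each with its own context.
    student_ids = tokenizer(student_text, return_tensors="pt").input_ids
    student_input = torch.cat([student_ids, completion_ids], dim=1)
    teacher_input = torch.cat([teacher_ids, completion_ids], dim=1)

    n = completion_ids.shape[1]
    student_logits = student(student_input).logits[:, -n - 1:-1, :]
    with torch.no_grad():
        teacher_logits = teacher(teacher_input).logits[:, -n - 1:-1, :]

    # Distillation loss (KL) on the completion tokens only.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.log_softmax(teacher_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the student is trained toward the same model's own (demonstration-conditioned) distribution rather than toward raw demonstration tokens, the update stays close to the model's existing behavior, which is the intuition behind the reduced catastrophic forgetting claimed in the abstract.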