Self-Distillation Enables Continual Learning

By: Idan Shenfeld, Tianxiao Shen, Jonathan Gordon

Published: 2026-01-27

Category: cs.AI

Abstract

This paper introduces Self-Distillation Fine-Tuning (SDFT), a method enabling large language models to continually acquire new skills and knowledge from demonstrations without catastrophic forgetting. SDFT leverages in-context learning by using the model itself as a teacher, outperforming traditional fine-tuning and allowing models to accumulate multiple skills over time.
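The abstract's core mechanism, distilling the model's own in-context behavior back into its weights, can be illustrated with a short sketch. The following is a minimal, hypothetical Python example using Hugging Face transformers; the model name ("gpt2"), the example demonstration and prompt, the hyperparameters, and the exact two-pass procedure are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of self-distillation fine-tuning: the model, conditioned
# on a demonstration in context, generates its own training target; the same
# model is then fine-tuned on that self-generated target without the
# demonstration in context. All names and values here are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

demonstration = "Q: What is 2+2?\nA: 4"  # hypothetical demonstration of a new skill
prompt = "Q: What is 3+5?\nA:"           # query the skill should transfer to

# 1) Teacher pass: generate a target with the demonstration in context,
#    so the target reflects the model's own in-context distribution.
teacher_input = tokenizer(demonstration + "\n" + prompt, return_tensors="pt")
with torch.no_grad():
    teacher_ids = model.generate(
        **teacher_input, max_new_tokens=16, pad_token_id=tokenizer.eos_token_id
    )
target_text = tokenizer.decode(
    teacher_ids[0, teacher_input["input_ids"].shape[1]:], skip_special_tokens=True
)

# 2) Student pass: fine-tune on (prompt, self-generated target) WITHOUT the
#    demonstration in context, distilling the in-context behavior into weights.
student = tokenizer(prompt + target_text, return_tensors="pt")
labels = student["input_ids"].clone()
prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
labels[:, :prompt_len] = -100  # supervise only the self-generated continuation

loss = model(**student, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In a continual-learning loop, these two steps would presumably be repeated over each new batch of demonstrations, keeping the training targets close to the model's current distribution, which is how the abstract frames the reduction in catastrophic forgetting.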
