Self-Distillation Enables Continual Learning
By: Idan Shenfeld, Tianxiao Shen, Jonathan Gordon
Published: 2026-01-27
Subject: cs.AI
Abstract
This paper introduces Self-Distillation Fine-Tuning (SDFT), a method enabling large language models to continually acquire new skills and knowledge from demonstrations without catastrophic forgetting. SDFT leverages in-context learning by using the model itself as a teacher, outperforming traditional fine-tuning and allowing models to accumulate multiple skills over time.
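Below is a minimal sketch of one way self-distillation from in-context demonstrations could be set up, assuming a Hugging Face-style causal LM. The checkpoint name, prompt templates, KL loss, and training loop are illustrative assumptions for intuition, not the authors' SDFT implementation.

```python
# Illustrative sketch only: a frozen copy of the model, given the demonstration
# in context, acts as the teacher; the student (same model, no demo in context)
# is pushed toward the teacher's next-token distribution so the skill moves
# into the weights. Details are assumptions, not the paper's exact recipe.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
student = AutoModelForCausalLM.from_pretrained(model_name)          # weights being updated
teacher = AutoModelForCausalLM.from_pretrained(model_name).eval()   # frozen copy of the same model

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

def self_distill_step(prompt: str, demonstration: str) -> float:
    """One hypothetical update using the model itself as teacher via in-context learning."""
    # Teacher sees the demonstration via in-context learning.
    teacher_text = f"Example:\n{demonstration}\n\nTask:\n{prompt}"
    # Student sees only the task, so it must internalize the demonstrated skill.
    student_text = f"Task:\n{prompt}"

    # Let the in-context teacher produce a target completion for this prompt.
    with torch.no_grad():
        teacher_ids = tokenizer(teacher_text, return_tensors="pt").input_ids
        generated = teacher.generate(teacher_ids, max_new_tokens=128, do_sample=False)
        completion_ids = generated[:, teacher_ids.shape[1]:]

    # Score the same completion under both models, each with its own context.
    student_ids = tokenizer(student_text, return_tensors="pt").input_ids
    student_input = torch.cat([student_ids, completion_ids], dim=1)
    teacher_input = torch.cat([teacher_ids, completion_ids], dim=1)

    n = completion_ids.shape[1]
    student_logits = student(student_input).logits[:, -n - 1:-1, :]
    with torch.no_grad():
        teacher_logits = teacher(teacher_input).logits[:, -n - 1:-1, :]

    # Distillation loss (KL) on the completion tokens only.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.log_softmax(teacher_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the student is trained toward the same model's own (demonstration-conditioned) distribution rather than toward raw demonstration tokens, the update stays close to the model's existing behavior, which is the intuition behind the reduced catastrophic forgetting claimed in the abstract.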