Self-Improving Pretraining: using post-trained models to pretrain better models
By: Ellen Xiaoqing Tan, Shehzaad Dhuliawala, Jing Xu
Published: 2026-01-29
arXiv: cs.AI
Abstract
The "Self-Improving Pretraining" framework integrates alignment objectives (safety, factuality, quality) directly into LLM pretraining by using a powerful post-trained model as a dynamic rewriter and judge of the training data. This method yields significant gains in generation coherence and factuality, improving the reliability and trustworthiness of large language models for real-world use.
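The rewriter-and-judge idea from the abstract can be sketched as a data-curation loop: a post-trained model rewrites each raw pretraining document, and a judge model scores the rewrite, keeping only documents that pass a quality threshold. The sketch below uses toy string functions as stand-ins for the rewriter and judge models; all function names and the threshold are hypothetical illustrations, not the paper's actual implementation.

```python
# Hypothetical sketch of a rewrite-and-judge pretraining curation loop.
# `rewrite` and `judge` are toy stand-ins for post-trained model calls.

def rewrite(doc: str) -> str:
    """Stand-in for a post-trained model rewriting a raw document."""
    # Toy "improvement": normalize whitespace.
    return " ".join(doc.split())

def judge(doc: str) -> float:
    """Stand-in for a post-trained judge scoring quality in [0, 1]."""
    # Toy heuristic: longer documents score higher, capped at 1.0.
    if not doc:
        return 0.0
    return min(len(doc) / 100.0, 1.0)

def curate(corpus: list[str], threshold: float = 0.5) -> list[str]:
    """Rewrite each raw document; keep it only if the judge approves."""
    kept = []
    for raw in corpus:
        improved = rewrite(raw)
        if judge(improved) >= threshold:
            kept.append(improved)
    return kept
```

In a real pipeline, `rewrite` and `judge` would each be calls to the post-trained model, and the curated output would feed the next pretraining run.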