Developing synthetic microdata through machine learning for firm-level business surveys
By: Jorge Cisneros Paz, Timothy Wojan, Matthew Williams, Jennifer Ozawa, Robert Chew, Kimberly Janda, Timothy Navarro, Michael Floyd, Christine Task, Damon Streat
Published: 2025-12-05
Category: cs.LG
Abstract
Public-use microdata samples often carry a risk of re-identification, especially firm-level data, where anonymity is difficult to maintain. This paper describes a machine learning model that constructs synthetic public-use microdata from business surveys, preserving the statistical properties of the source data while protecting privacy and mitigating re-identification risk.