Developing synthetic microdata through machine learning for firm-level business surveys

By: Jorge Cisneros Paz, Timothy Wojan, Matthew Williams, Jennifer Ozawa, Robert Chew, Kimberly Janda, Timothy Navarro, Michael Floyd, Christine Task, Damon Streat

Published: 2025-12-05

#cs.LG

Abstract

Public-use microdata samples often carry a risk of re-identification, especially for firm-level data, where anonymity is difficult to maintain. This paper describes a machine learning model for constructing synthetic public-use microdata from business surveys, preserving statistical properties while ensuring privacy and mitigating re-identification risk.
