Topics in privacy, data privacy and differential privacy
James Bailie.
PhD Thesis, Harvard University, 2025.
Abstract
In an era of unprecedented data availability and analytic capacity, the protection of individuals’ privacy in statistical data releases is becoming an increasingly difficult problem. This dissertation contributes to the theoretical and methodological foundations of statistical data privacy, largely focusing on differential privacy (DP). We begin with a multifaceted investigation into privacy from legal, economic, social, and philosophical standpoints, before turning to a formal system of DP specifications built around five core building blocks found throughout the literature: the domain, multiverse, input premetric, output premetric, and protection loss budget. This system is applied to statistical disclosure control (SDC) mechanisms used in the US Decennial Census, analyzing both the traditional method of data swapping and the contemporary TopDown Algorithm. Beyond these case studies, this dissertation explores the inferential limitations posed by DP and Pufferfish privacy in both frequentist and Bayesian settings, establishing general bounds under mild assumptions. It further addresses the challenges of applying DP to complex survey pipelines, incorporating issues such as sampling, weighting, and imputation. Finally, it contextualizes DP within broader frameworks of data privacy, namely the Five Safes and contextual integrity, advocating for a more integrated approach to privacy that respects statistical utility, transparency, and societal norms.
Suggested Citation
James Bailie (2025). “Topics in Privacy, Data Privacy and Differential Privacy”. PhD thesis. Harvard University, Cambridge, MA, p. 465. url: proquest.com/docview/3217403500/abstract/5B30DA0A3D85414APQ/1
