This talk is based on joint work with Ruobin Gong and Xiao-Li Meng:
Can swapping be differentially private? A refreshment stirred, not shaken
Date:
Abstract
Any direct answer to the title's query necessarily presupposes a precise specification of differential privacy (DP). Indeed, there are many different formulations of DP – ranging, both qualitatively and quantitatively, from the practically and theoretically vacuous to gold-standard privacy protection – so a straight answer to the question "is X differentially private?" is not particularly informative, and is likely to lead to confusion or even dispute, especially when the presupposed DP specification is not clearly spelt out and fully comprehended.
A true answer to the title's query must therefore be predicated upon an understanding of how formulations of DP differ, which is best explored, as is often the case, by starting from their unifying commonality. DP specifications are, in essence, Lipschitz conditions on the data-release mechanism. The core philosophy of DP is thus to manage relative privacy loss by limiting how much the noise-injected output statistics can vary when the confidential input data are (counterfactually) suitably perturbed. DP thereby conceives of privacy protection specifically as control over the Lipschitz constant – i.e. over this rate of variation-to-perturbation – and different DP specifications correspond to different choices of how to measure input perturbation and output variation, together with the choice of how tightly to control their ratio.

Following this line of thinking through the existing DP literature leads to five necessary building blocks for a DP specification. They are, in order of mathematical prerequisite: the protection domain (the data space), the scope of protection (the data multiverse), the protection units (the unit of data perturbation), the standard of protection (the measure of output variation), and the intensity of protection (the privacy-loss budget). In simple terms, these are the "what", "where", "who", "how", and "how much" questions of DP.
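To make the Lipschitz analogy concrete, one way to write a generic DP specification is the following sketch (the notation – a mechanism M, a metric d, a data multiverse \mathcal{D} – is our illustrative choice; the original ε-DP of Dwork et al. [2006b] is recovered when d counts the records in which two datasets differ):

\[
\bigl|\log \Pr[M(x) \in S] - \log \Pr[M(x') \in S]\bigr| \;\le\; \epsilon \, d(x, x') \qquad \text{for all } x, x' \in \mathcal{D} \text{ and all measurable } S.
\]

Read against the five building blocks: the data space fixes what x may be, the multiverse \mathcal{D} fixes where the condition applies, the metric d fixes who (which protection units) a perturbation moves, the left-hand side fixes how output variation is measured, and \epsilon fixes how much variation is tolerated per unit of perturbation.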
Under this framework, we consider DP's applicability in settings such as the US Decennial Census, where the disclosure of certain aggregates is mandated by the US Constitution. We design and analyze a data swapping method, the Permutation Swapping Algorithm (PSA), which is reminiscent of the statistical disclosure control (SDC) procedures employed in several US Decennial Censuses before 2020. For comparison, we also examine the principal SDC method of the 2020 Census, the TopDown Algorithm (TDA), which melds the DP specification of Bun and Steinke [2016a] (B&S) with Census policy and constitutional mandates.
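To fix ideas, here is a minimal sketch of the kind of within-class record swapping that the PSA formalizes. This is our illustration, not the PSA as specified in the paper: the dict record layout, the class_of stratifier, the single "sensitive" field, and the swap_rate parameter are all hypothetical stand-ins.

import random

def permutation_swap(records, class_of, swap_rate, rng=None):
    """Illustrative within-class swap: in each swapping class, records
    selected at rate `swap_rate` have their sensitive field permuted
    uniformly at random among themselves. Fields that define the class
    are left untouched, so class-level aggregates are preserved exactly.
    """
    rng = rng or random.Random()
    # Group record indices by swapping class.
    classes = {}
    for i, rec in enumerate(records):
        classes.setdefault(class_of(rec), []).append(i)
    out = [dict(rec) for rec in records]  # records assumed to be dicts
    for members in classes.values():
        # Flip a coin for each record in the class to decide if it swaps.
        chosen = [i for i in members if rng.random() < swap_rate]
        if len(chosen) < 2:
            continue  # nothing to permute
        # Apply a uniformly random permutation to the chosen records'
        # sensitive values; everything else about each record stays put.
        shuffled = list(chosen)
        rng.shuffle(shuffled)
        for dst, src in zip(chosen, shuffled):
            out[dst]["sensitive"] = records[src]["sensitive"]
    return out

# Example: swap the sensitive field within per-block classes at rate 0.1.
# data = [{"block": b, "sensitive": s} for b, s in ...]
# swapped = permutation_swap(data, class_of=lambda r: r["block"],
#                            swap_rate=0.1, rng=random.Random(2020))

By construction, any statistic computed only from the class-defining fields is invariant under the swap – exactly the kind of mandated aggregate that a noise-everything DP specification would forbid releasing.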
We analyze the DP properties of both data swapping and the TDA. Both B&S's specification and the original ε-DP specification of Dwork et al. [2006b] demand that no data summary be disclosed without noise. This is impossible for swapping methods, which inherently preserve, and hence disclose, some margins; it is equally impossible for the TDA, which likewise keeps some counts invariant. Therefore, for the same reasons that the TDA cannot satisfy B&S's specification, data swapping cannot satisfy the original ε-DP specification. On the other hand, we establish that the PSA is ε-DP subject to the invariants it necessarily induces (one way to write this restricted condition is sketched below), and we show how the privacy-loss budget ε is determined by the swapping rate and the maximal size of the swapping classes. We also prove a DP specification for the TDA by subjecting B&S's specification to the TDA's invariants. Drawing a parallel, we assess the privacy-loss budget of the PSA in the hypothetical situation where it had been adopted for the 2020 Census.

Our overarching ambition is twofold: first, to leverage the merits of DP, including its mathematical assurances and algorithmic transparency, without sidelining the advantages of classical SDC; and second, to unveil the nuances and potential pitfalls in employing DP as a theoretical yardstick for privacy methodologies. By spotlighting data swapping, we aspire to stimulate rigorous evaluations of other SDC techniques, emphasizing that the privacy-loss budget ε is merely one of the five building blocks in the mathematical foundations of DP.
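As a notational footnote (continuing the earlier sketch, with C denoting an invariant map of our choosing – e.g. the margins a swap preserves, or the counts the TDA holds fixed), "ε-DP subject to invariants" can be written as the same Lipschitz condition restricted to datasets sharing the mandated aggregates:

\[
\bigl|\log \Pr[M(x) \in S] - \log \Pr[M(x') \in S]\bigr| \;\le\; \epsilon \, d(x, x') \qquad \text{for all } x, x' \text{ with } C(x) = C(x'),
\]

that is, the data multiverse shrinks from all conceivable datasets to the fibers of C.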
