Recipient of an Emerging Talent Award and an Artificial Intelligence Journal Fellowship.
3rd prize in the 2020 International Association for Official Statistics (IAOS) Young Statisticians Competition.
Abstract: The Australian Bureau of Statistics (ABS), like other national statistical offices, is considering the opportunities offered by differential privacy (DP). This research considers the ABS TableBuilder perturbation methodology in a DP framework. DP and the ABS perturbation methodology apply the same idea – infusing noise into the underlying microdata – to protect aggregate statistical outputs. This research describes some differences between these approaches. Our findings show that noise infusion protects against disclosure risks in the aggregate Census Tables. We highlight areas of future ABS research on this topic.
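As a rough illustration of what "noise infusion" means in the DP setting, the sketch below adds Laplace noise to a single count. This is the textbook Laplace mechanism, not the ABS TableBuilder methodology; the function names, the sensitivity of 1 and the choice of epsilon are all assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) variate.

    The difference of two independent exponential variates with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count, epsilon, rng):
    """Return a count perturbed by the textbook Laplace mechanism.

    Assumes adding or removing one person changes the count by at
    most 1 (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2020)  # seeded only so the demo is repeatable
    for epsilon in (0.1, 1.0, 10.0):
        print(epsilon, noisy_count(100, epsilon, rng))
```

Smaller values of epsilon give a larger noise scale, so the published count is less accurate but reveals less about any one record. A production system would need more care than this, for example around floating-point behaviour and rounding of published counts.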
Honours thesis supervised by Dr. Vigleik Angeltveit at the Australian National University.
Vacation Research Scholarship Report.
An expository report on using generalised additive models for spatio-temporal data.
Presented to the ABS Methodological Advisory Committee in March 2019.
Abstract: The proofs of SAI in the literature are either flawed, too condensed or do not give the necessary background. The motivation for this report is thus to provide a concise, self-contained proof of SAI. The report also includes all the necessary background to understand SAI and its proof.
Abstract: The p% rule classifies an aggregate statistic as a disclosure risk if one contributor can use the statistic to determine another contributor's value to within p%. This is often possible in economic data when there is a monopoly or a duopoly. Therefore, the p% rule is an important statistical disclosure control and is frequently used by national statistical organisations. However, the p% rule is only a method for assessing disclosure risk: while it can say whether a statistic is risky or not, it does not provide a mechanism to decrease that risk. To address this limitation, we encode the p% rule into a formal privacy definition using the Pufferfish framework and develop a perturbation mechanism which is provably private under this framework. This mechanism provides official statisticians with a method for perturbing data which guarantees that a Bayesian formulation of the p% rule is satisfied. We motivate this work with an example application to the Australian Bureau of Statistics (ABS).
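The classical (non-Bayesian) p% rule described in the first sentence can be sketched as a simple check on a cell total. The version below uses the common textbook formulation: the second-largest contributor subtracts its own value from the total to estimate the largest contributor's value, and the cell is flagged if that estimate is within p% of the truth. The function name and the assumption of non-negative contributions are choices made for this example, and the sketch does not implement the Pufferfish mechanism developed in the paper.

```python
def is_p_percent_risky(contributions, p):
    """Flag a cell total as a disclosure risk under the p% rule.

    contributions: non-negative values summed to form the cell total.
    p: the percentage threshold, e.g. 10 for a 10% rule.

    The second-largest contributor can estimate the largest value as
    (total - own value). The error of that estimate is the sum of all
    the other contributions, so the cell is risky when that remainder
    is less than p% of the largest value.
    """
    values = sorted(contributions, reverse=True)
    if not values:
        return False  # an empty cell has nothing to disclose
    largest = values[0]
    second = values[1] if len(values) > 1 else 0
    remainder = sum(values) - largest - second
    return remainder < (p / 100) * largest


if __name__ == "__main__":
    print(is_p_percent_risky([100, 90], 10))          # duopoly
    print(is_p_percent_risky([100, 90, 50, 40], 10))  # many contributors
```

In the duopoly case the remainder is zero, so each firm can recover the other's value exactly from the total, which is why such cells are flagged.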
Abstract: Differential privacy (DP) has emerged in the last decade from the computer science literature as a way of measuring the privacy risk associated with statistical outputs. In this talk, we will outline some of the promises of DP and motivate why MD is investigating this area.
Abstract: The Australian Census of Population and Housing (the Census) aims to count every person in Australia on a particular night – called the Census night. Houses which do not complete a Census form and do not respond to the Australian Bureau of Statistics’ (ABS) follow-up campaign pose a complication to achieving this aim: are these dwellings unoccupied, or are they occupied and the residents unresponsive? To achieve its aim, the Census should count these unresponsive residents, but how can the ABS accurately do this? To answer these questions, the ABS has developed a model which uses administrative data – collected by various government and non-government organisations – to predict the occupancy status of a dwelling. There are various challenges surrounding this new method, including the lack of ground truth and the presence of strongly unbalanced classes. However, the method will improve the accuracy of ABS Census population counts and has been adopted as part of the 2021 Australian Census imputation process.
Abstract: In recent years, three new statistical attacks affecting population count tables have caught our attention: 1) the US Census Bureau reconstruction attack on their 2010 census data; 2) a differencing attack exploiting non-additive perturbation mechanisms; and 3) a split averaging attack. A key feature uniting all of these attacks is the requirement of significant computing resources, in addition to the underlying mathematical theory. This talk will describe these attack methods and outline some of the work we and other NSOs have done in response to these attacks. I will also quickly run through the perturbation mechanism used in TableBuilder, as a prerequisite for understanding whether we are vulnerable to these attacks.
Back to James Bailie's homepage.