Navigating privacy and utility with multiple imputation, satellite imaging and deep learning

Abstract

Data science for complex societal problems, such as combating poverty on a global scale, typically involves understanding and addressing multiple challenges. Examples include (1) integrating data of different types and quality; (2) reducing bias due to data defects; (3) trading data privacy for utility; and (4) assessing uncertainties in black-box algorithms. This article documents how we use the framework of multiple imputation to investigate and navigate such challenges in the context of studying poverty in Africa, where we integrate anonymized ground-level surveys with satellite images via deep learning. Advantages of the multiple imputation approach include its (a) statistical efficiency, by following a Bayesian approach to incorporate prior and auxiliary information; (b) implementation readiness, with black-box methods directly executed on the imputation replications; (c) conceptual simplicity, as a form of data augmentation; and (d) ability to assess uncertainty, via the joint replication of the imputation and the model fitting. However, the approach is computationally demanding because it requires repeated training over imputation replications, and it is sensitive to the imputation model.
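
To make the uncertainty assessment in (d) concrete, the sketch below pools results from several imputation replications using Rubin's combining rules, the standard way of merging multiply-imputed analyses. It is an illustration only: the abstract does not say which pooling rule is used, and the function name `combine_rubin` and all numbers are hypothetical. Each replication is assumed to yield a point estimate (for example, from a model trained on one imputed dataset) together with a within-replication variance.

```python
from statistics import mean, variance


def combine_rubin(estimates, within_variances):
    """Pool m per-imputation estimates with Rubin's combining rules.

    estimates:        point estimates, one per imputed dataset
    within_variances: the variance of each estimate within its own dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need m >= 2 estimates, each with a variance")

    q_bar = mean(estimates)          # pooled point estimate
    w_bar = mean(within_variances)   # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)

    # Total variance: within part plus the between part inflated by (1 + 1/m)
    total = w_bar + (1 + 1 / m) * b
    return q_bar, total


# Hypothetical results from m = 5 imputation replications
estimates = [0.42, 0.45, 0.40, 0.47, 0.44]
within_variances = [0.010, 0.012, 0.011, 0.009, 0.010]

q_bar, total_var = combine_rubin(estimates, within_variances)
print(f"pooled estimate = {q_bar:.3f}, total variance = {total_var:.4f}")
```

The between-imputation term is what captures the uncertainty introduced by the imputation itself. It can only be estimated by refitting the model on every replication, which is the source of the computational cost noted above.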