Deep statistics for more rigorous and efficient data science

Nano 288
Teaching fellow, 2023 Australian Statistical Conference workshop
Taught: December 2023
Description

With the help of the original teaching team, this workshop presents a nano version of “Deep Statistics: AI and Earth Observations for Sustainable Development”, an advanced Ph.D. course at Harvard. The course description reads: “With the aim to enhance concomitantly the rigor and efficiency of data science for scientific inquiries, deep statistics emphasizes principled systems thinking throughout the entire data science ecosystem, from data conception to their post mortem examination for scientific reproducibility and replicability. This course introduces a trinity of deep statistics of, for and by multi-source, multi-phase, and multi-resolution statistical learning, and invites research participations on their implications and implementations in the context of AI and Earth Observations (EO) for sustainable development (e.g., global poverty and health). Theoretically, the course contemplates many trade-offs for ‘data science for science’: data quality vs. quantity, data privacy vs. utility, statistical vs. computational efficiencies, inferential robustness vs. relevance, etc. Practically, it scrutinizes issues such as conceptualizing and collecting complex socioeconomic data, handling messy survey and satellite data, assessing uncertainties with black-box learning, and contemplating causal implications from AI-EO data. High-level methodological overviews of topics such as survey design, differential privacy, multiple imputation, bootstraps, and deep learning, will be provided on an as-needed basis.”

Instructor of record: Prof. Xiao-Li Meng