MINDS 2021 Winter Symposium - Raaz Dwivedi
Title– Subgroup discovery in randomized experiments and non-asymptotic results for MCMC sampling
Abstract– A data scientist often needs empirical or theoretical evidence to choose among the many methods and algorithms available for a given learning task. In supervised machine learning, accuracy on a hold-out dataset is commonly used to make this choice. This talk presents research that can inform such choices in two classical contexts where a direct measure of hold-out accuracy is not available: heterogeneity estimation in causal inference and Markov chain Monte Carlo (MCMC) sampling, which is commonly used in Bayesian inference.
In the first part of the talk, I will introduce StaDISC, a data-driven methodology designed for reliable estimation of heterogeneous treatment effects in randomized experiments. StaDISC provides calibration-based predictive checks to select among various conditional average treatment effect (CATE) models, and it discovers interpretable and stable subgroups with heterogeneous treatment effects. I will illustrate StaDISC in the context of precision medicine with a re-analysis of two randomized controlled trials.
The second part will establish non-asymptotic mixing time guarantees, that is, bounds on the number of iterations needed to reach a desired accuracy, for popular MCMC algorithms. These user-friendly guarantees can help tune and select a sampling method given a computational budget. I will provide results for the Langevin algorithm and state-of-the-art Hamiltonian Monte Carlo, and illustrate the provable finite-time advantages of using the Metropolis-Hastings correction and gradient information for sampling.
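As a rough illustration of what the Metropolis-Hastings correction changes, the following minimal sketch runs a Langevin chain with and without the accept/reject step. It is not taken from the talk: the standard Gaussian target, the step size, and the names `log_density`, `grad_log_density`, and `langevin_chain` are all assumptions chosen to keep the example small.

```python
import numpy as np


def log_density(x):
    # Assumed toy target: standard Gaussian, log pi(x) = -||x||^2 / 2 + const.
    return -0.5 * np.dot(x, x)


def grad_log_density(x):
    # Gradient of the assumed log density above.
    return -x


def langevin_chain(x0, step, n_iters, rng, metropolis=False):
    """Run a Langevin chain; with metropolis=True, add the accept/reject step.

    Returns the samples (n_iters x dim) and the fraction of accepted moves.
    """
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    accepted = 0
    for t in range(n_iters):
        noise = rng.standard_normal(x.size)
        # Langevin proposal: gradient step plus Gaussian noise.
        proposal = x + step * grad_log_density(x) + np.sqrt(2.0 * step) * noise
        if metropolis:
            # Residuals of the forward move x -> proposal and the reverse move.
            fwd = proposal - x - step * grad_log_density(x)
            bwd = x - proposal - step * grad_log_density(proposal)
            # Log acceptance ratio: target ratio times proposal-density ratio.
            log_ratio = (
                log_density(proposal)
                - log_density(x)
                - (np.dot(bwd, bwd) - np.dot(fwd, fwd)) / (4.0 * step)
            )
            if np.log(rng.uniform()) < log_ratio:
                x = proposal
                accepted += 1
        else:
            # Unadjusted chain: every proposal is taken.
            x = proposal
            accepted += 1
        samples[t] = x
    return samples, accepted / n_iters


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for use_mh in (False, True):
        chain, rate = langevin_chain(np.zeros(2), 0.5, 20000, rng, metropolis=use_mh)
        print("metropolis =", use_mh, "variance =", chain.var(), "accept =", rate)
```

For this particular toy target the unadjusted chain is a linear recursion whose stationary variance is 1 / (1 - step / 2) rather than 1, so it carries a step-size-dependent bias; the corrected chain targets the Gaussian exactly at the cost of occasionally rejecting proposals. The sketch says nothing about mixing time bounds themselves.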
The talk covers two research thrusts: the StaDISC work, in collaboration with Yan Shuo Tan, Briton Park, Mian Wei, Kevin Horgan, David Madigan, and Bin Yu, and the MCMC work, in collaboration with Yuansi Chen, Martin Wainwright, and Bin Yu.