Daniela Witten
“Selective inference for trees”
Abstract: As datasets grow in size, the focus of data collection has increasingly shifted away from testing pre-specified hypotheses, and towards hypothesis generation. Researchers are often interested in performing an exploratory data analysis to generate hypotheses, and then testing those hypotheses on the same data. Unfortunately, this type of ‘double dipping’ can lead to highly inflated Type I error rates. In this talk, I will consider double dipping on trees.
First, I will focus on trees generated via hierarchical clustering, and will consider testing the null hypothesis of equality of cluster means. I will propose a test for a difference in means between estimated clusters that accounts for the cluster estimation process, using a selective inference framework. Second, I’ll consider trees generated using the CART procedure, and will again use selective inference to conduct inference on the means of the terminal nodes. Applications include single-cell RNA-sequencing data and the Box Lunch Study.
This work is the result of collaborations with Lucy Gao, Anna Neufeld, and Jacob Bien.
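To make the double-dipping problem from the abstract concrete, here is a minimal simulation sketch. It is not the selective test proposed in the talk; it only illustrates the naive procedure that the talk argues against. The assumptions are ours: one-dimensional Gaussian data with known variance and no true clusters, single-linkage hierarchical clustering cut at two clusters, and a plain two-sample z-test. All function names are hypothetical.

```python
import math
import random


def two_sided_z_pvalue(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def single_linkage_two_clusters(xs):
    """Split 1-D data into two clusters by cutting at the largest gap.

    In one dimension, cutting a single-linkage dendrogram at two clusters
    is equivalent to splitting the sorted points at their largest gap.
    """
    s = sorted(xs)
    gaps = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return s[:cut], s[cut:]


def naive_pvalue(xs, sigma=1.0):
    """Naive z-test for equal cluster means.

    The test ignores the fact that the clusters were estimated from the
    same data (assumption: known standard deviation sigma).
    """
    a, b = single_linkage_two_clusters(xs)
    diff = sum(a) / len(a) - sum(b) / len(b)
    se = sigma * math.sqrt(1.0 / len(a) + 1.0 / len(b))
    return two_sided_z_pvalue(diff / se)


def naive_rejection_rate(n_sims=2000, n=30, alpha=0.05, seed=0):
    """Fraction of null datasets on which the naive test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Global null: every observation comes from the same N(0, 1).
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if naive_pvalue(xs) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print(f"naive rejection rate at alpha=0.05: {naive_rejection_rate():.3f}")
```

Because the clusters are chosen to be far apart, the naive rejection rate under the null should come out well above the nominal 0.05. A selective inference approach instead conditions on the clustering that was estimated.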
Bio: Daniela Witten is a professor of Statistics and Biostatistics at the University of Washington and the Dorothy Gilford Endowed Chair in Mathematical Statistics. She develops statistical machine learning methods for high-dimensional data, with a focus on unsupervised learning. Daniela is the recipient of an NIH Director’s Early Independence Award, a Sloan Research Fellowship, an NSF CAREER Award, a Simons Investigator Award in Mathematical Modeling of Living Systems, a David Byar Award, a Gertrude Cox Scholarship, and an NDSEG Research Fellowship. She is also the recipient of the Spiegelman Award from the American Public Health Association, given to a statistician under age 40 who has made outstanding contributions to statistics for public health, and the Leo Breiman Award for contributions to the field of statistical machine learning. She is a Fellow of the American Statistical Association and an Elected Member of the International Statistical Institute. Daniela is a co-author (with Gareth James, Trevor Hastie, and Rob Tibshirani) of the very popular textbook “An Introduction to Statistical Learning”. She was a member of the National Academy of Medicine (formerly the Institute of Medicine) committee that released the report “Evolution of Translational Omics”.
Daniela completed a BS in Math and Biology with Honors and Distinction at Stanford University in 2005, and a PhD in Statistics at Stanford University in 2010.