MINDS Symposium on the Foundations of Data Science
Matteo Sesia
Title: Gene hunting with knockoffs
Abstract: Data science problems often require the identification of a subset of relevant explanatory variables from a large number of possible candidates, in an attempt to understand an interesting phenomenon. A flexible statistical framework for this task is offered by knockoffs, which allow one to rigorously test the conditional importance of each predictor while controlling the false discovery rate, without relying on strong assumptions.
This talk presents some methodological developments that enable practical applications, focusing on situations where the distribution of the explanatory variables can be well approximated by a hidden Markov model, such as in genome-wide association studies. Then it describes how to leverage these results to obtain a practical tool for the genetic mapping of complex phenotypes that can make more numerous and more precise discoveries than state-of-the-art alternatives, at comparable computational cost. Finally, it discusses an application to the UK Biobank data that has led to many new findings.
Bio: Matteo is a fifth-year Ph.D. candidate in Statistics at Stanford University, advised by Emmanuel Candès. Prior to joining Stanford, he studied Physics at Politecnico di Torino and Université Paris-Sud, as well as Statistics and Applied Mathematics at Collegio Carlo Alberto. His research aims to develop statistically principled and computationally efficient methodology for complex data science problems. In his thesis work, he has focused on testing variable importance in high dimensions.