Jean-Philippe Vert, “Deep learning for biological sequences”
Jean-Philippe Vert, PhD
Research Scientist
Google Brain
Paris
Title: “Deep learning for biological sequences”
Abstract: In recent years, deep learning has revolutionized natural language processing (NLP), and is increasingly used to analyze biological sequences including DNA, RNA and proteins. While many deep learning architectures and techniques successful in NLP can be directly applied to biological sequences, there are also specificities in biological sequences that should be taken into account to adapt NLP techniques to that context. In this talk I will discuss several such specificities, including the fact that 1) biological sequences have no natural separation as a sequence of words, 2) a double-stranded DNA sequence can be represented by two reverse-complement sequences, and 3) a natural way to compare homologous biological sequences is to align them. In each case, I will show how the biological constraints can lead to specific models, and illustrate empirically the benefits of incorporating such prior knowledge on several tasks such as metagenomics read binning, protein-DNA binding prediction, or protein annotation.
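Illustration: as a rough sketch of the first two specificities named in the abstract, the Python snippet below cuts a DNA string into overlapping k-mers (one common stand-in for "words" when a sequence has no natural word boundaries) and computes the reverse complement, so that both strands of a double-stranded sequence can be mapped to a single canonical form. The function names `kmers`, `reverse_complement` and `canonical` are hypothetical, chosen for this example only; nothing here is taken from the talk, and the models discussed by the speaker may handle these constraints quite differently.

```python
from typing import List

# Illustrative sketch only; names and approach are assumptions, not the speaker's method.
# Translation table mapping each DNA base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in the same 5'-to-3' direction."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> List[str]:
    """Split a sequence into overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def canonical(seq: str) -> str:
    """Pick one representative (the lexicographically smaller) of a strand pair."""
    return min(seq, reverse_complement(seq))


if __name__ == "__main__":
    read = "ATGCGT"
    print(kmers(read, 3))
    print(reverse_complement(read))
    print(canonical(read) == canonical(reverse_complement(read)))
```

Mapping each k-mer to its canonical form is one simple way to make a representation insensitive to which strand was sequenced; building that symmetry into the model itself is another option.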
Biography: Jean-Philippe Vert is a research scientist at Google Brain in Paris and adjunct research professor at PSL Mines ParisTech’s Centre for Computational Biology. Prior to joining Google in 2018, he worked as a postdoc in computational biology at Kyoto University (2001-2002), research professor and founding director of the Centre for Computational Biology at Mines ParisTech (2003-2018), team leader at the Curie Institute in Paris on computational biology of cancer (2008-2018), Miller visiting professor at UC Berkeley (2015-2016), and research professor at the department of mathematics of École normale supérieure in Paris (2016-2018). He graduated from École Polytechnique (1995) and Corps des Mines (1998), and holds a PhD in mathematics from Paris 6 University (2001). His research interests concern the development of statistical and machine learning methods, particularly to model complex, high-dimensional and structured data, with an application focus on computational biology, genomics and precision medicine. His recent contributions include new methods to embed structured data such as strings, graphs or permutations into vector spaces, regularization techniques to learn from limited amounts of data, and computationally efficient techniques for pattern detection and feature selection. He is also working on several medical applications in cancer research, including quantifying and modeling cancer heterogeneity, predicting response to therapy, and modeling the genome and epigenome of cancer cells at the single-cell level.
Join Zoom Meeting:
https://wse.zoom.us/j/99567504456?pwd=WkI2UlpGT3p6MldLS05VNkdmcGxiZz09
Meeting ID: 995 6750 4456
Passcode: Clark