Daniel Hsu – Contrastive learning, multi-view redundancy, and linear models
Abstract: Contrastive learning is a “self-supervised” approach to representation learning that uses naturally occurring similar and dissimilar pairs of data points to find useful embeddings of data. We study contrastive learning in the context of multi-view statistical models. First, we show that whenever the views of the data are approximately redundant in their ability to predict a target function, a low-dimensional embedding obtained via contrastive learning affords a linear predictor with near-optimal predictive accuracy. Second, we show that in the context of topic models, the embedding can be interpreted as a linear transformation of the posterior moments of the hidden topic distribution given the observed words. We also empirically demonstrate that linear classifiers trained on these representations perform well in document classification tasks with very few labeled examples in a semi-supervised setting.
This is joint work with Akshay Krishnamurthy (MSR) and Christopher Tosh (Columbia).
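The sketch below is a minimal illustration of the pipeline the abstract describes, not the authors' code: a linear encoder is trained with an InfoNCE-style contrastive loss on synthetic two-view data whose views are redundant predictors of a hidden class, and a linear probe is then fit on the frozen embeddings using only a handful of labels. All names, dimensions, and the synthetic data generator are assumptions made for illustration.

```python
# Illustrative sketch only: contrastive learning on synthetic redundant two-view
# data, followed by a linear probe trained with very few labeled examples.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic multi-view data: both views are noisy observations of the same
# hidden class z, so either view alone roughly predicts z (approximate redundancy).
n, num_classes, view_dim, embed_dim = 2000, 5, 50, 16
class_means = torch.randn(num_classes, view_dim)
z = torch.randint(num_classes, (n,))
view1 = class_means[z] + 0.5 * torch.randn(n, view_dim)
view2 = class_means[z] + 0.5 * torch.randn(n, view_dim)

# Contrastive stage: a shared linear encoder trained with an InfoNCE-style loss;
# (view1_i, view2_i) is the positive pair, other items in the batch are negatives.
encoder = nn.Linear(view_dim, embed_dim)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-2)
for step in range(300):
    idx = torch.randperm(n)[:256]        # random minibatch
    a, b = encoder(view1[idx]), encoder(view2[idx])
    logits = a @ b.t()                   # pairwise similarities within the batch
    labels = torch.arange(len(idx))      # the matching index is the positive
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()

# Semi-supervised stage: a linear classifier on the frozen embeddings,
# trained with only a small number of labeled examples.
with torch.no_grad():
    emb = encoder(view1)
few = 50                                  # number of labeled examples
probe = nn.Linear(embed_dim, num_classes)
popt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for step in range(500):
    loss = F.cross_entropy(probe(emb[:few]), z[:few])
    popt.zero_grad(); loss.backward(); popt.step()

with torch.no_grad():
    acc = (probe(emb[few:]).argmax(1) == z[few:]).float().mean()
print(f"linear-probe accuracy with {few} labels: {acc:.3f}")
```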
Bio: Daniel Hsu is an associate professor in the Department of Computer Science and a member of the Data Science Institute, both at Columbia University. Previously, he was a postdoc at Microsoft Research New England and in the Departments of Statistics at Rutgers University and the University of Pennsylvania. He holds a Ph.D. in Computer Science from UC San Diego and a B.S. in Computer Science and Engineering from UC Berkeley. He was selected by IEEE Intelligent Systems as one of “AI’s 10 to Watch” in 2015 and received a 2016 Sloan Research Fellowship.
Daniel’s research interests are in algorithmic statistics and machine learning. His work has produced the first computationally efficient algorithms for several statistical estimation tasks (including many involving latent variable models such as mixture models, hidden Markov models, and topic models), provided new algorithmic frameworks for solving interactive machine learning problems, and led to the creation of scalable tools for machine learning applications.