Kate Saenko – Learning from Small and Biased Datasets
Abstract: Deep learning has made exciting progress on many computer vision problems, such as object recognition in images and video. However, it has relied on large datasets that can be expensive and time-consuming to collect and label. Datasets can also suffer from “dataset bias,” which occurs when the training data is not representative of the future deployment domain. Dataset bias is a major problem in computer vision: even the most powerful deep neural networks fail to generalize to out-of-sample data. A classic example is a network trained to classify handwritten digits that fails to recognize typed digits, but the problem arises in many other situations, such as new geographic locations, changing demographics, and simulation-to-real learning. Can we solve dataset bias and learn with only a limited amount of supervision? Indeed we can, under certain assumptions. I will describe some recent work based on domain adaptation of deep learning models and point out several assumptions these methods make and situations they fail to handle. I will also describe recent efforts to improve adaptation by using unlabeled data to learn better features, with ideas from self-supervised learning.
Bio: Kate is an Associate Professor of Computer Science at Boston University and a consulting professor for the MIT-IBM Watson AI Lab. She leads the Computer Vision and Learning Group at BU, is the founder and co-director of the Artificial Intelligence Research (AIR) initiative, and is a member of the Image and Video Computing research group. Kate received a PhD from MIT and did her postdoctoral training at UC Berkeley and Harvard. Her research interests are in the broad area of Artificial Intelligence, with a focus on dataset bias, adaptive machine learning, learning for image and language understanding, and deep learning.