Classical data analytic algorithms focus on the setting where the algorithm has access to a fixed dataset obtained prior to any analysis. However, in most applications, we have control over the data collection process: which image labels to obtain, which drug-gene interactions to record, which network routes to probe, which movies to rate, and so on. Furthermore, despite the availability of big data, the ever-increasing complexity of data analytic problems implies that most applications face a limited data budget. Thus, there is both an opportunity and a need to develop intelligent algorithms that can interact with the data-generating mechanism to guide what data to collect.
In this talk, we ask the question: what does the freedom to interactively collect data in a feedback-driven manner buy us? As time permits, I will present a sampling of work by my group on principled interactive methods for several learning problems, including regression, classification, matrix and tensor completion/approximation, column subset selection, learning the structure of graphical models, reconstructing graph-structured signals, and clustering. I will quantify the precise improvement in the amount of data needed to achieve a desired statistical error, and demonstrate that interactive algorithms often also enable us to handle a larger class of data models than passive (non-feedback-driven) methods. Finally, I will conclude with open directions and challenges facing interactive data analytics.
Aarti Singh, Carnegie Mellon University