TRIPODS Winter School & Workshop – Matus Telgarsky
Title – Improved Analyses and Rates of Gradient Descent’s Implicit Bias
Abstract – The implicit bias of gradient descent has arisen as a promising explanation for the good generalization properties of deep networks (Soudry-Hoffer-Nacson-Gunasekar-Srebro, 2018). The purpose of this talk is to demonstrate the effectiveness of a certain dual problem in the analysis of this implicit bias. Concretely, the talk will develop this dual, as well as a variety of its consequences in linear and nonlinear settings. In the linear case, the dual perspective will first allow a characterization of the implicit bias even outside the standard setting of exponentially-tailed losses; in this sense, it is gradient descent, and not a particular loss structure, that leads to the implicit bias. Second, invoking duality in the margin convergence analysis will yield a fast 1/t rate; by contrast, no prior analysis surpassed 1/√t, even in the well-studied boosting setting. In the nonlinear case, duality will enable the proof of a gradient alignment property: asymptotically, the parameters and their gradients become collinear. Although abstract, this property in turn implies a variety of existing and new margin maximization results. (Joint work with Ziwei Ji.)
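
The linear-case phenomenon referenced in the abstract can be observed numerically. The sketch below is illustrative only and not taken from the talk; the dataset, step size, and iteration counts are assumptions. It runs plain gradient descent on the logistic loss over linearly separable data and prints the normalized margin min_i y_i⟨w, x_i⟩ / ‖w‖, which increases toward its maximum even though nothing in the objective explicitly asks for margin maximization.

```python
# Illustrative sketch (not from the talk): gradient descent on logistic loss
# over linearly separable data.  The normalized margin of the iterate slowly
# increases, exhibiting the implicit bias toward margin maximization.
import numpy as np

rng = np.random.default_rng(0)

# Separable 2-d data labeled by a fixed unit direction, with a small margin.
w_star = np.array([1.0, 2.0]) / np.sqrt(5.0)
X = rng.normal(size=(400, 2))
X = X[np.abs(X @ w_star) > 0.1]      # discard points too close to the boundary
y = np.sign(X @ w_star)

def grad(w):
    # Gradient of (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)),
    # written with logaddexp for numerical stability.
    m = y * (X @ w)
    s = -y * np.exp(-np.logaddexp(0.0, m))   # equals -y / (1 + exp(m))
    return (s[:, None] * X).mean(axis=0)

w = np.zeros(2)
eta = 0.5                                    # illustrative step size
for t in range(1, 100_001):
    w -= eta * grad(w)
    if t in (100, 1_000, 10_000, 100_000):
        margin = np.min(y * (X @ w)) / np.linalg.norm(w)
        print(f"t={t:>6}  ||w||={np.linalg.norm(w):7.2f}  normalized margin={margin:.4f}")
```

The choice of logistic loss here is incidental: as the abstract notes, the dual analysis attributes the bias to gradient descent itself rather than to a particular (e.g., exponentially-tailed) loss structure.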