Title: Deep (Convolution) Networks from First Principles
Abstract: In this talk, we offer an entirely “white-box” interpretation of deep (convolution) networks from the
perspective of data compression (and group invariance). In particular, we show how modern deep layered
architectures, linear (convolution) operators and nonlinear activations, and even all parameters can be derived
from the principle of maximizing rate reduction (with group invariance). All layers, operators, and parameters
of the network are explicitly constructed via forward propagation, instead of learned via back propagation. All
components of the so-obtained network, called ReduNet, have precise optimization, geometric, and statistical
interpretations. There are also several nice surprises from this principled approach: it reveals a fundamental
tradeoff between invariance and sparsity for class separability; it reveals a fundamental connection between
deep networks and the Fourier transform for group invariance – the computational advantage in the spectral
domain (why spiking neurons?); this approach also clarifies the mathematical role of forward propagation
(optimization) and backward propagation (variation). In particular, the so-obtained ReduNet is amenable to
fine-tuning via both forward and backward (stochastic) propagation, both for optimizing the same objective.
This is joint work with students Yaodong Yu, Ryan Chan, and Haozhi Qi of Berkeley, Dr. Chong You, now at Google
Research, and Professor John Wright of Columbia University.
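
For readers unfamiliar with the objective named above, here is a minimal sketch of a rate-reduction computation. The abstract does not state the formula; the sketch assumes the coding-rate form commonly used in the related literature, R(Z, eps) = 1/2 log det(I + d/(n eps^2) Z Z^T), and takes the rate reduction to be the rate of all features minus the size-weighted rates of each class. The function names, the default eps, and the hard class labels are illustrative assumptions, not the speakers' implementation.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Assumed coding rate: 1/2 * logdet(I + d/(n*eps^2) * Z Z^T), with Z of shape (d, n)."""
    d, n = Z.shape
    gram = np.eye(d) + (d / (n * eps**2)) * (Z @ Z.T)
    _, logdet = np.linalg.slogdet(gram)  # numerically stable log-determinant
    return 0.5 * logdet

def rate_reduction(Z, labels, eps=0.5):
    """Rate of the whole feature set minus the size-weighted rates of each class."""
    labels = np.asarray(labels)
    n = Z.shape[1]
    whole = coding_rate(Z, eps)
    parts = 0.0
    for c in np.unique(labels):
        Zc = Z[:, labels == c]  # columns (samples) belonging to class c
        parts += (Zc.shape[1] / n) * coding_rate(Zc, eps)
    return whole - parts

# Eight samples in R^4: each standard basis vector appears twice.
I4 = np.eye(4)
Z = np.stack([I4[0], I4[1], I4[0], I4[1], I4[2], I4[3], I4[2], I4[3]], axis=1)
separated = rate_reduction(Z, [0, 0, 0, 0, 1, 1, 1, 1])  # classes on orthogonal subspaces
mixed = rate_reduction(Z, [0, 0, 1, 1, 0, 0, 1, 1])      # each class spans all four directions
```

In this toy case the quantity is larger when the two classes occupy orthogonal subspaces than when each class spreads over all directions, which is the sense in which maximizing it favors class separability.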