A Rate-Distortion Framework For Supervised Learning
Matthew Nokleby, Ahmad Beirami, Robert Calderbank

An information-theoretic framework is presented for bound- ing the number of samples needed for supervised learning in a parametric Bayesian setting. This framework is inspired by an analogy with rate-distortion theory, which characterizes tradeoffs in the lossy compression of random sources. In a parametric Bayesian environment, the maximum a posteriori classifier can be viewed as a random function of the model parameters. Labeled training data can be viewed as a finite-rate encoding of that source, and the excess loss due to using the learned classifier instead of the MAP classifier can be viewed as distortion. A strict bound on the loss—measured in terms of the expected total variation—is derived, providing a mini- mum number of training samples needed to drive the expected total variation to within a specified tolerance. The tightness of this bound is demonstrated on the classification of Gaussians, for which one can derive closed-form expressions for the bound.