Gradient-based optimization of hyperparameters

Neural Comput. 2000 Aug;12(8):1889-900. doi: 10.1162/089976600300015187.

Abstract

Many machine learning algorithms can be formulated as the minimization of a training criterion that involves a hyperparameter. This hyperparameter is usually chosen by trial and error, guided by a model selection criterion. In this article, we present a methodology for optimizing several hyperparameters, based on computing the gradient of a model selection criterion with respect to the hyperparameters. In the case of a quadratic training criterion, this gradient can be computed efficiently by backpropagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyperparameter gradient involving second derivatives of the training criterion.
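As a concrete illustration of the quadratic case, the sketch below (in JAX; not code from the article) takes ridge regression as the training criterion: the inner problem is solved exactly with a Cholesky factorization, and the gradient of a held-out validation loss with respect to the regularization hyperparameter is obtained by backpropagating through the Cholesky solve. All names (ridge_fit, val_loss, lam, the data arrays) are illustrative assumptions, not from the paper.

```python
import jax
import jax.numpy as jnp

def ridge_fit(X, y, lam):
    # Minimizer of the quadratic criterion ||Xw - y||^2 + lam * ||w||^2,
    # via the Cholesky factorization A = L L^T of (X^T X + lam I).
    A = X.T @ X + lam * jnp.eye(X.shape[1])
    L = jnp.linalg.cholesky(A)
    z = jax.scipy.linalg.solve_triangular(L, X.T @ y, lower=True)
    return jax.scipy.linalg.solve_triangular(L.T, z, lower=False)

def val_loss(lam, X_tr, y_tr, X_va, y_va):
    # Model selection criterion: held-out squared error at the exact
    # minimizer of the training criterion.
    w = ridge_fit(X_tr, y_tr, lam)
    return jnp.mean((X_va @ w - y_va) ** 2)

# d(validation loss)/d(lam): reverse-mode AD backpropagates through the
# Cholesky decomposition and the two triangular solves.
hypergrad = jax.grad(val_loss)
```

One could then descend the selection criterion in lam (e.g. `lam -= lr * hypergrad(lam, ...)`), and the same construction extends to a vector of hyperparameters, which is the setting the article targets.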
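For the general, non-quadratic case, the implicit-function-theorem formula mentioned in the abstract can be written down directly. At a stationary point theta_star(lam) of the training criterion C_tr (where grad_theta C_tr = 0), the theorem gives dtheta_star/dlam = -H^{-1} (d/dlam) grad_theta C_tr, with H the Hessian of C_tr in theta, so the hypergradient of the selection criterion C_val is -g_val^T H^{-1} times the mixed second derivatives. The sketch below is a minimal rendering of that formula under the assumption that some inner optimizer has already produced theta_star; the function names are hypothetical.

```python
import jax
import jax.numpy as jnp

def ift_hypergrad(C_tr, C_val, theta_star, lam):
    # Hypergradient dC_val/dlam at a stationary point of the training
    # criterion, i.e. where grad_theta C_tr(theta_star, lam) = 0.
    g_val = jax.grad(C_val)(theta_star)                    # dC_val/dtheta
    H = jax.hessian(C_tr, argnums=0)(theta_star, lam)      # second derivatives in theta
    mixed = jax.jacfwd(jax.grad(C_tr, argnums=0),
                       argnums=1)(theta_star, lam)         # d(grad_theta C_tr)/dlam
    # Implicit function theorem: dtheta_star/dlam = -H^{-1} @ mixed,
    # hence dC_val/dlam = -g_val^T H^{-1} mixed (H is symmetric here).
    return -jnp.linalg.solve(H, g_val) @ mixed
```

Forming and solving with the full Hessian costs O(n^2) memory and O(n^3) time in the number of parameters, so in practice one would replace the dense solve with an iterative linear solver applied to Hessian-vector products; the formula itself is what the abstract's "second derivatives of the training criterion" refers to.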

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Artificial Intelligence*