Nonlinear kernel-based statistical pattern analysis

IEEE Trans Neural Netw. 2001;12(1):16-32. doi: 10.1109/72.896793.

Abstract

The eigenstructure of the second-order statistics of a multivariate random population can be inferred from the matrix of pairwise inner products of the samples. It can therefore also be obtained efficiently in the implicit, high-dimensional feature spaces defined by kernel functions. We build on this property to obtain general expressions from which nonlinear counterparts of a number of standard pattern analysis algorithms follow directly, including principal component analysis, data compression and denoising, and Fisher's discriminant. The connection between kernel methods and nonparametric density estimation is also illustrated. Using these results, we introduce a kernel version of the Mahalanobis distance, which gives rise to nonparametric models with unexpected and interesting properties, and we also propose a kernel version of the minimum squared error (MSE) linear discriminant function. This learning machine is particularly simple and includes a number of generalized linear models, such as the potential functions method and the radial basis function (RBF) network. Our results shed some light on the relative contributions of feature spaces and inductive bias to the remarkable generalization properties of the support vector machine (SVM). Although the SVM obtains the lowest error rates in most situations, exhaustive experiments with synthetic and natural data show that simple kernel machines based on pseudoinversion are competitive in problems with appreciable class overlap.
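
The opening claim, that the eigenstructure of the second-order statistics can be recovered from the matrix of inner products alone, can be illustrated with a minimal NumPy sketch. This is not the paper's own formulation: the function names (`centered_gram`, `kernel_pca`, `rbf_kernel`), the `gamma` parameter, and the synthetic data are assumptions made for illustration only. With a linear kernel, the eigenvalues of the centered Gram matrix divided by the sample count should match those of the sample covariance matrix; replacing the linear kernel with an RBF kernel gives a nonlinear principal component analysis without ever forming the feature space explicitly.

```python
import numpy as np

def centered_gram(K):
    """Center a Gram matrix, equivalent to centering the samples in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return H @ K @ H

def kernel_pca(K, n_components):
    """Leading covariance eigenvalues and training-sample projections from a Gram matrix."""
    n = K.shape[0]
    evals, evecs = np.linalg.eigh(centered_gram(K))  # eigh returns ascending order
    idx = np.argsort(evals)[::-1][:n_components]     # keep the largest eigenvalues
    evals, evecs = evals[idx], evecs[:, idx]
    # Projection of sample i onto the k-th unit-norm principal axis is
    # sqrt(lambda_k) * u_k[i], where u_k is the k-th Gram eigenvector.
    proj = evecs * np.sqrt(np.maximum(evals, 0.0))
    return evals / n, proj

def rbf_kernel(X, gamma=1.0):
    """Gaussian RBF Gram matrix; gamma is an illustrative choice, not from the paper."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

# Synthetic data (assumed for the demonstration): 50 samples in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) @ np.diag([3.0, 1.0, 0.3])

# Linear kernel: Gram-matrix eigenvalues reproduce the covariance eigenvalues.
lin_evals, lin_proj = kernel_pca(X @ X.T, 3)
cov_evals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False, bias=True)))[::-1]

# RBF kernel: the same routine yields nonlinear principal components.
rbf_evals, rbf_proj = kernel_pca(rbf_kernel(X, gamma=0.1), 2)
```

In this sketch only the Gram matrix is ever decomposed, so the cost depends on the number of samples rather than on the (possibly infinite) dimension of the feature space.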