How can we efficiently estimate the support of a high-dimensional probability distribution? This research introduces an algorithm for estimating a subset *S* of the input space that captures most of the probability mass of an underlying distribution *P*. The method estimates a function *f* that is positive on *S* and negative on the complement; the functional form of *f* is a kernel expansion in terms of a potentially small subset of the training data, and a regularization term controls the length of the weight vector in an associated feature space. The expansion coefficients are determined by solving a quadratic programming problem, implemented via sequential optimization over pairs of input patterns. The authors provide a theoretical analysis of the algorithm's statistical performance. Described as a natural extension of the support vector algorithm to unlabeled data, the approach tackles a problem closely related to density estimation: rather than modeling the full density, it directly estimates a region of high probability mass. This study provides a valuable tool for machine learning, pattern recognition, and data analysis, where understanding the support of the data is crucial.
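To make the procedure concrete, below is a minimal sketch in Python using scikit-learn's `OneClassSVM`, which implements the one-class support vector formulation described above; the learned decision function has the form *f*(x) = sgn(Σᵢ αᵢ k(xᵢ, x) − ρ), positive on the estimated region *S* and negative outside it. The synthetic data and the parameter values (`gamma`, `nu`) are illustrative assumptions, not settings taken from the paper.

```python
# Minimal sketch of support estimation with a one-class SVM.
# Assumptions: synthetic 2-D Gaussian data; gamma and nu chosen for illustration.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # unlabeled sample from P

# nu upper-bounds the fraction of training points allowed outside the
# estimated region S and lower-bounds the fraction of support vectors,
# so only a (potentially small) subset of the data enters the expansion.
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1)
clf.fit(X_train)  # solves the quadratic program for the expansion coefficients

# f(x) = sum_i alpha_i k(x_i, x) - rho: positive inside S, negative outside.
X_test = np.array([[0.0, 0.0], [4.0, 4.0]])
print(clf.decision_function(X_test))  # signed values of f
print(clf.predict(X_test))            # +1 inside the estimated support, -1 outside
```

In practice, `nu` plays the role of the mass parameter: it trades off the size of the estimated region against the fraction of training points the region is allowed to exclude.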
Published in Neural Computation, this paper fits the journal's focus on machine learning algorithms and neural network models, and more broadly on mathematical and computational analyses of learning systems. The article directly addresses a core problem in unsupervised learning and includes an analysis of statistical performance. Its references and citations likely connect it to other work on support vector machines and density estimation published in related journals.