Komei Fukuda, fukuda@ifor.math.ethz.ch
Vera Rosta, rosta@renyi.hu
In this short article, we consider the notion of data depth, which generalizes the median to higher dimensions. Our main objective is to present a snapshot of data depth, several closely related notions, and the associated optimization problems and algorithms. In particular, we briefly touch on our recent approaches to computing the data depth using linear and integer programming. Although computing the depth is NP-hard, there are ways to compute nontrivial lower and upper bounds on it.
The notion of data depth has been studied independently in statistics, discrete geometry, political science and optimization. In statistics, the need to generalize the median and the rank arises naturally, since the mean is not a robust measure of central location: a single outlier is enough to move the mean arbitrarily far.
In contrast, the median in one dimension is very robust, since half of the observations must be changed to corrupt its value.
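
The contrast can be checked on a toy sample. The following is a small illustrative sketch using Python's standard statistics module; the data values are made up for the example and do not come from the article.

```python
from statistics import mean, median

clean = [1, 2, 3, 4, 5]
one_outlier = [1, 2, 3, 4, 1000]            # one of five values corrupted
three_outliers = [1, 2, 1000, 1000, 1000]   # more than half corrupted

# A single outlier drags the mean away but leaves the median in place.
print(mean(clean), median(clean))
print(mean(one_outlier), median(one_outlier))

# The median only breaks once at least half of the sample is corrupted.
print(median(three_outliers))
```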
In nonparametric statistics, several data depth measures were introduced as multivariate generalizations of ranks to complement classical multivariate analysis, first by Tukey (1975), followed by Oja (1983), Liu (1990), Donoho and Gasko (1992), Singh (1992), and Rousseeuw and Hubert (1999), among others. These measures, though seemingly different, have strong connections. The halfspace depth, also known as location depth or Tukey depth, introduced by Tukey in 1974, is perhaps the best known of the data depth measures in nonparametric statistics and in discrete and computational geometry. It also has a strong connection to the maximum feasible subsystem problem, Max FS, in optimization.
The halfspace depth of a point p relative to a data set S of n points in Euclidean space R^d is the smallest number of points of S in any closed halfspace with boundary through p. A point of [...]
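
For planar data, the definition of halfspace depth can be evaluated directly. The following is a minimal brute-force sketch, not the primal-dual approach of [1]. It assumes d = 2 and integer (or otherwise exactly representable) coordinates, and the function name halfspace_depth_2d is hypothetical. It uses the observation that the number of points in a closed halfplane through p changes only when the boundary line passes through a data point, so it is enough to test each such boundary after an infinitesimal rotation.

```python
def halfspace_depth_2d(p, points):
    """Halfspace depth of p relative to a list of planar points.

    Brute force, O(n^2) time. Exact when the coordinates are integers
    (or Fractions), because only sign tests on exact products are used.
    """
    px, py = p
    vecs = [(x - px, y - py) for x, y in points]
    # Points coinciding with p lie in every closed halfplane through p.
    at_p = sum(1 for v in vecs if v == (0, 0))
    others = [v for v in vecs if v != (0, 0)]
    if not others:
        return at_p

    best = len(vecs)
    for vx, vy in others:
        # The two normals c whose boundary line passes through p and p + v.
        for cx, cy in ((-vy, vx), (vy, -vx)):
            # Count points in the halfplane with normal u = c + eps * perp(c),
            # perp(c) = (-cy, cx), for an infinitesimal eps > 0.
            count = at_p
            for wx, wy in others:
                d = wx * cx + wy * cy
                if d > 0 or (d == 0 and (-cy * wx + cx * wy) >= 0):
                    count += 1
            best = min(best, count)
    return best


# Four corners of a square plus its center (illustrative data).
square = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]
print(halfspace_depth_2d((2, 2), square))  # center of the square: 3
print(halfspace_depth_2d((5, 5), square))  # outside the convex hull: 0
```

The quadratic running time is acceptable only for small planar samples. In higher dimensions the number of candidate halfspaces grows quickly, which is where bounds from linear and integer programming become relevant.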
References:
[1] D. Bremner, K. Fukuda and V. Rosta, Primal-dual algorithms for data depth, Technical Report, McGill University, Dept. Math/Stats, 2004-11. Submitted.
[...] of Symposia in Pure Mathematics 7, AMS (1963), 101–180.
[...] Geometry and Applications, 6 (1996), 357–377, and ACM Symp. Computational Geometry (1993), 1–21.
[4] D.L. Donoho and M. Gasko, Breakdown properties of location estimates based on halfspace depth and projected outlyingness, The Annals of Statistics 20(4) (1992), [...]
[...] Floudas and P.M. Pardalos, editors, Kluwer Academic Publishers (2004), 123–134.
[7] D.S. Johnson and F.P. Preparata, The densest hemisphere problem, Theoretical Computer Science 6 (1978), 93–107.
[10] P.J. Rousseeuw and M. Hubert, Regression depth, Journal of the American Statistical Association 94 (1999), 388–402.