Thomas Schreiber
Department of Theoretical Physics, University of Wuppertal, D-42097 Wuppertal
July 18, 1996
We want to encourage the use of fast algorithms to find nearest neighbors in k-dimensional space. We review methods which are particularly useful for the study of time series data from chaotic systems. As an example, a simple box-assisted method and possible refinements are described in some detail. The efficiency of the method is compared to the naive approach and to a multidimensional tree for some exemplary data sets.
1 Introduction
Finding nearest neighbors in k-dimensional space is a task encountered in many data processing problems. In time-series analysis, for example, it arises whenever one is interested in local properties in a reconstructed phase space. Examples are predictions, noise reduction, or Lyapunov exponent estimates based on local fits to the dynamics, and the calculation of dimension estimates. Other applications in physics include simulations of molecular dynamics with finite-range interactions, where a box-oriented approach called the "link-cell algorithm" is used [Fincham & Heyes, 1985, Form et al., 1992].
As long as only small sets (say n < 1000 points) are evaluated, neighbors can be found in a straightforward way by computing the n^2/2 distances between all pairs of points. However, numerical simulations and, to an increasing degree, experiments are able to provide much larger amounts of data. As data sets grow, efficient handling becomes more important.
Neighbor searching and related problems of computational geometry have been extensively studied in computing science, with a rich literature covering both theoretical and practical issues. General references include [Sedgewick, 1990, Preparata & Shamos, 1985, Gonnet & Baeza-Yates, 1991, Mehlhorn, 1984]. In particular, the tree-like data structures are studied in [Omohundro, 1987, Bentley, 1980, Bentley, 1990], and the bucket