The ANACOR algorithm consists of three major parts:
1. A singular value decomposition (SVD)
2. Centering and rescaling of the data, and various rescalings of the results
3. Variance estimation by the delta method
Other names for the SVD are the “Eckart-Young decomposition,” after Eckart and Young (1936), who introduced the technique in psychometrics, and “basic structure” (Horst, 1963). The rescalings and centering, including their rationale, are well explained in Benzécri (1969), Nishisato (1980), Gifi (1981), and Greenacre (1984). Readers interested in the general framework of matrix approximation and reduction of dimensionality with positive definite row and column metrics are referred to Rao (1980). The delta method is a general technique for deriving asymptotic distributions and is particularly useful for approximating the variance of complex statistics. There are many versions of the delta method, differing in the assumptions made and in the strength of the approximation (Rao, 1973, ch. 6; Bishop et al., 1975, ch. 14; Wolter, 1985, ch. 6).
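As a brief illustration of the first-order delta method (a standard result, not a formula taken from this chapter): if $\hat{\theta}$ is asymptotically normal with mean $\theta$ and covariance matrix $\Sigma$, then for a differentiable statistic $g$,

$$\operatorname{Var}\big[g(\hat{\theta})\big] \approx \big(\nabla g(\theta)\big)^{\mathsf{T}} \, \Sigma \, \nabla g(\theta).$$

The sketch below shows how the first two parts (centering/rescaling and the SVD) fit together in a standard correspondence analysis. It is a minimal illustration of the usual recipe, assuming principal coordinates as the rescaling; the function name anacor_sketch is hypothetical, and the exact ANACOR rescaling options are described later in this chapter.

```python
import numpy as np

def anacor_sketch(F, p=2):
    """Correspondence analysis of a nonnegative table F via SVD.

    Returns row scores, column scores (first p dimensions, in
    principal coordinates), and the total inertia.
    """
    F = np.asarray(F, dtype=float)
    N = F.sum()                       # grand total of F
    P = F / N                         # correspondence matrix
    r = P.sum(axis=1)                 # row masses    f_i+ / N
    c = P.sum(axis=0)                 # column masses f_+j / N

    # Standardized residuals: centering removes the trivial dimension
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

    U, sv, Vt = np.linalg.svd(S, full_matrices=False)

    # One common rescaling of the SVD results (principal coordinates)
    row_scores = (U[:, :p] / np.sqrt(r)[:, None]) * sv[:p]
    col_scores = (Vt.T[:, :p] / np.sqrt(c)[:, None]) * sv[:p]

    total_inertia = (sv ** 2).sum()   # equals the chi-square statistic / N
    return row_scores, col_scores, total_inertia
```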
Notation
The following notation is used throughout this chapter unless otherwise stated:
k1      Number of rows (row objects)
k2      Number of columns (column objects)
p       Number of dimensions
Data-Related Quantities

f_ij    Nonnegative data value for row i and column j, collected in table F
f_i+    Marginal total of row i, i = 1, ..., k1
f_+j    Marginal total of column j, j = 1, ..., k2
N       Grand total of F
Scores and Statistics

r_is    Score of row object i on dimension s
c_js    Score of column object j on dimension s
I       Total inertia
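To tie the notation to numbers, here is a small hypothetical table and the data-related quantities computed from it (the values are illustrative only, not taken from the chapter):

```python
import numpy as np

# Hypothetical 2 x 3 data table F (k1 = 2 row objects, k2 = 3 column objects)
F = np.array([[10.0, 20.0, 30.0],
              [15.0,  5.0, 20.0]])

f_i_plus = F.sum(axis=1)   # row marginal totals f_i+      -> [60. 40.]
f_plus_j = F.sum(axis=0)   # column marginal totals f_+j   -> [25. 25. 50.]
N = F.sum()                # grand total of F              -> 100.0
```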
Basic Calculations
One way to phrase the ANACOR objective (cf. Heiser, 1981) is to say that we wish to find row scores $\{r_{is}\}$ and column scores $\{c_{js}\}$ so that the function
$$\sigma\big(\{r_{is}\};\{c_{js}\}\big) = \sum_{i}\sum_{j} f_{ij} \sum_{s} \big(r_{is} - c_{js}\big)^{2}$$
is minimal, under the standardization restriction either that
$$\sum_{i} f_{i+}\, r_{is} r_{it} = N \delta^{st}$$

or that

$$\sum_{j} f_{+j}\, c_{js} c_{jt} = N \delta^{st},$$

where $\delta^{st}$ is Kronecker's delta and $t$ is an alternative index for the dimensions.
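Both the objective and the row-standardization restriction are easy to check numerically. The sketch below is a minimal illustration under the notation above; the function names sigma_loss and row_standardization_ok are hypothetical, introduced only for this example.

```python
import numpy as np

def sigma_loss(F, R, C):
    """ANACOR objective: sum_i sum_j f_ij * sum_s (r_is - c_js)^2.

    F is the k1 x k2 data table, R the k1 x p row scores,
    C the k2 x p column scores.
    """
    F, R, C = (np.asarray(a, dtype=float) for a in (F, R, C))
    diff = R[:, None, :] - C[None, :, :]   # shape (k1, k2, p): r_is - c_js
    return float(np.sum(F[:, :, None] * diff ** 2))

def row_standardization_ok(F, R, atol=1e-8):
    """Check the restriction sum_i f_i+ r_is r_it = N * delta^st."""
    F, R = np.asarray(F, dtype=float), np.asarray(R, dtype=float)
    f_i_plus = F.sum(axis=1)
    N = F.sum()
    G = (R * f_i_plus[:, None]).T @ R      # G[s, t] = sum_i f_i+ r_is r_it
    return np.allclose(G, N * np.eye(R.shape[1]), atol=atol)
```

Evaluating sigma_loss for candidate scores only measures the fit; finding the scores that minimize it under the standardization restriction is what the SVD-based computation sketched earlier accomplishes.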
References

Benzécri, J. P. 1969. Statistical analysis as a tool to make patterns emerge from data. In: Methodologies of Pattern Recognition, S. Watanabe, ed. New York: Academic Press.

Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. 1975. Discrete multivariate analysis: Theory and practice. Cambridge, Mass.: MIT Press.

Eckart, C., and Young, G. 1936. The approximation of one matrix by another one of lower rank. Psychometrika, 1: 211–218.

Gifi, A. 1981. Nonlinear multivariate analysis. Leiden: Department of Data Theory.

Golub, G. H., and Reinsch, C. 1971. Linear algebra, Chapter I.10. In: Handbook for Automatic Computation, Volume II, J. H. Wilkinson and C. Reinsch, eds. New York: Springer-Verlag.

Greenacre, M. J. 1984. Theory and applications of correspondence analysis. London: Academic Press.

Heiser, W. J. 1981. Unfolding analysis of proximal data. Doctoral dissertation. Department of Data Theory, University of Leiden.

Horst, P. 1963. Matrix algebra for social scientists. New York: Holt, Rinehart, and Winston.

Israëls, A. 1987. Eigenvalue techniques for qualitative data. Leiden: DSWO Press.

Nishisato, S. 1980. Analysis of categorical data: Dual scaling and its applications. Toronto: University of Toronto Press.

Rao, C. R. 1973. Linear statistical inference and its applications, 2nd ed. New York: John Wiley & Sons.

Rao, C. R. 1980. Matrix approximations and reduction of dimensionality in multivariate statistical analysis. In: Multivariate Analysis, Vol. 5, P. R. Krishnaiah, ed. Amsterdam: North-Holland.

Wolter, K. M. 1985. Introduction to variance estimation. Berlin: Springer-Verlag.