Kolmogorov-Smirnov Goodness-of-Fit Test
Purpose:
Test for Distributional AdequacyThe Kolmogorov-Smirnov test (Chakravart, Laha, and Roy, 1967) is used to decide if a sample comes from a population with a specific distribution.
The Kolmogorov-Smirnov (K-S) test is based on the empirical distribution function (ECDF). Given N ordereddata points Y1, Y2, ..., YN, the ECDF is defined as
\[ E_{N} = n(i)/N \] where n(i) is the number of points less than Yi and the Yiare ordered from smallest to largest value. This is a step function that increases by 1/N at the value of each ordered data point.
The graph below is a plot of the empirical distribution function with a normal cumulative distribution function for 100 normal random numbers. The K-S test is based on the maximum distance between these two curves.
Characteristics and Limitations of the K-S TestAn attractive feature of this test is that the distribution of the K-S test statistic itself does not depend on the underlying cumulative distribution function being tested. Another advantage is that it is an exact test (the chi-square goodness-of-fit test depends on an adequate sample size for the approximations to be valid). Despite these advantages, the K-S test has several important limitations:
1. It only applies to continuous distributions.
2. It tends to be more sensitive near the center of the distribution than at the tails.
3. Perhaps the most serious limitation is that the distribution must be fully specified. That is, if location, scale, and shape parameters are estimated from the data, the critical region of the K-S test is no longer valid. It typically must be determined by simulation.
Several goodness-of-fit tests, such as the Anderson-Darlingtest and the Cramer Von-Mises test, are refinements of the K-S test. As these refined tests are generally considered to be more powerful than the original K-S test, many analysts prefer them. Also, the advantage for the K-S test of having the critical values be