Principal Components Analysis
Introduction
Principal Components Analysis (PCA) attempts to analyse the structure in a data set in order to define uncorrelated components that capture the variation in the data. The identification of components is often desirable as it is usually easier to consider a relatively small number of unrelated components which have been derived from the data than a larger group of related variables. PCA is particularly useful in management research, as it is often used as a first step in assigning meanings to the structure in the data (by attaching descriptions to the components) through the technique of factor analysis. PCA can also help in alleviating some of the problems with variable selection in regression models that are associated with multicollinearity, which is caused by correlations between the explanatory variables.
Key Features
• PCA attempts to represent a data frame containing correlated variables in terms of uncorrelated components.
• The principal components identified account for successively smaller amounts of the variability in the data frame.
• By selecting those components that account for relatively large amounts of variability, PCA can be used to reduce a large number of correlated variables to a smaller number of uncorrelated components.
• PCA can help to identify the underlying structure in the data and provide clues about causal connections.
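The features listed above can be illustrated with a small numerical sketch. It is not a prescribed procedure: the synthetic data, the helper name pca, and the choice to standardise the variables (so that the analysis works on the correlation matrix) are all assumptions made for illustration. The sketch extracts components from three correlated variables, then reports the share of variability each component accounts for and the correlations between the component scores.

```python
import numpy as np

def pca(data):
    """Hypothetical helper: PCA of an (observations x variables) array.

    Returns the component scores, the weights (one column per component)
    and the share of total variability accounted for by each component.
    """
    # Standardise each variable, so the matrix analysed below is the
    # correlation matrix (an assumption; unscaled data could also be used).
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)
    # eigh returns eigenvalues in ascending order; reorder largest first.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each score is a linear combination of the standardised variables.
    scores = z @ eigvecs
    return scores, eigvecs, eigvals / eigvals.sum()

rng = np.random.default_rng(0)
# Illustrative synthetic data: three variables driven by one shared
# factor plus independent noise, so they are strongly correlated.
shared = rng.normal(size=200)
data = np.column_stack([shared + 0.3 * rng.normal(size=200) for _ in range(3)])

scores, weights, explained = pca(data)
print("share of variability per component:", np.round(explained, 3))
print("correlations between component scores:")
print(np.round(np.corrcoef(scores, rowvar=False), 3))
```

With data of this kind the first component should account for most of the variability, and the off-diagonal correlations between the component scores should be zero up to rounding error, which is what allows a few components to stand in for the full set of correlated variables.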
Put very simply, principal component analysis converts correlated variables into uncorrelated components. It accomplishes this by identifying directions in the data (called components) along which the variation is at a maximum, and it uses linear combinations of the observed variables to describe each component. Below is the general form of the formula for computing scores on the first component extracted in a principal component analysis: Principal Component 1 = β11