LECTURE NOTES - CHAPTER 1: SIMPLE LINEAR REGRESSION
I. Introduction
The basic aims of this chapter are:
• Review of the simple linear regression material covered in Statistical Techniques II;
• An introduction to some new notation, including matrices;
• A more detailed study of the properties of the regression estimates; and,
• An investigation of diagnostic procedures to check the credibility of the underlying assumptions of our regression model.
We will, as much as possible, demonstrate concepts through the use of example data. This will also give us the opportunity to see how to use S-Plus to perform our fitting and diagnostic procedures.
When formulating a suitable model for a set of data, we should always take into account:
1. Background scientific theory which may suggest a specific structure for our model;
2. Scatterplots of the data; and,
3. Statistical model output and diagnostic procedures.
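As a concrete illustration of steps 2 and 3, the following is a minimal sketch in Python with NumPy (standing in for the S-Plus used in these notes, since S-Plus is no longer widely available): it simulates data from a simple linear model and computes the least-squares estimates via the usual closed-form expressions. The parameter values (β0 = 2, β1 = 0.5, σ = 1, n = 100) are chosen purely for illustration.

```python
# Hedged sketch: simulate Y_i = beta0 + beta1 * x_i + eps_i and fit by
# ordinary least squares using the closed-form slope/intercept formulas.
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1, sigma = 100, 2.0, 0.5, 1.0   # illustrative values only

x = rng.uniform(0, 10, n)
eps = rng.normal(0, sigma, n)      # mean-zero errors with variance sigma^2
y = beta0 + beta1 * x + eps        # deterministic part plus random noise

# Least-squares estimates:
#   b1 = S_xy / S_xx,   b0 = ybar - b1 * xbar
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

print(b0, b1)   # estimates should lie close to the true (2.0, 0.5)
```

In practice one would also produce a scatterplot of (x, y) with the fitted line overlaid, which is step 2 of the checklist above; the fitting itself would typically be delegated to a library routine rather than hand-coded.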
II. The Model and Assumptions
If our dataset consists of a sample of n pairs (x1 , Y1 ), . . . , (xn , Yn ), where the Yi ’s are considered to be the values of a “response” or “dependent” variable (i.e., the variable whose characteristics we are most interested in examining and explaining) and the xi ’s are the values of a “predictor” or “independent” variable (i.e., a variable whose value may potentially influence the value of the response or dependent variable), then the simplest possible regression structure has the linear form:
Y = β0 + β1 x + ε, where ε is a mean-zero random variable having variance σ². Specifically, this means that we believe that each data value Yi can be expressed as:
Yi = β0 + β1 xi + εi ,     (i = 1, . . . , n)
where the εi ’s are the “errors” or “noise” in the model; that is, they are the stochastic or random component of Yi , and they measure the amount by which the observed value differs from what the “deterministic” part of the model would have predicted for the value of Yi , namely E(Yi | xi ) = β0 + β1 xi . We use the