Population regression line: E(y|x) = β1 + β2 x
Observation = systematic component + random error: yi = β1 + β2 xi + ui
Sample regression line estimated using OLS estimators:
ŷ = b1 + b2 x
Observation = estimated relationship + residual: yi = ŷi + ei => yi = b1 + b2 xi + ei
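The distinction between the population line (with unobserved errors ui) and the sample line (with observed residuals ei) can be illustrated with a small simulation. This is only a sketch: the population values BETA1, BETA2, SIGMA, the seed, and the helper name ols_fit are assumptions made for the example, and the fit uses the standard closed-form OLS formulas b2 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and b1 = ȳ - b2 x̄.

```python
import random

# Hypothetical population values, chosen only for illustration.
BETA1, BETA2, SIGMA = 1.0, 2.0, 1.5

def ols_fit(x, y):
    """Return (b1, b2): OLS intercept and slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2

rng = random.Random(0)                      # fixed seed for reproducibility
x = [float(i) for i in range(1, 21)]
u = [rng.gauss(0.0, SIGMA) for _ in x]      # random errors (unobserved in practice)
y = [BETA1 + BETA2 * xi + ui for xi, ui in zip(x, u)]

b1, b2 = ols_fit(x, y)
y_hat = [b1 + b2 * xi for xi in x]          # estimated relationship
e = [yi - yh for yi, yh in zip(y, y_hat)]   # residuals

print(f"population line: E(y|x) = {BETA1} + {BETA2} x")
print(f"sample line:     y_hat  = {b1:.3f} + {b2:.3f} x")
print(f"first error u_1 = {u[0]:.3f}, first residual e_1 = {e[0]:.3f}")
```

The residuals ei are computed from the estimated line, so in general they differ from the true errors ui, which could only be computed if β1 and β2 were known.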
Assumptions underlying model:
1. Linear Model ui = yi - β1 - β2 xi
2. Error terms have mean = 0
E(ui|x) = 0 => E(yi|x) = β1 + β2 xi
3. Error terms have constant variance (independent of x)
Var(ui|x) = σ² = Var(yi|x) (homoscedastic errors)
4. Cov(ui, uj) = Cov(yi, yj) = 0 for i ≠ j (no autocorrelation)
5. X is not a constant and is fixed in repeated samples.
Additional assumption:
6. ui~N(0, σ²) => yi~N(β1 + β2 xi, σ²)
β1, β2 are population parameters.
Estimators of β1, β2:
b2 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
b1 = ȳ - b2 x̄
are OLS estimators of the population parameters.
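As a rough check of what "OLS" means here, the sketch below fits a line with the standard closed-form formulas b2 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and b1 = ȳ - b2 x̄, then confirms that nearby lines have a larger sum of squared residuals. The data set and the helper names ols_fit and ssr are made up for illustration.

```python
def ols_fit(x, y):
    """Return (b1, b2): OLS intercept and slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2

def ssr(x, y, a, b):
    """Sum of squared residuals for the candidate line a + b*x."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

# Small made-up data set, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.9, 5.2, 6.8, 9.1, 11.0]

b1, b2 = ols_fit(x, y)
best = ssr(x, y, b1, b2)
residuals = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

# Perturb the intercept and the slope: every alternative line fits worse.
perturbed = [ssr(x, y, b1 + da, b2 + db)
             for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]]

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, SSR = {best:.4f}")
print("SSR of perturbed lines:", [round(s, 4) for s in perturbed])
print(f"sum of residuals = {sum(residuals):.2e}")
```

The residuals of the fitted line sum to (numerically) zero and are uncorrelated with x in the sample; these are the two first-order conditions of the least-squares problem.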
Actual estimates b1, b2 depend on the random sample of data and vary between samples => b1, b2 are random variables:
they follow a distribution, i.e. have a mean (expected value) and variance.
Find the expected values and variances of b1, b2 and the covariance between them, i.e. find the sampling distribution.
How do b1, b2 compare with other estimators of β1, β2?
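The idea that b1, b2 are random variables can be made concrete by drawing many samples from the same population, with x held fixed, and re-estimating each time. The following is a minimal sketch; the population values, the seed, the number of replications, and the helper names are all assumptions chosen for illustration.

```python
import random
import statistics

BETA1, BETA2, SIGMA = 1.0, 2.0, 1.5    # hypothetical population values
X = [float(i) for i in range(1, 21)]   # regressor held fixed across samples

def ols_fit(x, y):
    """Return (b1, b2): OLS intercept and slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2

def draw_estimates(rng):
    """Draw one new sample of y (same X, fresh errors) and return (b1, b2)."""
    y = [BETA1 + BETA2 * xi + rng.gauss(0.0, SIGMA) for xi in X]
    return ols_fit(X, y)

rng = random.Random(42)                # fixed seed for reproducibility
draws = [draw_estimates(rng) for _ in range(5000)]
b1_draws = [d[0] for d in draws]
b2_draws = [d[1] for d in draws]

print(f"mean of b1 over samples: {statistics.fmean(b1_draws):.3f} (beta1 = {BETA1})")
print(f"mean of b2 over samples: {statistics.fmean(b2_draws):.3f} (beta2 = {BETA2})")
print(f"std. dev. of b1: {statistics.stdev(b1_draws):.3f}")
print(f"std. dev. of b2: {statistics.stdev(b2_draws):.3f}")
```

Each replication gives a different pair (b1, b2). Under assumptions 1 to 5 the averages across replications should sit close to the population parameters, and the spread of the draws is what the variance formulas describe.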
Sampling Distribution:
Variance of estimator b2: Var(b2) = σ² / Σ(xi - x̄)²; standard error of b2: se(b2) = √Var(b2)
Var(b2) varies positively with σ² and negatively with Σ(xi - x̄)²
Variance of estimator b1: Var(b1) = σ² Σxi² / (N Σ(xi - x̄)²); standard error of b1: se(b1) = √Var(b1)
Var(b1) increases in σ² and Σxi² and decreases in N and Σ(xi - x̄)², where σ² is the unknown population variance of ui
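The standard textbook variance formulas, Var(b2) = σ² / Σ(xi - x̄)² and Var(b1) = σ² Σxi² / (N Σ(xi - x̄)²), can be compared with the spread of the estimates across simulated samples. This is only a sketch: σ is treated as known here (in practice it must be estimated), and the population values, seed, and replication count are assumptions for the example.

```python
import random
import statistics

BETA1, BETA2, SIGMA = 1.0, 2.0, 1.5    # hypothetical population values
X = [float(i) for i in range(1, 21)]   # fixed regressor values
N = len(X)

def ols_fit(x, y):
    """Return (b1, b2): OLS intercept and slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2

# Theoretical variances from the formulas quoted above.
x_bar = sum(X) / N
s_xx = sum((xi - x_bar) ** 2 for xi in X)
sum_x2 = sum(xi ** 2 for xi in X)
var_b2_theory = SIGMA ** 2 / s_xx
var_b1_theory = SIGMA ** 2 * sum_x2 / (N * s_xx)

# Simulated variances across repeated samples with the same X.
rng = random.Random(7)
b1_draws, b2_draws = [], []
for _ in range(10000):
    y = [BETA1 + BETA2 * xi + rng.gauss(0.0, SIGMA) for xi in X]
    b1, b2 = ols_fit(X, y)
    b1_draws.append(b1)
    b2_draws.append(b2)

var_b1_sim = statistics.variance(b1_draws)
var_b2_sim = statistics.variance(b2_draws)

print(f"Var(b1): theory {var_b1_theory:.5f}, simulated {var_b1_sim:.5f}")
print(f"Var(b2): theory {var_b2_theory:.5f}, simulated {var_b2_sim:.5f}")
print(f"se(b2) = sqrt(Var(b2)) = {var_b2_theory ** 0.5:.5f}")
```

Doubling SIGMA or shrinking the spread of X in this sketch should raise both variances, in line with the comparative statics stated above.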
Covariance: Cov(b1, b2) = -σ² x̄ / Σ(xi - x̄)²
Estimators b1, b2 are functions of yi (sample data). The estimators are correlated because both depend on yi, i.e. are functions of the same sample
=> covariance is negative if x̄ > 0, i.e. if the slope coefficient is underestimated then the intercept is overestimated
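The sign of the covariance can be checked by simulation against the standard expression Cov(b1, b2) = -σ² x̄ / Σ(xi - x̄)². The sketch below compares a regressor with positive mean to the same regressor centred at zero; all numerical values and helper names are assumptions made for illustration.

```python
import random

BETA1, BETA2, SIGMA = 1.0, 2.0, 1.5    # hypothetical population values

def ols_fit(x, y):
    """Return (b1, b2): OLS intercept and slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2

def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

def theoretical_cov(x):
    """Cov(b1, b2) = -sigma^2 * x_bar / sum((xi - x_bar)^2)."""
    x_bar = sum(x) / len(x)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return -SIGMA ** 2 * x_bar / s_xx

def simulated_cov(x, reps, rng):
    """Covariance of (b1, b2) across repeated samples with x held fixed."""
    b1s, b2s = [], []
    for _ in range(reps):
        y = [BETA1 + BETA2 * xi + rng.gauss(0.0, SIGMA) for xi in x]
        b1, b2 = ols_fit(x, y)
        b1s.append(b1)
        b2s.append(b2)
    return sample_cov(b1s, b2s)

rng = random.Random(11)
x_pos = [float(i) for i in range(1, 21)]        # mean 10.5 > 0
x_centered = [xi - 10.5 for xi in x_pos]        # mean 0

cov_pos = simulated_cov(x_pos, 10000, rng)
cov_centered = simulated_cov(x_centered, 10000, rng)

print(f"x_bar > 0: theory {theoretical_cov(x_pos):.5f}, simulated {cov_pos:.5f}")
print(f"x_bar = 0: theory {theoretical_cov(x_centered):.5f}, simulated {cov_centered:.5f}")
```

With x̄ > 0 the fitted line pivots around the point (x̄, ȳ), so a slope that comes out too low pushes the intercept up, which is the negative covariance described above. With x̄ = 0 the intercept equals ȳ and the covariance vanishes.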
Probability Distribution of estimators
Estimators b1, b2 are normally distributed because of assumption 6, i.e. ui~N(0, σ²) => yi~N(β1 + β2 xi, σ²), and b1, b2 are linear functions of yi.