In a simple regression model, we are trying to determine whether a variable Y depends linearly on a variable X. That is, whenever X changes, Y also changes in proportion. A linear relationship is a straight-line relationship. As an equation, this relationship can be expressed as
Y = α + βX + e
In this equation, Y is the dependent variable, and X is the independent variable. α is the intercept of the regression line, and β is the slope of the regression line. e is the random disturbance term.
The way to interpret the above equation is as follows:
Y = α + βX (ignoring the disturbance term “e”)
gives the average relationship between the values of Y and X.
For example, let Y be the cost of goods sold and X be the sales. If α = 2 and β = 0.75, and if the sales are 100, i.e., X = 100, the cost of goods sold would be, on average,
2 + 0.75(100) = 77. However, in any particular year when sales X = 100, the actual cost of goods sold can deviate randomly around 77. This deviation from the average is called the “disturbance” or the “error” and is represented by “e”.
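The average prediction above is just arithmetic on the fitted line. The short check below uses the example's values (α = 2, β = 0.75, X = 100); the helper function name is ours, not from the text.

```python
def average_y(alpha, beta, x):
    """Average value of Y on the line Y = alpha + beta * x (disturbance e ignored)."""
    return alpha + beta * x

# Cost of goods sold, on average, when sales are 100:
print(average_y(2, 0.75, 100))  # 77.0
```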
Also, in the equation
Y = 2 + 0.75X + e
i.e.,
Cost of goods sold = 2 + 0.75 (sales) + e
the interpretation is that the cost of goods sold increases by 0.75 times the increase in sales. For example, if sales increase by 20, the cost of goods sold increases, on average, by 0.75(20) = 15. In general, we are much more interested in the value of the slope of the regression line, β, than in the value of the intercept, α.
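In practice, α and β are not known and must be estimated from observed data. As a rough sketch of how that estimation works, the snippet below fits the line by ordinary least squares to a small set of made-up (sales, cost of goods sold) pairs; the data values are illustrative and are not from the text.

```python
# Hypothetical (sales, cost of goods sold) observations.
data = [(80, 63), (100, 77), (120, 91), (140, 108), (160, 122)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Least-squares estimate of the slope: beta = Sxy / Sxx.
beta = (sum((x - mean_x) * (y - mean_y) for x, y in data)
        / sum((x - mean_x) ** 2 for x, _ in data))
# The fitted line passes through the point of means, giving the intercept.
alpha = mean_y - beta * mean_x

print(round(alpha, 3), round(beta, 3))  # roughly 2.8 and 0.745 for this data
```

For this illustrative data the estimates come out close to the α = 2 and β = 0.75 used in the example above, so a one-unit rise in sales is associated, on average, with about a 0.745-unit rise in the cost of goods sold.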
Now, suppose we are trying to determine if there is a relationship between two variables that apparently have no relationship, say the sales of a firm and the average height of its employees. We would set up an equation like the following:
Y = α + βX + e
where
Y = sales of the firm, X = average height of employees, α = intercept of the regression line, β = slope of the regression line, and e = random disturbance term.