Suppose we are given a set of data points {(x_i, f_i)}, i = 1, ..., n. These could be measurements from an experiment or obtained simply by evaluating a function at some points. You have seen that we can interpolate these points, i.e., either find a polynomial of degree ≤ n − 1 that passes through all n points, or use a continuous piecewise interpolant of the data, which is usually the better approach. However, it might be the case that we know these data points should lie on, for example, a line or a parabola, but due to experimental error they do not. So what we would like to do is find a line (or a polynomial of some higher degree) that best represents the data. Of course, we need to make precise what we mean by a “best fit” of the data. As a concrete example, suppose we have n points (x_1, f_1), (x_2, f_2), ..., (x_n, f_n)
and we expect them to lie on a straight line but, due to experimental error, they don’t. We would like to draw a line that best represents the points. If n = 2, the line passes through both points, so the error is zero at each point. However, if we have more than two data points, then in general we can’t find a line that passes through all of them (unless they happen to be collinear), so we have to find a line that is a good approximation in some sense. Of course, we need to define what we mean by a good representation. An obvious approach is to create an error vector of length n whose ith component measures the difference f_i − y(x_i), where y = a_1 x + a_0 is the line we fit to the data. We can then take a norm of this error vector, and our goal is to find the line that minimizes that norm. This problem is not fully defined until we specify which norm to use. The linear least squares problem finds the line that minimizes this error in the ℓ2 (Euclidean) norm.

Example. We want to fit a line p_1(x) = a_0 + a_1 x to the data points (1, 2.2), (.8,