Multiple Linear Regression and Fourier Series 

Suppose we have a large number of data points giving the value of some dependent variable v as a function of independent variables x and y, and we wish to perform a least-squares regression fit of the data to a function of the form

$$v(x,y) = A + Bx + Cy + Dx^2 + Exy$$

This is called multiple linear regression, and it can be applied to find the least-squares fit to any linear combination of functions of any number of variables. Our objective is to find the coefficients A, B, C, D, and E such that the sum of the squares of the errors between our model and the actual data points is a minimum.

Consider the ith data point (x_{i}, y_{i}, v_{i}), where v_{i} is our dependent variable. For any given choice of coefficients A, B, ..., E the square of the error e_{i} for this data point is

$$\begin{aligned}
e_i^2 = {}& A^2 + B^2 x_i^2 + C^2 y_i^2 + D^2 x_i^4 + E^2 x_i^2 y_i^2 + v_i^2 \\
& + 2ABx_i + 2ACy_i + 2ADx_i^2 + 2AEx_i y_i + 2BCx_i y_i + 2BDx_i^3 \\
& + 2BEx_i^2 y_i + 2CDx_i^2 y_i + 2CEx_i y_i^2 + 2DEx_i^3 y_i \\
& - 2Av_i - 2Bx_i v_i - 2Cy_i v_i - 2Dx_i^2 v_i - 2Ex_i y_i v_i
\end{aligned}$$

The squared error for each of the N data points is of this form, so we can add them all together to give the total sum of squares of all the errors. This yields an expression identical to the one above, except that x_{i} is replaced by the sum of all the x_{i} for i = 1 to N, y_{i}x_{i}^{2} is replaced by the sum of the y_{i}x_{i}^{2} for i = 1 to N, and so on. Therefore, the sum of squares of the errors can be written as

$$\begin{aligned}
\sum_{i=1}^{N} e_i^2 = {}& A^2(S1) + B^2(Sx^2) + C^2(Sy^2) + D^2(Sx^4) + E^2(Sx^2y^2) + Sv^2 \\
& + 2AB(Sx) + 2AC(Sy) + 2AD(Sx^2) + 2AE(Sxy) + 2BC(Sxy) + 2BD(Sx^3) \\
& + 2BE(Sx^2y) + 2CD(Sx^2y) + 2CE(Sxy^2) + 2DE(Sx^3y) \\
& - 2A(Sv) - 2B(Sxv) - 2C(Syv) - 2D(Sx^2v) - 2E(Sxyv)
\end{aligned}$$

where the symbol S denotes summation of the indicated variables for all N data points. 

Now, to minimize the sum of squares of the errors, we can form the partial derivatives of the total sum of squares with respect to each coefficient A, B, ..., E in turn, and set each of these partial derivatives to zero to give the minimum sum of squares. Notice that, when evaluating the partial derivative with respect to A, only the terms in which A appears contribute to the result. Similarly the partial derivative with respect to any given coefficient involves only the terms in which that coefficient appears. Each of the partial derivatives turns out to be linear in the coefficients. For our example, setting each of the partial derivatives of the sum of squared errors to zero gives the following set of linear simultaneous equations 
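The claim that each partial derivative is linear in the coefficients can be checked symbolically. The sketch below is my own illustration (not from the text), assuming the example model v = A + Bx + Cy + Dx² + Exy, and uses SymPy:

```python
import sympy as sp

# Symbolic check that each partial derivative of one squared error,
# e^2 = (A + Bx + Cy + Dx^2 + Exy - v)^2, is linear in A..E.
A, B, C, D, E, x, y, v = sp.symbols('A B C D E x y v')
e2 = (A + B*x + C*y + D*x**2 + E*x*y - v)**2
grads = [sp.expand(sp.diff(e2, c)) for c in (A, B, C, D, E)]

# Total degree in the coefficients A..E of each gradient (x, y, v are
# treated as part of the coefficient domain).
degrees = [sp.Poly(g, A, B, C, D, E).total_degree() for g in grads]
```

Every entry of `degrees` is 1, confirming that setting the gradients to zero yields simultaneous equations that are linear in the coefficients.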

$$\begin{aligned}
2A(S1) + 2B(Sx) + 2C(Sy) + 2D(Sx^2) + 2E(Sxy) - 2(Sv) &= 0 \\
2A(Sx) + 2B(Sx^2) + 2C(Sxy) + 2D(Sx^3) + 2E(Sx^2y) - 2(Sxv) &= 0 \\
2A(Sy) + 2B(Sxy) + 2C(Sy^2) + 2D(Sx^2y) + 2E(Sxy^2) - 2(Syv) &= 0 \\
2A(Sx^2) + 2B(Sx^3) + 2C(Sx^2y) + 2D(Sx^4) + 2E(Sx^3y) - 2(Sx^2v) &= 0 \\
2A(Sxy) + 2B(Sx^2y) + 2C(Sxy^2) + 2D(Sx^3y) + 2E(Sx^2y^2) - 2(Sxyv) &= 0
\end{aligned}$$

Dividing all terms by 2, noting that S1 = N, and putting these equations into matrix form, we have the 5x5 system of equations 

$$\begin{bmatrix}
N & Sx & Sy & Sx^2 & Sxy \\
Sx & Sx^2 & Sxy & Sx^3 & Sx^2y \\
Sy & Sxy & Sy^2 & Sx^2y & Sxy^2 \\
Sx^2 & Sx^3 & Sx^2y & Sx^4 & Sx^3y \\
Sxy & Sx^2y & Sxy^2 & Sx^3y & Sx^2y^2
\end{bmatrix}
\begin{bmatrix} A \\ B \\ C \\ D \\ E \end{bmatrix}
=
\begin{bmatrix} Sv \\ Sxv \\ Syv \\ Sx^2v \\ Sxyv \end{bmatrix}$$

We can solve this system by any convenient method (e.g., multiply the right-hand column vector by the inverse of the 5x5 matrix) to give the best-fit coefficients A, B, C, D, and E. Thus the process of finding the optimum coefficients by multiple linear regression consists simply of computing all the summations S for the N data points, and then performing one matrix inversion and one matrix multiplication.
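The whole procedure can be carried out in a few lines. The sketch below is my own illustration, assuming the example model v = A + Bx + Cy + Dx² + Exy; the sample data and "true" coefficient values are invented for the demonstration:

```python
import numpy as np

# Invented sample data for the hypothetical model v = A + Bx + Cy + Dx^2 + Exy.
rng = np.random.default_rng(0)
N = 200
x = rng.uniform(-1.0, 1.0, N)
y = rng.uniform(-1.0, 1.0, N)
true = np.array([1.0, 2.0, -3.0, 0.5, 4.0])          # A, B, C, D, E

# Each column of F is one basis function evaluated at the data points.
F = np.column_stack([np.ones(N), x, y, x**2, x*y])
v = F @ true

# F.T @ F is exactly the 5x5 matrix of summations S, and F.T @ v is the
# right-hand column vector, so this solves the normal equations above.
coeffs = np.linalg.solve(F.T @ F, F.T @ v)
```

Since the data here are noise-free, `coeffs` recovers the original A, B, C, D, E; with noisy data the same call returns the least-squares estimates.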

Obviously this same method can be applied to find the constant coefficients c_{0}, c_{1}, ..., c_{n} that minimize the sum of squares of the error of any given set of data points v(x_{1}, x_{2}, ..., x_{m}) in accord with the model

$$v = c_0 f_0(x_1, \ldots, x_m) + c_1 f_1(x_1, \ldots, x_m) + \cdots + c_n f_n(x_1, \ldots, x_m)$$

where f_{0}, f_{1}, ..., f_{n} are arbitrary functions of the independent variables x_{1}, x_{2}, ..., x_{m}. Proceeding exactly as before, we square the difference between the modeled value and the actual value of v for each data point, sum these squared errors over all N data points, form the partial derivatives of the resulting expression with respect to each of the n+1 coefficients, and set these partial derivatives to zero. This gives a set of n+1 linear simultaneous equations which can be written in matrix form as

$$\begin{bmatrix}
Sf_0f_0 & Sf_0f_1 & \cdots & Sf_0f_n \\
Sf_1f_0 & Sf_1f_1 & \cdots & Sf_1f_n \\
\vdots & \vdots & \ddots & \vdots \\
Sf_nf_0 & Sf_nf_1 & \cdots & Sf_nf_n
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{bmatrix}
=
\begin{bmatrix} Sf_0v \\ Sf_1v \\ \vdots \\ Sf_nv \end{bmatrix}$$

where Sf_{i}f_{j} signifies the sum of the products of f_{i}(x_{1},..,x_{m}) and f_{j}(x_{1},..,x_{m}) over all N data points. Multiplying the right-hand column vector by the inverse of the square matrix gives the least-squares fit for the coefficients c_{0}, c_{1}, ..., c_{n}. So, in general, if we model the dependent variable v as a linear combination of n+1 arbitrary functions of an arbitrary number of independent variables, we can find the combination that minimizes the sum of squares of the errors on a set of N (no less than n+1) data points by evaluating the inverse of an (n+1)x(n+1) matrix.
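The general case can be sketched as a short routine. The function name `fit_linear_combination` and the demonstration basis below are my own illustrations, not from the text:

```python
import numpy as np

def fit_linear_combination(basis, X, v):
    """Least-squares coefficients c_0..c_n for v ~ sum_j c_j * f_j(X).

    basis -- list of callables f_j, each mapping the sample array X
             to a 1-D array of length N
    X     -- independent-variable samples (N rows, any number of columns)
    v     -- dependent-variable values, length N
    """
    F = np.column_stack([f(X) for f in basis])       # N x (n+1)
    # (F.T @ F)[i, j] is S f_i f_j, and (F.T @ v)[i] is S f_i v.
    return np.linalg.solve(F.T @ F, F.T @ v)

# Demonstration: recover c = (2, -1, 0.5) from v = 2 - x + 0.5 x^2.
xs = np.linspace(0.0, 1.0, 50)
demo_basis = [lambda s: np.ones(len(s)), lambda s: s, lambda s: s**2]
c = fit_linear_combination(demo_basis, xs, 2.0 - xs + 0.5 * xs**2)
```

Any callables at all may be used as basis functions; only the coefficients c_{j} enter linearly, which is all the method requires.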

Incidentally, the last equation shows that it is particularly easy to perform the regression if we choose basis functions f_{j} that are mutually orthogonal, meaning that we have 

$$S\, f_i f_j \;=\; \begin{cases} \lambda & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \qquad (1)$$

where each product f_{i}f_{j} is evaluated at the independent variables of the successive data points, and summed as before over all N points. In this case the coefficient matrix in the system of equations is diagonal, so the coefficients of our curve fit are simply 1/λ times the right-hand column vector, which is to say,

$$c_j = \frac{1}{\lambda}\, S\, f_j v \qquad (2)$$

To illustrate this, suppose we have a sequence of equally spaced samples of a one-dimensional function v(x) over the range from x = −π to +π. With a large enough set of equally spaced samples, the summation in (1) approaches an integration over the x variable, so our basis functions will be orthogonal if

$$\int_{-\pi}^{\pi} f_i(x)\, f_j(x)\, dx \;=\; \begin{cases} \lambda & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \qquad (3)$$

One such set of functions is the elementary trigonometric functions cos(jx) and sin(jx), so we can choose the basis functions

$$f_{2j-1}(x) = \cos(jx), \qquad f_{2j}(x) = \sin(jx)$$

for j = 1 to infinity. It’s easy to verify that these functions satisfy equation (3) with λ = π. (The constant function f(x) = 1 integrates to 2π rather than π over this range, which is why the constant term of a Fourier series is conventionally written as a_{0}/2.) Therefore, to represent the function v(x) as a linear combination

$$v(x) = \frac{a_0}{2} + \sum_{j=1}^{\infty} \big[ a_j \cos(jx) + b_j \sin(jx) \big]$$

the coefficients given by our linear regression formula (2), with the summation changed to an integration, are 

$$a_j = \frac{1}{\pi} \int_{-\pi}^{\pi} v(x) \cos(jx)\, dx$$
and
$$b_j = \frac{1}{\pi} \int_{-\pi}^{\pi} v(x) \sin(jx)\, dx$$

for j = 1, 2, … (the formula for a_{j} applies to j = 0 as well). We recognize these as the well-known expressions for the coefficients of the Fourier series for v(x). Thus these expressions emerge quite naturally from the formulas for multiple linear regression, and the approach immediately generalizes to higher-dimensional functions with arbitrary basis functions and arbitrarily distributed samples with errors.
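This convergence is easy to check numerically. The sketch below is my own illustration (the test function v(x) = x and the sample count are arbitrary choices, not from the text): regressing equally spaced samples of the sawtooth against the sine basis reproduces its Fourier coefficients b_{j} = 2(−1)^{j+1}/j.

```python
import numpy as np

# Equally spaced midpoint samples of v(x) = x on (-pi, pi).
N = 100_000
x = -np.pi + (np.arange(N) + 0.5) * (2.0 * np.pi / N)
v = x

fitted, exact = [], []
for j in (1, 2, 3):
    f = np.sin(j * x)
    # Diagonal case of the normal equations: c_j = S(f_j v) / S(f_j f_j),
    # which approaches (1/pi) * integral of v(x) sin(jx) dx as N grows.
    fitted.append((f @ v) / (f @ f))
    # Known Fourier coefficients of the sawtooth: b_j = 2(-1)^(j+1)/j.
    exact.append(2.0 * (-1.0) ** (j + 1) / j)
```

For this odd test function the cosine coefficients all vanish; with finitely many samples the regression values agree with the integrals only to within the discretization error, which shrinks as N grows.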
