Updating and Sequential Least Squares

Developing Models for Kalman Filters

In previous installments, we covered the Linear Least Squares method for determining parameters of a linear model, starting with a simple input/output model, and working towards a dynamic state transition model. For the dynamic model, the model predicts output values based on a given set of current input values and the current internal state variables. The predictions are calculated as a sum of the input values times constant coefficient values.

Those coefficients are the model parameters. Getting good estimates of multiple parameter values from noisy data can require collecting thousands (tens of thousands, hundreds of thousands) of input/output data combinations. That means an independent data matrix M with thousands of rows. Sometimes it is no big deal: bulky or not, you just grind through the data once and you are done. But if you have to do this crunching a lot, it means reserving quite a lot of system resources. First, you need plenty of memory just to retain all the data. Then, at the finish, there is a long delay for all of the CPU cycles needed to perform the calculations. Is there some way to utilize the data "on the fly" as collected, rather than accumulating everything and then concentrating all the processing at the end?

 

Introducing rank-one updates

Suppose that n items of data were accumulated to build the input data matrix Mn of independent ("input") values. The linear model uses the "multipliers" in the parameter vector pn to "predict" the corresponding output values Yn. If the model were perfect, and if the data were perfect, the following conditions would all be satisfied exactly.

M n p n = Y n

However, in the real world these conditions can't all be satisfied perfectly. As we saw in preceding installments, we can solve the Normal Equations to obtain an approximate solution that is "best" according to the least sum of squared errors criterion.

M n T M n · p n = M n T · Y n

In most non-pathological cases, this "necessary condition" yields a unique solution that is the best-fit solution we seek.

Now suppose that this is only a partial solution; the solution we actually want includes all of the previous constraint data, plus one more input/output relationship. We can modify the original problem by including the data terms for the new constraint as the new top row of the data matrix. All of the rows previously present are still there, covered by the Mn matrix terms. Similar for the corresponding output values, a new output term yn+1 is added to the top of the output data vector, with the previous output terms Yn still present below.

M N + 1 [ m n + 1 M n ] Y n + 1 [ y n + 1 Y n ] p n + 1 p n + Δ p

The new Normal Equations formulation to solve for this adjusted best fit problem is:

M n + 1 T M n + 1 · p n + 1 = M n + 1 T · Y n + 1

In this solution, we can expect the new information to result in a new solution pn+1 differing from the previous best estimate by the small amount Δp.

If we substitute the notations from above, we can see the roles that previous data and new data play in producing the updated solution.

m n + 1 T m n + 1 + M n T M n · p n + Δ p = m n + 1 T · y n + 1 + M n T · Y n

This provides a scheme to construct the Normal Equations incrementally.

  1. Let k be the number of parameters to be evaluated. Start by constructing the k x k matrix M0TM0 and a k-length vector M0TY0 with all matrix and vector terms set to 0.
  2. Begin adding constraint rows. For the first input/output constraint condition to be added to the system, construct from the row of independent data values the peculiar matrix product m1Tm1 that forms a square matrix of size k x k but rank 1. This matrix is summed into the previously constructed matrix storage, M0, to form the updated matrix M1. On the other side of the equation system under construction, the new data vector is transposed into a column vector form m1T, scaled by the new output value y1, and this is summed with the previous right-side accumulation Y0 to form the updated vector Y1.
  3. With the next row of independent values and corresponding output value, the M1 matrix is updated to form M2, and the Y1 vector is updated to form Y2.
  4. Repeat the same way until the entire input output data set is processed.
  5. The value of the modified p parameter vector? No need to worry about this until the time comes to compute it -- which can be after any update.

Once all of the updates are done, the Normal Equations are all set up and ready to solve.

 

Some observations

  • The point of this scheme is that it is no longer necessary to accumulate a huge collection of input-output data. All you need to carry forward from one step to the next is one k x k matrix and one vector of size k.

  • This processing is neither more nor less efficient than the calculations you would perform to set up the Normal Equations solution in the usual manner. All this does is spread the work more evenly over time.

  • The rank-one updates (and similar lower-rank updates) are not iterative approximations. You will get exactly the same result from building up normal equations in this sequential manner or from one large monolithic matrix construction.

  • You probably should not discard all of the original input/output data after updating the Normal Equations. You will need some of that data to verify your fitting results.

Some extensions

  • Instead of a single new constraint with single output value added at each update, the updating process is equally valid when groups of updates are accumulated and incorporated together as one batch mode update.

  • The method is easily adapted for the case of more than one output variables. Go ahead and try this!

For next time

When constructing the Normal Equations one constraint row at a time, the single most time-consuming operation is the matrix inverse you must calculate to evaluate parameters. This is not a big deal if done rarely. But if you need updated parameter values frequently (in the worst case, as soon as possible after each new update), some modifications to be discussed later can streamline the calculations.

 

 

Contents of the "Developing Models for Kalman Filters" section of the ridgerat-tech.us website, including this page, are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise specifically noted. For complete information about the terms of this license, see http://creativecommons.org/licenses/by-sa/4.0/. The license allows usage and derivative works for individual or commercial purposes under certain restrictions. For permissions beyond the scope of this license, see the contact information page for this site.